Separating Timing, Movement Conditions and Individual Differences in the Analysis of Human Movement

A central task in the analysis of human movement behavior is to determine systematic patterns and differences across experimental conditions, participants and repetitions. This is possible because human movement is highly regular, being constrained by invariance principles. Movement timing and movement path, in particular, are linked through scaling laws. Separating variations of movement timing from the spatial variations of movements is a well-known challenge that is addressed in current approaches only through forms of preprocessing that bias analysis. Here we propose a novel nonlinear mixed-effects model for analyzing temporally continuous signals that contain systematic effects in both timing and path. Identifiability issues of path relative to timing are overcome by using maximum likelihood estimation in which the most likely separation of space and time is chosen given the variation found in data. The model is applied to analyze experimental data of human arm movements in which participants move a hand-held object to a target location while avoiding an obstacle. The model is used to classify movement data according to participant. Comparison to alternative approaches establishes nonlinear mixed-effects models as viable alternatives to conventional analysis frameworks. The model is then combined with a novel factor-analysis model that estimates the low-dimensional subspace within which movements vary when the task demands vary. Our framework enables us to visualize different dimensions of movement variation and to test hypotheses about the effect of obstacle placement and height on the movement path. We demonstrate that the approach can be used to uncover new properties of human movement.


Introduction
When humans move and manipulate objects, their hand paths are smooth, but also highly flexible. Humans do not move in a jerky, robot-like way that is sometimes humorously invoked to illustrated "unnatural" movement behavior. In fact, humans have a hard time making "arbitrary" movements. Even when they scribble freely in three dimensions, their hand moves in a regular way that is typically piecewise planar [1,2]. Movement generation by the nervous systems, the neuro-muscular systems, and the body is constrained by implied laws of motion signatures which are found empirically through invariances of movement trajectories and movement paths. Among these, laws decoupling space and time are of particular importance. For instance, the fact that the trajectories of the hand have approximately bell-shaped velocity profiles across varying movement amplitudes [3] implies a scaling of the time dependence of velocity. The 2/3 power law [4] establishes an analogous scaling of time with the spatial path of the hand's movement. Similarly, the isochrony principle [5] captures that the same spatial segment of a movement takes up the same proportion of movement time as movement amplitude is rescaled. Several of these invariances can be linked to geometrical invariance principles [6].
These invariances imply that movements as a whole have a reproducible temporal form, which can be characterized by movement parameters. Their values are specified before a movement begins, so that one may predict the movement's time course and path based on just an initial portion of the trajectory [7]. Movement parameters are assumed to reside at the level of end-effector trajectories in space and their neural encoding begins to be known [8][9][10]. The set of possible movements can thus be spanned by a limited number of such parameters. Moreover, the choices of these movement parameters are constrained. For instance, in sequences of movements, earlier segments predict later segments [11].
A key source of variance of kinematic variables is, of course, the time course of the movement itself. The invariance principles suggest that this source of variance can be disentangled from the variation induced when the movement task varies. In this paper, we will first address time as a source of variance, focussing on a fixed movement task, and then use the methods developed to address how movements vary when the task is varied. Given a fixed movement task, movement trajectories also vary across individuals. Individual differences in movement, a personal movement style, are reproducible and stable over time, as witnessed, for instance, by the possibility to identify individuals or individual characteristics such as gender by movement information alone [12][13][14].
A third source of variation are fluctuations in how movements are performed from trial to trial or across movement cycles in rhythmic movements. Such fluctuations are of particular interest to movement scientists, because they reflect not only sources of random variability such as neural or muscular noise, but also the extent to which the mechanisms of movement generation stabilize movement against such noise. Instabilities in patterns of coordination have been detected by an increase of fluctuations [15] and differences in variance among different degrees of freedom have been used to establish priorities of neural control [16,17].
A systematic method to disentangle these three sources of movement variation, time, individual differences, and fluctuations, would be a very helpful research tool. Such a method would decompose sets of observed kinematic time series into a common trajectory (that may be specific to the task), participant-specific movement traits, and random effects. Given the observed decoupling of space and time, such a decomposition would also separate the rescaling of time across these three factors from the variation of the spatial characteristics of movement.
The statistical subfield that deals with analysis of temporal trajectories is the field of functional data analysis. In the literature on functional data, the typical approach for handling continuous signals with time-warping effects is to pre-align samples under an oversimplified noise model in the hope of eliminating the effects of movement timing [18]. In contrast, we propose an analytic framework in which the decomposition of the signal is done simultaneously with the estimation of movement timing effects, so that samples are continually aligned under an estimated noise model. Furthermore, we account for both the task-dependent variation of movement and for individual differences (a brief review of warping in the modeling of biological motion is provided in the Methods section).
Decomposition of time series into a common effect (the time course of the movement given a fixed task), an individual effect, and random variation naturally leads to a mixed-effects formulation [19]. The addition of nonlinear timing effects gives the model the structure of a hierarchical nonlinear mixed-effects model [20]. We present a framework for maximumlikelihood estimation in the model and demonstrate that the method leads to high-quality templates that foster subsequent analysis (e.g., classification). We then show that the results of this analysis can be combined with other models to test hypotheses about the invariance of movement patterns across participants and task conditions. We demonstrate this by using the individual warping functions in a novel factor analysis model that captures variation of movement trajectories with task conditions. We use as of yet unpublished data from a study of naturalistic movement that extends published work [21]. In the study, human participants transport a wooden cylinder from a starting to a target location while avoiding obstacles at different spatial positions along the path. Earlier work has shown that movement paths and trajectories in this relatively unconstrained, naturalistic movement task clearly reflect typical invariances of movement generation, including the planar nature of movement paths, spatiotemporal invariance of velocity profiles, and a local isochrony principle that reflects the decoupling of space and time [21]. By varying the obstacle configuration, the data include significant and non-trivial task-level variation. We begin by modeling a one-dimensional projection of the time courses of acceleration of the hand in space, which we decompose into a common pattern and the deviations from it that characterize each participant. The timing of the acceleration profiles is determined by individual time warping functions which are of higher quality than conventional estimates, since timing and movement noise are modeled simultaneously. The high quality of the estimates is demonstrated by classifying movements according to participant. Finally, the results of the nonlinear mixed model are analyzed using a novel factor analysis model that estimates a low-dimensional subspace within which movement paths change when the task conditions are varied. This combination of statistical models makes it possible to separate and visualize the variation caused by experimental conditions, participants and repetition. Furthermore, we can formulate and test hypotheses about the effects of experimental conditions on movement paths. Using the proposed statistical models, we document three findings: a linearly amplified deviation in mean path related to increase in obstacle height; a consistent asymmetric pattern of variation along the movement path related to obstacle placement; and the existence of obstacle-distance invariant focal points where mean trajectories intersect in the frontal and vertical plane.
Software for performing the described types of simultaneous analyses of timing and movement effects are publicly available through the pavpop R package [22]. A short guide on model building and fitting in the proposed framework is available as Supporting Information, along with an application to handwritten signature data.

Experimental data set
Ten participants performed a series of simple, naturalistic motor acts in which they moved a wooden cylinder from a starting to a target position while avoiding a cylindric obstacle. The obstacle's height and positition along the movement path were varied across experiments (Fig 1).
The movements were recorded with the Visualeyez (Phoenix Technologies Inc.) motion capture system VZ 4000. Two trackers, each equipped with three cameras, were mounted on the wall 1.5 m above the working surface, so that both systems had an excellent view of the table. A wireless infrared light-emitting diode (IRED) was attached to the wooden cylinder. The trajectories of markers were recorded in three Cartesian dimensions at a sampling rate of 110 Hz based on a reference frame anchored on the table. The starting position projected to the table was taken as the origin of each trajectory in three-dimensional Cartesian space. Recorded movement paths for two experimental conditions are shown in Fig 2. The acceleration profiles considered in the following sections were obtained by using finite difference approximations of the raw velocity magnitude data, see Fig 3. Obstacle avoidance was performed in 15 different conditions that combined three obstacle heights S, M, or T with five distances of the obstacle from the starting position d 2 {15, 22.5, 30, 37.5, 45}. A control condition had no obstacle. The participants performed each condition 10 times. Each experimental condition provided n = 100 functional samples for a total of n f = 1600 functional samples in the dataset, leading to a total data size of m = 175,535 observed time points.
The present data set is described in detail in [23]. The experiment is a refined version of the experiment described in [21].

Time warping of functional data
Not every movement has the exact same duration. Comparisons across movement conditions, participants, and repetitions are hampered by the resulting lack of alignment of the movement trajectories. For a single condition, this is illustrated in Fig 3(a) and 3(c), in which the duration of the movement clearly varies from participant to participant but also from repetition to repetition. Without alignment, it is difficult to compare movements. In an experiment such as the present, in which start and target positions are fixed, the standard solution for aligning samples is to use percentual time; the onset of the individual movement corresponds to time 0% and the end of the movement to time 100%. Such linear warping is based on detecting movement onset and offset through threshold criteria. As can be seen in Fig 3(b) and 3(c), linear warping does not align the characteristic features of the acceleration signals very well, however, as there is still considerable variation in the times at which acceleration peaks. There is, in other words, a nonlinear component to the variation of timing.
To handle nonlinear variation of timing, the signal must be time warped based on an estimated, continuous, and monotonically increasing function that maps percentual time to warped percentual time, such that the functional profiles of the signals are best aligned with each other. Such warping has traditionally been achieved by using the dynamic-time warping (DTW) algorithm [24] which offers a fast approach for globally optimal alignment under a prespecified distance measure (for reviews of time warping in the domains of biological movement modeling see [25,26]). DTW is both simple and elegant, but while it will often produce much better results than cross-sectional comparison of time-warped curves [27][28][29], it does suffer from some problems. In particular, DTW requires a pointwise distance measure such as Euclidean distance. Therefore, the algorithm cannot take serially correlated noise effects in a signal into account. As a result, basic unconstrained DTW will overfit in the sense of producing perfect fits whenever possible, and for areas that cannot be perfectly matched, either stretch them or compress them to a single point. In other words, DTW cannot model curves with systematic amplitude differences, and using DTW to naively compute time warped mean curves is   in general problematic [30,31]. This lacking ability to model serial correlated effects can be somewhat mitigated by restricting the DTW step pattern, in particular through a reduced search window for the warping function and constraints on the maximal step sizes. These are, however, hard model choices, they are restrictions on the set of possible warping functions, and they are a difficult to interpret since they seek to fix a problem in amplitude variation by penalizing warping variation. Instead, a much more natural approach would be to use a data term that models the amplitude variation encountered in data, and to impose warp regularization by using a cost function that puts high cost on undesired warping functions. In the following sections, we will propose a model with these properties, which, in addition, allows for estimating the data term, warp regularization and their relative weights from the data.
To illustrate the difference between DTW and the proposed method, consider the example displayed in Fig 4 where the recorded z-coordinates (elevation) of one participant's 10 movements in the control condition (without obstacle) are plotted in recorded time (a) and percentual time (b). These samples have been aligned using DTW by iteratively estimating a pointwise mean function and aligning the samples to the mean function (10 iterations). The three rows of Fig 5 display the results of the procedure using three different step patterns. We first note the strong overfitting of the symmetric and asymmetric step patterns, where the sequences with highest elevation are collapsed to minimize the residual. Secondly, we note the jagged warping functions that are results of the discrete nature of the DTW procedure. For comparison, we fitted a variant of the proposed model with a continuous model for the warping function controlled by 13 basis functions. We modeled both amplitude and warping effects as random Gaussian processes using simple, but versatile classes of covariance functions (see Supporting Information), and estimated the internal weighting of the effect directly from the samples. The results are displayed in Fig 6. We see that the warped elevation trajectories seem perfectly aligned, and that the corresponding warping functions are relatively simple, with the majority of variation being near the end of the movement. It is evident from the figures that both the alignment is much more reasonable, and the warping functions are much simpler than the warping functions found using DTW.   Table I] with a slope constraint of 2.
Two questions naturally arise. Firstly, are the warping functions unique? Other warping functions could perhaps have produced similarly well-aligned data. Secondly, do we want perfectly aligned trajectories? There is still considerable variation of the amplitude in the warped z-coordinates of the movement trajectories in Fig 4(c), for instance. Some of the variation visible in the unwarped variant in Fig 4(b) could be due to random variations in amplitude rather than timing.
Using the time-warping functions that were determined by aligning the z-coordinates of the movements to now warp the trajectories for the x-and y-coordinates (Fig 7), we see that the  alignment obtained is not perfect. Conversely, were we to use warping functions determined for the x-or y-coordinate to warp the z-coordinate we would encounter similarly imperfect alignment. Thus, we need a method that avoids over-aligning, and represents a trade-off between the complexity of the warping functions and of the amplitude variation in data. In the next section, we introduce a statistical model that handle this trade-off in a data-driven fashion by using the patterns of variation in the data to find the most likely separation of amplitude and timing variation.
Statistical modeling of movement data to achieve time warping. In the following, we describe inference for a single experimental condition. For a given experimental condition-an object that needs to be moved to a target and an obstacle that needs to be avoided-we assume there is a common underlying pattern in all acceleration profiles; all n p = 10 participants will lift the object and move it toward the target, lifting it over the obstacle at some point. This assumption is supported by the pattern in the data that Fig 3 visualizes. We denote the hypothesized underlying acceleration profile shared across participants and repetitions by θ. In addition to this fixed acceleration profile, we assume that each participant, i, has a typical deviation φ i from θ, so that the acceleration profile that is characteristic of that participant is θ + φ i . Such a systematic pattern characteristic of each participant is apparent in Fig 3(c) and 3(d). The individual trials (repetitions) of the movement deviate from this characteristic profile of the individual. We model these deviations as additive random effects with serial correlations so that for each repetition, j, of the experimental condition we have an additive random effect x ij that causes deviation from the ideal profile. Finally we assume that the data contains observation noise ε ij tied to the tracking system and data processing.
Time was implicit, up to this point, and the observed acceleration profile was decomposed into additive, linear contributions. We now assume, in addition, that each participant, i, has a consistent timing of the movement across repetitions, that is reflected in the temporal deformation of the acceleration profile (Fig 3(a) and 3(c)) and is captured by the time warping function ν i . On each repetition, j, of the condition, the timing of participant, i, contains a random variation of timing around ν i captured by a random warping function v ij (see Fig 3).
Altogether, we have described the following statistical model of the observed acceleration profiles across participants: where denotes functional composition, t denotes time, y; φ i ; n i : R ! R are fixed effects and v ij , x ij and ε ij are random effects. The serially correlated effect x ij is assumed to be a zeromean Gaussian process with a parametric covariance function S : R Â R ! R; the randomness of the warping function v ij is assumed to be completely characterized by a latent vector of n w zero-mean Gaussian random variables w ij with covariance matrix σ 2 C; and ε ij is Gaussian white noise with variance σ 2 . Compared to conventional methods for achieving time warping, the proposed model (1) models amplitude and warping variation between repetition as random effects, which enables separation of the effects from the joint likelihood. Conventional approaches for warping model warping functions as fixed effects and do not contain amplitude effects [18]. The idea of modeling warping functions as random effects have previously been considered by [32][33][34] where warps were modeled as random shifts or random polynomials. None of these works however included amplitude variation. Recently, some works have considered models with random affine transformations for warps and amplitude variation in relation to growth curve analysis [35,36]. A generalization that does not require affine transformations for warp and amplitude variation is presented in [37]. The presented model (1) is a hierarchical generalization of the model presented in [37]. In the context of aligning image sequences in human movement analysis, morphable models [29,38] model an observed movement pattern as a linear combination of prototypical patterns using both nonlinear warping functions (estimated using DTW) and spatial shifts. Thus morphable models are similar to the warping approaches that model both warp and amplitude effects as fixed [39].
Maximum likelihood estimation of parameters. Model (1) has a considerable number of parameters, both for linear and nonlinear dependencies on the underlying state variable acceleration. The model also has effects that interact. This renders direct simultaneous likelihood estimation intractable. Instead we propose a scheme in which fixed effects and parameters are estimated and random effects are predicted iteratively on three different levels of modeling. Nonlinear model At the nonlinear level, we consider the original model (1), and simultaneously perform conditional likelihood estimation of the participant-specific warping functions and predict the random warping functions from the negative log posterior. All other parameters remain fixed.
Fixed warp model At the fixed warp level, we fix the participant-specific warping effect ν i at the conditional maximum likelihood estimate, and the random warping function v ij , at the predicted values. The resulting model is an approximate linear mixed-effects model with Gaussian random effects x ij and ε ij , that allows direct maximum-likelihood estimation of the remaining fixed effects, θ and φ i .

Linearized model
At the linearized level, we consider the first-order Taylor approximation of model (1) in the random warp v ij . This linearization is done around the estimate of ν i plus the given prediction of v ij from the nonlinear model. The result is again a linear mixedeffects model, for which one can compute the likelihood explicitly, while taking the uncertainty of all random effects-including the nonlinear effect v ij -into account. At this level all variance parameters are estimated using maximum-likelihood estimation.
The estimation/prediction procedure is inspired by the algorithmic framework proposed in [37]. The estimation procedure is, however, adapted to the hierarchical structure of data and refined in several respects. On the linearized model level, the nonlinear Gaussian random effects are approximated by linear combinations of correlated Gaussian variables around the mode of the nonlinear density. The linearization step thus corresponds to a Laplace approximation of the likelihood, and the quality of this approximation is approximately second order [40].
Let y ij be the vector of the m ij observations for participant i's jth replication of the given experimental condition, and let y i denote the concatenation of all functional observations of participant i in the experimental condition, and y the concatenation all these observations across participants. We denote the lengths of these vectors by m i and m. Furthermore, let σ 2 S ij , σ 2 S i and σ 2 S denote the covariance matrices of x ij = (x ij (t k )) k , x i = (x ij ) j , and x = (x i ) i respectively. We note that the index set for k depends on i and j since the covariance matrices S ij vary in size due to the different durations of the movements and because of possible missing values when markers are occluded.
We note that all random effects are scaled by the noise standard deviation σ. This parametrization is chosen because it simplifies the likelihood computations, as we shall see. Finally, we denote the norm induced by a full-rank covariance matrix A by kzk Fixed warp level. We model the underlying profile, θ, and the participant-specific variation around this trajectory, φ i , in the common (warped) functional basis Φ 2 R mÂK , with weights c = (c 1 , . . ., c K ) for θ and d i = (d i1 , . . ., d iK ) for φ i . We assume that the participant-specific variations, φ i , are centered around θ and thus P i d i ¼ 0 2 R K . Furthermore, the square magnitude of the weights, d i , is penalized with a weighting factor η. This penalization helps guiding the alignment process in the direction of the highest possible level of detail in the common profile θ when the initial alignment is poor.
For fixed warping functions ν i and v ij , the negative log likelihood function in θ = Fc is proportional to where I n denotes the n × n identity matrix. This yields the estimatê The penalized negative profile log likelihood for the weights d i for φ i is proportional to which gives the maximum likelihood estimator Nonlinear level. Similarly to the linear mixed-effects setting [41], it is natural to predict nonlinear random effects from the posterior [20], since these predictions correspond to the most likely values of the random effects given the observed data. Recall that the Gaussian variables, w ij , parametrize the randomness of the repetition-specific warping function v ij . Since the conditional negative (profile) log likelihood function in ν i given the random warping function v ij and the negative (profile) log posterior for w ij coincide, we propose to simultaneously estimate the fixed warping effects ν i and predict the random warping effects v ij from the joint conditional negative log likelihood/negative log posterior which is proportional to Since the variables w ij can be arbitrarily transformed through the choice of warping function v ij , the assumption that variables are Gaussian is merely one of computational convenience. Linearized level. We can write the local linearization of model (1) in the random warping parameters w ij around a given prediction w 0 ij as a vectorized linear mixed-effects model with effects given by where v 0 ij indicates that the warping function is evaluated at the prediction w 0 ij , diag(Z ij ) ij is the block diagonal matrix with the Z ij matrices along its diagonal, and denotes the Kronecker product.
Altogether, twice the negative profile log likelihood function for the linearized model (3) is Modeling of effects and algorithmic approach. So far, the model (1) has only been presented in a general sense. We now consider the specific modeling choices. The acceleration data has been rescaled using a common scaling for all experimental conditions, such that the span of data values has length 1 and the global timespan is the interval [0, 1].
To model the amplitude effects, we use a cubic B-spline basis F with K knots [42]. We require that the fixed warping function ν i is an increasing piecewise linear homeomorphism parametrized by n w equidistant anchor points in (0, 1), and assume that v ij is of the form where E ij ðtÞ is the linear interpolation at t of the values w ij placed at the n w anchor points in (0, 1). In the given experimental setting, the movement path is fixed at the onset and the end of the movement. The movement starts when the cylindrical object is lifted and ends when it is placed at its target position. Thus, we would expect the biggest variation in timing to be in the middle of the movement (in percentual time). These properties can be modeled by assuming that w ij is a discretely observed zero-drift Brownian bridge with scale σ 2 γ 2 [43], which means that the covariance matrix σ 2 C is given by point evaluation of the covariance function Cðt; t 0 Þ ¼ s 2 g 2 tð1 À t 0 Þ for t t 0 . When predicting the warps from the negative log posterior we restrict the search space to warps ν i and ν i + v ij that are increasing homeomorphic maps of the domain [0, 1] onto itself. The conditional distribution of ν ij given this restriction is slightly changed. For the used numbers of anchor points n w and the estimated variance parameters the difference is however minuscule, and we use the original Brownian model as a high-quality approximation of the true distribution.
We assume that the sample paths of the serially correlated effects x ij are continuous and that the process is stationary [44]. A natural choice of covariance is then the Matérn covariance with smoothness parameter μ, scale σ 2 τ 2 and range 1/α [45], since it offers a broad class of stationarity covariance functions.
Finally, in order to consistently penalize the participant-specific spline across experimental conditions with varying variance parameters, we will use penalization weights that are normalized with the variance of the amplitude effects, η = λ/(1 + τ 2 ).
The algorithm for doing inference in model (1) is outlined in Algorithm 1. We have found that i max = j max = 5 outer and inner loops are sufficient for convergence. A wide variety of these types of models can be fitted using the pavpop R package [22]. A short guide on model building and fitting is available as Supporting Information.
The number of basis functions K, the number of warping anchor points n w , and the regularization parameter λ were determined by the average 5-fold cross-validation score on each of three experimental conditions (d = 30 cm and obstacle heights S, M, and T). The models were fitted using the method described in the previous section, and the quality of the models was evaluated through the accuracy of classifying the participant from a given movement in the test set, using posterior distance between the sample and the combined estimates for the fixed effects (θ + φ i ) ν i . The cross-validation was done over a grid of the following values μ, μ p 2 {0.5, 1, 2}, K 2 {8, 13, 18, 23, 28, 33}, K p 2 {8, 9, . . ., 18}, n w , n wp 2 {0, 1, 2, 3, 5, 10}, λ, λ p 2 {0, 1, 2, 3}. The best values were found to be μ = 2, K = 23, n w = 2, λ = 2 and μ p = 1 K p = 12, n wp = 1, λ p = 0. We note that the smoothness parameter μ = 2 is on the boundary of the cross-validation grid. The qualitative difference between second order smoothness μ = 2 (corresponding to twice differentiable sample paths of amplitude effects) and higher order is so small, however, that we chose to ignore higher order smoothness. Furthermore, λ p = 0 indicates that we do not need to penalize participant-specific amplitude effects when working with percentual time. The reason is most likely that the samples have better initial alignment in percentual time. The estimated participant-specific acceleration profiles using percentual time can be seen in Fig 8. A simulation study that validates the method and implementation on data simulated using the maximum likelihood estimates of the central experimental condition (d = 30.0 cm, medium obstacle) is available as Supporting Information.

Identification of individual differences
A first assessment of the strength of the statistical model (1) is to examine the extent to which the model captures individual differences. Proper modeling of systematic individual differences is not only of scientific interest per se, but also provides perspective for interpreting any observed experimental effects. To validate the capability of the model to capture systematic individual differences, we use the model to identify an individual from the estimated individual templates. Such identification of individuals is becoming increasingly relevant also in a practical sense with the recent technological advances in motion tracking systems, and the growing array of digital sensors in handheld consumer electronics. Consistent with the framing of model (1), we perform identification of individual participants on the basis of the data from a single experimental condition. This is, in a sense, a conservative approach. Combining data across the different conditions of the experimental tasks would likely provide more discriminative power given that personal movement styles tend to be reproducible.
The classification of the movement data is based on the characteristic acceleration profile computed for each participant. For this to work it is important that individual movement differences are not smoothed away. The hyperparameters of the model were chosen with this requirement in mind. In the following, we describe alternative methods we considered. For all approaches, the stated parameters have been chosen by 5-fold cross-validation on the experimental conditions with obstacle distance d = 30.0 cm. The grids used for cross-validation are given as Supporting Information section. Recall that subscript p indicates the use of percentuall time. Nearest Participant (NP) NP classification classifies using the minimum combined pointwise L 2 distance to all samples for every individual in the training set.
Modified Band Median (MBM) MBM classification estimates templates using the modified band median proposed in [46], which under mild conditions is a consistent estimator of the underlying fixed amplitude effects warped according to the modified band medians of the warping functions. Classification is done using L 2 distance to the estimated templates. In the computations we count the number of bands defined by J = 4 curves [46, Section 2.2]. MBM p used J p = 2.
Robust Manifold Embedding (RME) RME classification estimates templates using the robust manifold embedding algorithm proposed in [47], which, assuming that data lies on a lowdimensional smooth manifold, approximates the geodesic distance and computes the empirical Fréchet median function. Classification is done using L 2 distance to the estimated templates.
Dynamic Time Warping (DTW) DTW classification estimates templates by iteratively time warping samples to the current estimated personal template (5 iterations per template) using an asymmetric step pattern (slopes between 0 and 2). The template is modeled by a Bspline with 33 degrees of freedom. DTW p used 16 degrees of freedom.

Fisher-Rao (FR) FR classification estimates templates as Karcher means under the Fisher-Rao
Riemannian metric [48] of the data represented using a single principal component [49].
Čencov's theorem states that the Fisher-Rao distance is the only distance that is preserved under warping [50], and in practice the distance is computed by using a dynamic time warping algorithm on the square-root slope functions of the data. Classification is done using L 2 distance to the estimated templates.
Elastic Fisher-Rao (FR E ) FR E classification estimates templates analogously to FR, but classifies using the weighted sum of elastic amplitude and phase distances [51, Definition 1 and Section 3.1]. The phase distance was weighted by a factor 1.5. FR Ep uses two principal components and a phase distance weight of 1.

Timing and Motion Separation (TMS)
The proposed TMS classification estimates templates of the fixed effects (θ + φ i ) ν i using Algorithm 1. Classification is done using least distance measured in the negative log posterior Eq (2) as a function of the test samples. The parameters were set as described in the previous section.
We evaluate classification accuracy using 5-fold cross-validation, which means that eight samples are available in the training set for every participant. The folds of the cross-validation are chosen chronologically, such that the first fold contains replications 1 and 2, the second contains 3 and 4 and so on. The results are available in Table 1. We see that TMS and TMS p achieve the highest classification rates, followed by FR Ep , FR p and RME p . Thus, the model enables identification of individual movement style. Furthermore, we note that there is little effect of using percentual time for the proposed method, which for all other methods gives a considerable boost in accuracy. This suggests that the TMS methods align data well without the initial linear warping and the endpoint constraints of percentual time.

Factor analysis of spatial movement paths
In the previous section, the proposed modeling framework was shown to give unequalled accuracy of modeling the time series data for acceleration. In this section we use the warping functions obtained to analyze the spatial movement paths and their dependence on task conditions. Temporal alignment of the spatial positions along the path for different repetitions and participants is necessary to avoid spurious spatial variance of the paths. The natural alignment of two movement paths is the one that matches their acceleration signatures. In other words, spatial positions along the paths at which similar accelerations are experienced should correspond to the same times. Thus, each individual spatial trajectory was aligned using the time warping predicted from the TMS p -results of the previous section. Every sample path was represented by 30 equidistant sample points in time at which values were obtained by fitting a three-dimensional B-splines with 10 equidistantly spaced knots to each trajectory.
As an exemplary study, we analyze how spatial paths depend on obstacle height. We do this separately for each distance, so that we perform five separate analyses, one for each obstacle placement. In each analysis we have 10 participants with 10 repetitions for each of 3 obstacle heights. The three-dimensional spatial positions along the movement path, y ijh 2 R 30Â3 , depend on participant i = 1,. . .,10, repetition j = 1,. . .,10 and height h = 1,2,3.
We are interested in understanding how the space-time structure of movement captured in the 30 by 3 dimensions of the trajectories, y ijh , varies when obstacle height is varied. We seek to find a low-dimensional affine subspace of the space-time representation of the movements within which movements vary, once properly aligned. That subspace provides a low-dimensional model of movement paths on the basis of which we can analyze the data.
We identify the low-dimensional subspace based on a novel factor analysis model. In analogy to principal component analysis (PCA), q so-called loadings are estimated that represent dominant patterns of variation along movement trajectories. In contrast to PCA, the factor analysis model does not only model the residual-variance of independent paths around the mean, but also allows one to include covariates from the experimental design, for example by taking the repetition structure of participants and systematic effects of obstacle height into account. In other words, the proposed factor analysis model is a generalization of PCA suitable for addressing the question at hand while obeying the study design.
The idea is to use the mean movement trajectory, θ 2 R 30Â3 , of one condition, the lowest obstacle height, as a reference. The movement trajectories y ijh (30 time steps and 3 Cartesian coordinates; participant i; repetition j; and experimental condition with height h) are then represented through their deviation from the reference path. We estimate the hypothesized lowdimensional affine subspace in which these deviations lie. That subspace is spanned by the q orthonormal (30 × 3)-dimensional columns of the loadings matrix W 2 R ð30Â3ÞÂq . We assume a mixed-effect structure on the weights for the loadings that takes into account both the categorical effect of obstacle-height and random effects of participant and repetition. This amounts to a statistical model where X h 2 R 1Â2 represents the covariate design that indicates obstacle heights: S: X 1 = (0,0); M: X 2 = (1,0); and T: X 3 = (0,1). The parameters, b 2 R 2Âq , are the weights for the loadings that account for the systematic deviation of obstacle heights, M and T from the reference height, S. g l is the factor that describes the lth level random effects design (participant, participants' reaction to obstacle-height change, and repetition). Z i;g l ðj;hÞ;l 2 R 1Âq are independent latent Gaussian variables with zero-mean and a covariance structure modeled with three q × q covariance matrices, each describing the covariance between loadings within a level of the random-effect design. ε ij 2 R 30Â3 is zero-mean Gaussian noise with diagonal covariance matrix Λ with one variance parameter per dimension. The loading matrix W is identifiable in a similar way as for usual PCA. Firstly, the scaling of W is identified by the assumption that W > W is the q-dimensional identity matrix. Secondly, the rotation of W is identified by the assumption that the total variance of the latent variables for a single curve is a diagonal matrix. This identifies the loading matrix W with probability 1.
The models were fitted using maximum likelihood estimation by using an ECM algorithm [52] that had been accelerated using the SQUAREM method [53]. Fig 9 shows the percentage of explained variance for various values of q across different obstacle distances. For q > 8, the average percentage of variance explained by the ninth loading ranges from 1% to 2% and the combined percentage of variance explained by the loadings beyond number eight remains under 3%. From this perspective, q = 8 seems like a reasonable choice, with the loadings explaining 97.1% of the variance, meaning that the error term ε ij should account for the remaining 2.9%. In the following, all results are based on the model with q = 8.
Modeling obstacle height. The fitted mean trajectories for the three different obstacle heights at distance d = 30.0 cm can be found in Fig 10. A striking feature of the mean paths is the apparent linear scaling of elevation, but also of the lateral excursion with height. (The difference in the frontal plane, not shown, was very small, but follows a similar pattern). This leads to the hypothesis that the scaling of the mean trajectory with obstacle height can be described by a one-parameter regression model in height increase (X 1 = 0, X 2 = 7.5, X 3 = 15) rather than a more generic two-parameter ANOVA model.
We fitted both models for every obstacle distance and performed likelihood-ratio tests. The p-values can be found in Table 2. They were obtained by evaluating twice the difference in log likelihood for the two models using a χ 2 -distribution at q = 8 degrees of freedom. We see that no p-values are significant, so there is no significant loss in the descriptive power of the linear   scaling model compared to the ANOVA model. The remaining results in the paper are all based on the model with a regression design.
Discovering the time-structure of variance along movement trajectory. A strength of our approach to time warping is that we can estimate variability more reliably. In addition to observation noise, we model three sources of variation in the observed movement trajectories: individual differences in the trajectory, individual differences caused by changing obstacle height, and variation from repetition to repetition. The variances described from these three sources are independent of obstacle height. In Figs 11 and 12, two spatial representations of the mean movement path for the medium obstacle height are shown. The five distances of the obstacle from the starting position are shown in the five rows of the figures. The three columns show variance originating from individual differences in the trajectory, individual differences caused by changing obstacle height, and variation from repetition to repetition (from left to right). Variance is illustrated at eight equidistant points in time along the mean path by ellipsoids that mark 95% prediction for each level of variation.
Note the asymmetry of the movements with respect to obstacle position, both in terms of path and variation. This asymmetry reflects the direction of the movement. Generally, variability is higher in the middle of the movement than early and late in the movement. Individual differences caused by change in obstacle height (middle column) are small and lie primarily along the path. That is, individuals adapt the timing of the movement differently as height is varied. Individual differences in the movement path itself (left column) are largely differences in movement parameters: individuals differ in the maximal elevation and in the lateral positioning of their paths, not as much in the time structure of the movements. Variance from trial to trial (right column) is more evenly distributed, but is largest along the path reflecting variation in timing.
These descriptions are corroborated by the comparisons of the amounts of variance explained by the three effects in Fig 13. The obstacle distance of 45, in which the obstacle is close to the target lead to the largest variance in movement trajectory, with most of the increase over other conditions coming from repetition and individual differences caused by change in obstacle height. This suggests that this experimental condition is more difficult than the others, and perhaps much more so for the tall obstacle. Apart from this condition, we see that the largest source of variation is individual differences in movement trajectory. The second largest source of variation is repetition. Individual differences caused by change in obstacle height were are systematically the smallest source of variance.
Trajectory focal points. A final demonstration of the strength of the method of analysis is illustrated in Fig 14, which shows the mean paths for all obstacle distances and all obstacle heights. For each obstacle height, the paths from different obstacle distances intersect both in the frontal and the vertical plane. These focal points occur approximately at the same distance along the imagined line connecting start and end position. This pattern is clearly visible in the front view of the mean trajectories in Fig 14. Due to the limited variation of the path in the horizontal plane (Fig 14 top view), this effect is less clear in the horizontal plane. This pattern may reflect a scaling law, a form of invariance of an underlying path generation mechanism.

Discussion
We have proposed a statistical framework for the modeling of human movement data. The hierarchical nonlinear mixed-effects model systematically decomposes movements into a common effect that reflects the variation of movement variables with time during the movement, individual effects, that reflect individual differences, variation from trial to trial, as well as measurement noise. The model amounts to a nonlinear time-warping approach that treats all sources of variances simultaneously. We have outlined a method for performing maximum likelihood estimation of the model parameters, and demonstrated the approach by analyzing a set of human movement data on the basis of acceleration profiles in arm movements with obstacle avoidance. The quality of the estimates was evaluated in a classification task, in which our model was better able to determine if a sample movement came from a particular participant compared to state-of-the-art template-based curve classification methods. These results indicate that the templates that emerge from our nonlinear warping procedure are both more consistent and richer in detail.
We used the nonlinear time warping obtained from the acceleration profiles to analyze the spatial movement trajectories and their dependence on task conditions, here the dependence on obstacle height and obstacle placement along the path. We discovered that the warped movement path scales linearly with increasing obstacle height. Furthermore, we separated the variation around the mean paths into three levels: individual differences of movement trajectory, individual differences caused by change in obstacle height, and trial to trial variability. This combination of models uncovered clear and coherent patterns in the structure of variance. Individual differences in trajectory and variance from trial to trial were the largest sources of  variance, with individual differences being primarily at the level of movement parameters such as elevation and lateral extent of the movement while variance from trial to trial contained a larger amount of timing variance. We documented a remarkable property of the movement paths when obstacle distance along the path is varied at fixed obstacle height: all paths intersect at a single point in space.  We believe that the approach we describe enhances the power of time series analysis as demonstrated in human movement data. The nonlinear time warping procedure makes it possible to obtain reliable estimates of variance along the movement trajectories and is strong in extracting individual differences. This advantage can be leveraged by combining the nonlinear time warping with factor analysis to extract systematic dependencies of movements on task conditions at the same time as tracking individual differences, both base-line and with respect to the dependence on task conditions, as well as variance across repetitions of the movement.
Recent theoretical accounts have used the analysis of variance across repetitions of movements to uncover coordination among the many degrees of freedom of human movement systems [17]. Differences in variance between the subspace that keeps hypothesized relevant task variables invariant and the subspace within which such task variables vary support hypotheses about the task-dependent structure of the underlying control systems. Because variance is modulated in time differently across the two subspaces, a more principled decomposition of time dependence and variance from trial to trial would give such analyses new strength. Because this application requires the extension of the proposed method to multivariate time series, it is beyond the scope of this paper. Together with the considerable practical interest in identifying individual differences, these theoretical developments underscore that the method proposed here is timely and worth the methodological investment.
Supporting Information S1 Text. Supporting information contains a primer on model building using the proposed framework, an example analysis with accompanying R code, a simulation study that evaluates the proposed algorithm for maximum likelihood estimation, and the different parameter grids used for cross-validation in the paper.