Noise Induces Biased Estimation of the Correction Gain

The detection of an error in the motor output and the correction in the next movement are critical components of any form of motor learning. Accordingly, a variety of iterative learning models have assumed that a fraction of the error is adjusted in the next trial. This critical fraction, the correction gain, learning rate, or feedback gain, has been frequently estimated via least-square regression of the obtained data set. Such data contain not only the inevitable noise from motor execution, but also noise from measurement. It is generally assumed that this noise averages out with large data sets and does not affect the parameter estimation. This study demonstrates that this is not the case and that in the presence of noise the conventional estimate of the correction gain has a significant bias, even with the simplest model. Furthermore, this bias does not decrease with increasing length of the data set. This study reveals this limitation of current system identification methods and proposes a new method that overcomes this limitation. We derive an analytical form of the bias from a simple regression method (Yule-Walker) and develop an improved identification method. This bias is discussed as one of other examples for how the dynamics of noise can introduce significant distortions in data analysis.


Introduction
Error detection and correction is one of the most fundamental elements in motor learning, in fact in almost every type of learning. In engineering, the notion that the error in the current execution is used to improve the next execution is the essence of any controller design for systems that perform tasks repeatedly [1][2][3]. In human learning, it is universally agreed upon that the perception and correction of errors plays a significant role in the process of motor learning [4,5], although the neural and cognitive mechanisms underlying error-based learning are not yet fully understood [6,7]. Feedback of error, and knowledge of results or performance have long been shown to facilitate and accelerate acquisition of new motor skills [8]. Based on this understanding, motor learning can be enhanced by magnifying the perceived error [9][10][11][12]. Errors are not only instrumental during early practice, but smaller corrections also continue into later stages of skilled performance. Prior studies highlighted the critical role of error corrections during the repeated execution of goal-directed movements near the goal [13][14][15]. In fact, expert performance at a highly skilled level implies that any small error is readily compensated for.
While ubiquitous, it remains an open question to what degree each error is corrected for. Should a learner completely compensate for each error in the subsequent attempt, or should he/she be more conservative and only correct a smaller portion of the error? If random noise is part of the observed fluctuations, then complete error cancellation may lead to an overshoot and ultimately unstable performance. As errors become smaller with practice, should this proportion or "gain" of the correction change? With these questions, the sensitivity of error correction, the gain or learning rate, has been a key variable in several areas of research.
With the goal to shed light on error-based learning, various mathematical models have been used to quantify detection and correction of error. For example, Galea et al. used a simple deterministic model to reveal differences in the learning rate under positive and negative reward in a motor adaptation task [16]. Smith et al. introduced a learning model with error feedback on two different time scales that reproduced several features of human motor adaptation, including savings [17]. However, neither of the two models considered noise, which is inevitable and, arguably, even beneficial to motor learning [18][19][20][21]. Heeding to the critical role of random fluctuations, Scheidt et al. proposed an exponential model for learning of goaldirected reaching under random perturbations [22]. Baddeley et al. compared a series of simple models that included noise sources to find the best fit for human visuo-motor performance; results suggested that exponential weighting of recent errors may best account for the high efficacy of human performance [23]. Diedrichsen et al. also used a stochastic state-space model of trial-by-trial adaptation and suggested a dissociation between errors that guide behavioral goals and those that guide neural adaptation of internal models [24]. Several other studies used stochastic iterative models to address how error corrections are associated with a structural change of variability using skills ranging from simple reaching [13,25] to throwing a ball [14].
The most critical parameter in all these learning models is the correction gain, learning rate or feedback gain, which determines how much of the error in the current trial is corrected in the following trial. The simplest form of an iterative learning model that includes a correction gain and noise can be described by a set of three linear equations [13]: Eq 1 defines the motor output, x at the i th trial as a sum of the planned execution m and noise N; the latter is a random vector drawn from a zero mean distribution with a known covariance matrix. Most learning models assume that this noise comes from a Gaussian distribution. Eq 2 defines the error as the difference between the output and the target. Eq 3 models the learning process as an update of the planned execution based on the error information. The constant B is the correction gain, which defines the fraction of the error that is adjusted in the next trial.
Conventionally, estimation of this critical parameter B in a given time series of data has relied on a convenient and common mathematical tool-the method of least-squares regression. Different parameter optimization algorithms include the Gauss-Newton algorithm, gradient descent, and the Levenberg-Marquardt algorithm [26][27][28][29]. These model-based methods have been applied regardless of possible noise in the model or in the measured data. Implicitly or explicitly, these approaches relied on the assumption that the effect of noise averages out. If noise is explicitly taken into account using stochastic models, a widely-used method is the Expectation-Maximization (EM) algorithm [30]. This algorithm estimates the model parameters including the noise via maximum likelihood estimation [31,32]. However, this method is confined to models with independent noise sources and cannot be applied to auto-regressive processes [33].
The present study closely examined the viability of least-square methods for estimating model parameters, scrutinizing the common assumption that noise averages out. As we will demonstrate, the subtle dynamics of noise significantly influence the estimates of the correction gain and produce a bias. This bias may be relatively small when the difference between initial and final motor output is by orders of magnitude larger than the noise variance, as is the case at initial stages of learning. However, when the learning process approaches steady-state, or the error is comparable with the variability of the motor output, the bias in the estimated correction gain can become significant.
We devise a new method that improves the accuracy of identification of the correction gain, for the case when motor learning is close to steady state and the error is of similar magnitude as the overall variability. Analytical approaches and numerical simulations show that the quantified bias and the adjustment by the new method are robust and are insensitive to the magnitude and the distribution of the noise.

A Simple Iterative Model
To exemplify the problem of least-square estimation of the correction gain and demonstrate the bias of the conventional estimate, we examined a simple model that represents the basic structure of a large number of learning models [13,14,[23][24][25]. For the simplest one-dimensional case, the vector variables in Eqs 1 to 3 become scalars. Defining the origin of the coordinate as the desired motor output, or the target state (x target = 0), where N i is a random variable from a distribution with zero mean and a standard deviation of σ N , and B is assumed to satisfy 0 < B < 1. From Eqs 4, 5 and 6, Therefore, the learning process can be described by a simple linear equation: In this auto-regressive form, least-square regression can be used to estimate B from a time series of measured data.

Testing the Validity of the Conventional Least-Square Regression
To test whether the least-square estimation extracts the real parameters with sufficient accuracy, the first step was to generate a set of time series {x k } using the simple model of Eq 7 with a known value of B and random noise samples. The lengths of the time series n varied from 10 to 800. To extract the correction gain and the variance of the noise, Eq 7 was analyzed using the standard regression approach. We opted to use the Levenberg-Marquardt algorithm [28] among many other alternative least-square algorithms because this algorithm is an effective numerical method for regression. It has therefore been a common choice for least-squares problems in commercialized numerical computing environments including Matlab (Mathworks Inc., Natick, MA). The Levenberg-Marquardt algorithm has also been widely applied in motor learning studies [34,35] and for the training of neural networks [36][37][38].
We generated time series with three different B values of 0.25, 0.50, and 0.75. As little is known about the origin and magnitude of the noise in actual data, we also investigated the effect of the noise distribution with three types of distributions-normal, uniform, and asymmetric lognormal. In addition, we simulated different magnitudes of the noise level σ N . The probability density functions of the added noise are demonstrated in Fig 1. The simulation and subsequent parameter estimation were repeated 1000 times for each parameter set and time series length n. The difference between the true B and the mean of the 1000 estimated valuesB was evaluated for each noise distribution, noise level, and n. Numerical simulations and analyses were implemented in Matlab (Mathworks Inc., Natick, MA).

Quantification of the Bias and Development of Improved Estimation: Adjusted Yule-Walker (AYW) Method
Following the demonstration of a significant bias as reported in the results, we proceeded to show that this bias can be removed if an analytical expression of the bias is obtained. To attain a closed form of the bias, we considered one of the simplest estimators, the Yule-Walker algorithm [39,40]. Note that the Yule-Walker estimation method does not reduce the bias by itself. However, it lends itself to obtain an analytical expression of the bias due to its simple structure. The same analytical derivation might be done for the commonly used Levenberg-Marquardt algorithm, but would be more challenging and less transparent. Using the derived correction term, this bias can then be eliminated, which significantly improves the accuracy of the parameter estimation.
For any autoregressive (AR) process, the general Yule-Walker method calculates the AR parameters. The algebraic expression of the simplest AR process of order one is where N i is a random variable. The Yule-Walker equation estimates the parameter A as which is identical to the least-square linear regression. Note that Eq 8, which is a first-order AR process, has a similar structure as the simple learning model in Eq 7. The assumption that the noise in each iteration is an independent random sample allowed to approximate the expected bias of the Yule-Walker estimation in a simple closed form. Using the closed form of the bias, we could then quantify the correction gain more accurately. We called the procedure the Adjusted Yule-Walker method (AYW). We verified the reliability of this AYW method in the same way as we tested the reliability of the conventional least-square method: with a fixed value of the correction gain B, we constructed a set of time series {x k } with added noise; the length of the time series n varied from 10 to 800. As above, the simulations and estimation were performed with three different noise distributions and three different noise levels (Fig 1). The simulation was repeated 1000 times for each n and each parameter set. The correction gain B was estimated by the AYW method, and the difference between the true B value and the mean of the 1000 estimated valueŝ B was evaluated for each noise distribution, noise level, and n. The entire procedure is overviewed in Fig 2.

Demonstration of the Bias from Least-Square Estimation
The conventional Levenberg-Marquardt (LM) algorithm based on the least-square method yielded a substantial bias. Fig 3 shows the B values for different time series length n estimated with the LM-algorithm (red) and the AYW method (blue) against the real B value (green). The estimated correction gainsB had higher values than the actual correction gains B. Table 1 summarizes and compares the magnitude of the biases when the correction gain was estimated by the two methods. To facilitate comparison, a normalized error was calculated as ðB avg À BÞ=B Â 100 ð%Þ, whereB avg is the mean of the 1000 estimates, and B is the actual correction gain. Depending on the true value of B, the bias from the LM estimates was at 161 70% of B, even when the length of the time series n reached 800. The biases in the AYW method became less than 5% as n reached 800. Further, in Fig 3, the panels A to C show that the significant bias did not change much whether the noise came from a normal, uniform, or asymmetric lognormal distribution. Panel D in Fig 3 also shows that the bias was largely unaffected by the level of the noise.
The biases were calculated for each type of noise distribution (normal, uniform, or lognormal) and each trial length in a normalized form of ðB avg À BÞ=B Â 100 ð%Þ, whereB avg is the mean of the 1000 estimates, and B is the actual value of the correction gain. Note that the conventional least-square method with the Levenberg-Marquardt algorithm yielded biases of 16~170%, even with a large trial number. The biases with the AYW method became less than 5% as n reached 800.   We analytically derived the bias in the estimates of the correction gain using the Yule-Walker equation [39,40], one of the simplest regression methods. The notation of the one-dimensional learning process of Eq 7 is simplified by substituting (1 -B) with C, where N i is a random variable from a distribution with zero mean and the standard deviation of σ N , and C is between 0 and 1 because 0 < B < 1. To illustrate, To warrant using the Yule-Walker equation, or Eq 9, we regard N i+1 -N i in Eq 8 as another single noise term, such that the coefficient C can be estimated as From Eq 11, the expectation of x n is a weighted sum of the expectation of N i , each of which is zero. Therefore, By definition, N p and N q are independent when p 6 ¼ q. Therefore, where f is the probability density function of the noise. Note that the validity of Eq 14 does not depend on the specific property of the distribution. Eq 14 remains valid, regardless of whether the probability density function f is symmetric like a normal and a uniform distribution, or asymmetric like a lognormal distribution. From Eq 11, it follows that By Eq 14, terms with N p N q (p 6 ¼ q) do not contribute to the expectation. In addition, by the definition of N i , the expectation of N i 2 is σ N 2 . Therefore, The common ratio of C 2 is between 0 and 1. Therefore, as n goes to infinity, Therefore, using Eq 13, Assuming large enough n, the denominator of Eq 12 approximately becomes the variance of x, or σ x 2 multiplied by (n-1), or The variance of x can also be simply obtained as a stable solution of From Eq 11, x i is a weighted sum of N 1 , N 2 , . . ., and N i . Therefore, using Eq 14, the terms with N i+1 N i and x i N i+1 do not contribute to the expectation E[.] in Eq 17. In addition, as only N i 2 in the x i N i contributes to the expectation, Eq 17 becomes By definition, the expectation of N i 2 is σ N 2 . Therefore, the expectation of x 2 , or σ x 2 satisfies which is same as Eq 16. Then, the expectation of C YW becomes The numerator of Eq 19 becomes From Eq 11, x nÀ1 x n ¼ ðC À 1Þ 2 C 2nÀ5 N 1 2 þ ðC À 1Þ 2 C 2nÀ7 N 2 2 þ Á Á Á þ ðC À 1ÞN nÀ1 2 þ terms with N p N q ðp 6 ¼ qÞ: By Eq 14, only terms with N i 2 contribute to the expectation E [.] in Eq 20, whereas terms with N p N q (p 6 ¼ q) do not contribute to the expectation. Therefore, By definition of N i , the expectation of N i 2 is σ N 2 . Therefore, Eq 21 becomes Eq 22 can be re-written as The sum of the geometric sequence becomes Therefore, Eq 23 becomes Noise Induces Biased Estimation of the Correction Gain From Eqs 19 and 24, This expression shows that the bias is hardly reduced, even when n is large. Assuming large n, Note that the expectation of the estimate of C by the Yule-Walker method for large n is negative because 0 < C < 1. This means that the estimation is severely biased, if we use the Yule-Walker algorithm to measure the correction gain of the learning model. For a general autoregressive (AR) process, the Yule-Walker equation shows a finite bias [41,42], but not to the same extent. The substantial bias in this case originated from the anti-correlated noise added to the linear relation between x i and x i+1 . This contrasts with the added noise in the general AR process that has no correlation.

Equation for the Improved Estimation
From Eq 25, for large enough n, the expectation of C can be written as E½C YW ffi ðC 2 À 1Þ 2 þ ð1 À CÞC 2ðn À 1Þ ðn À 2Þ; or From Eq 26, the term 1 + 2E[C YW ] is positive. Therefore, the quadratic equation (Eq 27) has one positive root and one negative root. We are interested in the positive root, and the coefficient C is estimated as Accordingly, the true correction gain B can be approximated as Simulation confirmed that Eq 29, the Adjusted Yule Walker or AYW method improves the estimate. Fig 3 and Table 1 show that the bias was significantly reduced by the AYW method to less than 5% as n increased. This improvement was neither affected by the type of the noise distribution, irrespective of whether the noise came from a normal, a uniform, or an asymmetric lognormal distribution. It was also unaffected by the magnitude of the noise level σ N .
If the actual noise variance is of interest, the time series of N i can also be obtained in the following manner. The motor output x i is directly measured. After C is identified from Eq 28, the time series of (N i+1 -N i ) can be computed from Eq 10. With an arbitrarily assumed value of N 1 , N i can be obtained. Then, N 1 and N i can be recomputed so that N i has zero mean.

Discussion
A variety of studies on error-based adaptation and learning in computational neuroscience have regarded the correction gains of the learning process as an essential piece of information. These correction gains, feedback gains, or learning rates have been typically identified by leastsquare curve fitting, frequently using the widely available Levenberg-Marquardt algorithm. These methods were based on the premise that the effect of noise averages out by the regression. Indeed, the least-square curve fitting provides a reasonably accurate measure of the correction gain when the performance changes are considerably higher than the amplitude of the noise processes. However, later in learning when performance approaches a plateau, we showed that the noise in the system induces a substantial bias in the estimate of the correction gain, when relying on conventional least-square regression. Using a simple autoregressive learning model, we then devised an improved method that quantifies and corrects the bias.

Necessity of Improved Estimation Method
Our simulation results demonstrate that the conventional methods based on regression always overestimate the correction gain. The noise induces uncertainty (i.e., non-zero variance in the estimation) as well as a clear bias in the mean or expected value. As shown in Fig 3 and Table 1, the bias can be substantial, even in the simplest case of a learning model. Furthermore, the bias was not reduced by increasing the trial number n in the estimates by the Levenberg-Marquardt least-square algorithm. The subsequent analytical derivation proved that the bias from a linear regression method, such as the Yule-Walker equation, converges to a non-zero value as n approaches infinity, as shown in Eq 26. The same behavior is the likely cause for the observed bias in the Levenberg-Marquardt estimates, evidenced in our simulations. This robustness of the bias against the data size highlights the serious limitation of the conventional estimation of the correction gain.
A possible alternative to the least-square regression methods is the Expectation Maximization (EM) algorithm that has been used to identify parameters of a stochastic model from the observed data [43,44]. The EM algorithm can reduce the bias due to noise with long enough time series, but it cannot estimate model parameters that are assumed to be deterministic [33]. Consequently, for the simple iterative learning process as in Eq 7, which is a kind of Autoregressive Moving Average (ARMA) model, the parameter B cannot be estimated by the EM algorithm [33]. Hence, other methods need to be identified.

Insensitivity of Bias to Noise Magnitude and Distribution
The results show that the bias due to conventional estimates does not depend on the magnitude of the noise in the data. This robustness of the bias against the noise magnitude further underlines the importance of the bias reduction, suggesting that the bias needs to be adjusted, even if the noise magnitude is relatively low.
Another important observation is that the bias is insensitive to the noise distribution. Most of mathematical learning models have typically assumed white Gaussian noise, which is mathematically well-defined and convenient to analyze. However, several studies convincingly suggest that the noise distribution in biological systems is closer to an asymmetric lognormal distribution rather than a Gaussian distribution [45][46][47][48]. The insensitivity of bias to the noise distribution enables the AYW method to provide an improved estimation without relying on the common but vulnerable assumption of most learning models.
As the bias is induced by noise, it may appear necessary to identify the noise properties in the data set to assess and eliminate the bias. However, our analytical analysis shows that we do not need detailed knowledge of the noise properties to estimate the bias. Eq 25 demonstrates that the bias does not depend on the noise level σ N . In addition, we did not assume any specific distribution of the noise to derive the closed form of the bias; we only assumed that the newly added noise in the current trial is independent of the noise in the previous trial. This implies that we can apply the same method for a variety of systems with different levels and distributions of noise. For example, we can use the same AYW method to estimate the correction gain of young and healthy subjects or patients who may have different types and degrees of noise.

Considerations and Limitations
While the noise distribution is of no concern, our analysis assumes that the noise samples added to each trial have no correlation. While the assumption of uncorrelated noise in human movement has been generally accepted in learning models and is supported by some experimental observations [45], other studies have also shown significant correlations, positive and negative, in the variability of the motor output. For example, time series of stride intervals in human walking and heart beats exhibit long-range correlations, suggesting that the variability in those motor behaviors cannot be modeled as white noise [49,50]. Hence, caution is necessary when applying the AYW method to a system with correlated or anti-correlated noise.
However, we need to distinguish the variability of the observed motor output from the assumed noise that causes the variability. In Eq 7, the variability of x may exhibit correlation, even if N has no correlation. Actually, a recent study showed that the long-range correlations in stride intervals of human walking and other rhythmic motor behavior may be explained by uncorrelated noise filtered by stable rhythmic dynamics of the sensorimotor system [51]. The assumption that the original unfiltered source of variability is white noise is also common and highly effective in control and signal processing theory as well as in computational biology. It is also the basis of the widely-applicable Kalman filter [52]. Hence, the observed correlated variability in the motor output does not invalidate the AYW method, which assumes uncorrelated noise.
The AYW method significantly reduces the bias when the length of the time series is sufficiently long (n ! 400). However, the accuracy of the AYW method is relatively low when the length of the time series is short (n 50), although the bias is always smaller than the bias in the least-square method (see Table 1). This inaccuracy with insufficient data is due to the fact that the bias reduction by the AYW method depends on Eq 16, which assumes large n. Future work may resolve this inaccuracy when only insufficient data are available.
While the AYW method provides a less biased estimation, it shows higher variability than the least-square method, particularly when the size of the data is small. This high variability may limit the efficacy of the AYW method when a distinction between correction gains of two groups is required and only a small data set is available. This trade-off between bias and variance has already been identified in other estimation methods and discussed in prior studies [53,54]. Development of a new method that optimizes the efficacy considering the trade-off between bias and variance is deferred to future work.
Like many other studies, our approach assumed that the correction gain or the learning rate is constant. This assumption, though widely used, may not be strictly accurate; the learning rate is approximately constant for small errors, but may change when the errors become larger [55,56]. This study addresses the bias of the correction gain estimation when the learning process approaches skilled levels or steady state, where small errors are prevalent. Therefore, the assumption of a constant correction gain is sufficiently valid within the scope of this study.
For initial study, our analysis used a highly simplified one-dimensional model of motor learning, and therefore, generalization to other models and multi-dimensional cases will require further considerations. For example, for rapid pointing movements, van Beers showed that this simple model was insufficient to account for the observed structure of variability [13]. However, our chosen model contains the essential components of motor learning: feedback through error information and unplanned variability due to noise. This basic simple learning model was used to clearly demonstrate the influence of noise on estimation.

A Representative Problem for the Influence of Noise on Data Analysis
The observed bias is an example for how the dynamics of noise can introduce significant distortions in the analysis. It is widely assumed that noise is "neutral" if it has zero mean, as the effect of noise averages out with a large amount of data. However, the accumulated effect of noise may not always remain neutral, if the noise is processed by the analysis, such as in the conventional least-square curve fitting. Another non-intuitive effect of noise on estimation was highlighted in a recent study on Floquet multipliers, a method for assessing orbital stability of rhythmic movements. Noise induces a bias in the estimation of orbital stability [57]. Another example for the surprising effect of noise is the two-thirds power law that is widely observed in human movements [58][59][60][61]. It has been shown that this subtle relation between velocity and curvature can be generated by Gaussian noise alone [62]. Hence, experimental assessment and interpretation of the power law as a fundamental principle of movement generation needs caution.
Noise is ubiquitous and deserves more attention as its dynamics may influence the estimation of the "deterministic" elements of the system. Our study is only one example to show that the common belief that the accumulated effect of noise is neutral needs revision. More work is needed to examine more complex learning models and the estimation of noise processes from the data.