
Analyzed the data: MH. Contributed reagents/materials/analysis tools: MH. Wrote the paper: MH DN. Conceived and designed the simulations: MH. Performed the simulations: MH. Mathematically proved the convergence of the equations: MH DN.

The authors have declared that no competing interests exist.

Recent theoretical studies have proposed that the redundant motor system in humans achieves well-organized stereotypical movements by minimizing motor effort cost and motor error. However, it is unclear how this optimization process is implemented in the brain, presumably because conventional schemes have assumed a priori that the brain somehow constructs the optimal motor command, and have largely ignored the underlying trial-by-trial learning process. In contrast, recent studies focusing on the trial-by-trial modification of motor commands based on error information have suggested that forgetting (i.e., memory decay), which is usually considered an inconvenient factor in motor learning, plays an important role in minimizing the motor effort cost. Here, we examine whether trial-by-trial error-feedback learning with slight forgetting could minimize the motor effort and error in a highly redundant neural network for sensorimotor transformation, and whether it could predict the stereotypical activation patterns observed in primary motor cortex (M1) neurons. First, using a simple linear neural network model, we theoretically demonstrated that: 1) this algorithm consistently leads the neural network to converge at a unique optimal state; 2) the biomechanical properties of the musculoskeletal system necessarily determine the distribution of the preferred directions (PD; the direction in which the neuron is maximally active) of M1 neurons; and 3) the bias of the PDs is steadily formed during the minimization of the motor effort. Furthermore, using a non-linear network model with realistic musculoskeletal data, we demonstrated numerically that this algorithm could consistently reproduce the PD distribution observed in various motor tasks, including two-dimensional isometric torque production, two-dimensional reaching, and even three-dimensional reaching tasks.
These results may suggest that slight forgetting in the sensorimotor transformation network is responsible for solving the redundancy problem in motor control.

It is thought that the brain can optimize motor commands to produce efficient movements; however, it is unknown how this optimization process is implemented in the brain. Here we examine a biologically plausible hypothesis in which slight forgetting in the motor learning process plays an important role in the optimization process. Using a neural network model for motor learning, we first demonstrated theoretically that motor learning with a slight forgetting factor consistently led the network to converge at an optimal state. In addition, by applying the forgetting scheme to a more sophisticated neural network model with realistic musculoskeletal data, we showed that the model could account for the reported stereotypical activity patterns of muscles and motor cortex neurons in various motor tasks. Our results support the hypothesis that slight forgetting, which is conventionally considered to diminish motor learning performance, plays a crucial role in the optimization of the redundant motor system.

The motor system exhibits tremendous redundancy.

The hypothesis that the brain selects a solution that minimizes the cost of movement has long been proposed.

It should be noted that these conventional optimization studies tacitly assume that the brain somehow constructs a motor command that theoretically minimizes the cost function, and largely ignore the underlying trial-by-trial learning process.

However, it is unknown whether the decay algorithm could minimize the cost.

To gain insight into these mechanisms, we conducted computer simulations of motor learning by applying the “feedback-with-decay” algorithm to a redundant neural network model for sensorimotor transformation. First, we used a simple linear model to gain a firm theoretical understanding of the effect of the decay on the minimization of the cost.

As a simple example of a redundant motor task, we considered a task that requires the production of torque in a two-joint system with redundant actuators.

The MDVs of the actuators are biased toward the 1st and 3rd quadrants, whereas the distribution of the PDs is biased toward the 2nd and 4th quadrants, which is orthogonal to the distribution of the MDVs.

First, we considered the case where the synaptic weights are solely modified to reduce the error, according to the gradient descent rule Δw_ij = −α ∂E_e/∂w_ij, where α is the learning rate, E_e = 1/2 e^T e is the error cost, and e is the torque error at the k-th trial.

Trial-dependent changes in the magnitude of the error are shown, and the PD distributions at intermediate and final (40,000th) trials for each simulation are shown as polar histograms. The horizontal black line indicates the optimal value, calculated analytically from the pseudo-inverse matrix of the MDV matrix M.

However, the situation was considerably different when the modification of the synaptic weights based on error feedback was not perfect, but incorporated a slight decay of the synaptic weights (decay rate β = 10^−4). In this model, the sum of the squared neural activity converged to the optimal value regardless of the initial synaptic weights w_ij.

In mathematical terms, the modification of the synaptic weights based on the feedback-with-decay rule (Eq. (3)) is similar to the gradient descent rule for minimizing the composite cost function E_e + λE_m, where E_m is the motor effort cost and λ is determined by the ratio of the decay rate to the learning rate.
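This equivalence can be sketched as follows (our notation; w denotes a synaptic weight, α the learning rate, and β the decay rate):

```latex
w^{(k+1)} = (1-\beta)\,w^{(k)} - \alpha\,\frac{\partial E_e}{\partial w}
          = w^{(k)} - \alpha\,\frac{\partial}{\partial w}\!\left(E_e + \lambda E_m\right),
\qquad E_m = \tfrac{1}{2}\,w^2,\quad \lambda = \frac{\beta}{\alpha}.
```

The decay term thus acts as the gradient of an effort-like cost whose relative weight λ is set by the ratio β/α. Here E_m is written in terms of the squared weight; since the neural activity is linear in the weights, the squared activity averaged over uniformly distributed inputs of fixed magnitude is proportional to the squared weights.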

Furthermore, we have also proven that the synaptic weight matrix W converges to a unique optimum: when the MDVs are biased toward the 1st and 3rd quadrants, the distribution of the converged PDVs should be biased toward the 2nd and 4th quadrants.

The above results indicate three important points regarding the “feedback-with-decay” rule. First, the optimal solution can be obtained using only locally available information, without any explicit computation of the total motor effort cost.

Another interesting observation regarding the formation of the bias of the PDs is that when the initial synaptic weight is relatively small (see cyan trace) compared with the pseudo-inverse matrix M^T(MM^T)^−1 (condition #4), the converged PD bias is dominated by the PD bias of the pseudo-inverse. Thus, if conditions #2 and #4 are satisfied, even the “feedback-only” rule can predict the approximate direction of the optimal PD bias, even though the converged synaptic weight matrix is not optimal.
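The orthogonality of the pseudo-inverse PD bias can be checked numerically. The following sketch is our own illustration (not the paper's simulation code; the bias magnitude of 0.35 rad is arbitrary): it draws MDVs clustered around the 1st/3rd-quadrant axis and shows that the PDs of the minimum-norm weights M^T(MM^T)^−1 cluster around the orthogonal axis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# MDVs clustered around the 45 deg / 225 deg axis (1st/3rd quadrants);
# the 0.35 rad spread is an arbitrary illustrative choice
theta = rng.normal(np.pi / 4, 0.35, n) + np.pi * rng.integers(0, 2, n)
M = np.vstack([np.cos(theta), np.sin(theta)])        # 2 x n MDV matrix

W_opt = M.T @ np.linalg.inv(M @ M.T)                 # minimum-norm solution M^T (M M^T)^-1
pd = np.arctan2(W_opt[:, 1], W_opt[:, 0])            # PD of neuron i = direction of w_i

def axial_mean(a):
    """Mean axis of angles defined modulo pi (angle-doubling method)."""
    return 0.5 * np.arctan2(np.sin(2 * a).sum(), np.cos(2 * a).sum())

mdv_axis, pd_axis = axial_mean(theta), axial_mean(pd)
```

The PD axis comes out roughly 90 degrees away from the MDV axis, i.e., biased toward the 2nd and 4th quadrants, because the inverse of MM^T amplifies the weight component along the under-represented axis.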

In summary, in the linear neural network model, the “feedback-with-decay” rule consistently leads to the optimal synaptic weight and the optimal PD bias, whereas the “feedback-only” rule only predicts the approximate direction of the optimal PD bias in limited conditions.

Next, we examined whether these aspects hold true in non-linear neural network models that additionally include a muscle layer. Here, the 2nd neural layer consists of corticospinal neurons in M1; however, since M1 actually includes inhibitory interneurons, the layer cannot be regarded as a real M1. Nevertheless, we modeled the neural network incorporating the properties of actual M1 neurons to gain insight into how the corticospinal neurons are recruited under the feedback-with-decay rule.

The network consists of an input layer, a 2nd layer of 1000 neurons, a 3rd layer of 8 muscle groups at the shoulder and elbow joints, and an output layer.

First, each corticospinal neuron receives the desired movement parameters from the input layer, and its firing rate obeys cosine tuning.
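Cosine tuning means that the firing rate varies with the cosine of the angle between the movement direction and the neuron's PD. A minimal illustration (the baseline rate and modulation depth are arbitrary values of ours):

```python
import numpy as np

def firing_rate(direction, pd, baseline=10.0, depth=8.0):
    """Cosine-tuned firing rate (Hz): maximal at the PD, minimal opposite to it."""
    return baseline + depth * np.cos(direction - pd)

pd = np.pi / 4
# the rate peaks when the movement direction equals the PD
rates = [firing_rate(d, pd) for d in np.linspace(0, 2 * np.pi, 8, endpoint=False)]
```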

First, we simulated the isometric torque production task with a two-joint system (shoulder and elbow) conducted by Herter et al. In this task, the muscle MDVs were biased toward the 1st and 3rd quadrants, and letting the neurons in the 2nd layer randomly innervate these muscles led to a biased distribution of the neuronal MDVs toward the same quadrants. After learning, the PDs of the neurons in the 2nd layer were bimodally distributed toward the 2nd and 4th quadrants.

Trial-dependent changes in the magnitude of the error are shown, and the PD distributions at intermediate and final (40,000th) trials for the simulations represented by the cyan and red traces are shown as polar histograms.

Interestingly, the predicted PD distribution was quite similar to that observed experimentally.

Thus, error-based learning with slight forgetting seems to predict the non-uniform PD distribution of M1 neurons; however, what happens if forgetting is not slight? Theoretical considerations suggest that a relatively larger decay rate leads the system to assign much more weight to minimizing the motor effort cost (E_m) than the error cost (E_e).

Next, we examined whether the weight decay rule can predict the characteristic bias of the PD distribution of M1 neurons observed during the reaction time period before reaching movements. Since the activity of M1 neurons just before reach initiation would reflect the activity necessary to produce the initial acceleration, we focused on the initial ballistic phase of a reaching movement. To mimic the initial phase, we modified the network by replacing the “desired torque” in the input layer with the “desired linear acceleration” of the fingertip.


First, we simulated the reaching task with a two-joint system in a horizontal plane described by Scott et al. In this task, the muscle MDVs were biased toward the 1st and 3rd quadrants even more strongly in linear acceleration space than in torque space, and the converged PDs of the neurons were biased toward the 2nd and 4th quadrants in linear acceleration space.

The model was further extended to 3D reaching movements.

It has long been hypothesized that well-organized stereotypical movements are achieved by minimizing the cost.

A small number of previous studies have proposed a mechanism for how the cost of the motor effort is minimized in the brain on a trial-by-trial basis. Kitazawa

In contrast, recent studies have suggested that forgetting might be useful to minimize the motor effort.

The present study further applied the “feedback-with-decay” algorithm to the sensorimotor transformation network, which includes M1 neurons. We initially used a linear neural network and theoretically derived the necessary conditions for convergence on the optimal state. Importantly, these conditions seem to be satisfied in the actual brain. First, the decay rate is known to be much smaller than the learning rate.

The “feedback-with-decay” rule can be considered biologically plausible in that it does not need to explicitly calculate the sum of the squared neural activity (the total effort cost) by gathering activity information from a vast number of neurons. Since weight decay in each synapse could occur independently of other synapses, a global summation across all neurons would not be needed. Within a weight-decay framework, it would therefore be possible for the CNS to minimize even the motor effort cost of whole-body movement. One may argue that since we perceive tiredness, the brain must compute the total energetic cost (or motor effort cost); however, to the best of our knowledge, individual neurons that encode the total energetic cost have not been discovered. It is rather likely that such a physical quantity is represented by a large number of distributed neurons in the brain and that this distributed information may be perceived as tiredness. Since it is unclear whether the total energetic cost could be read out from such distributed information, decay would be a more promising mechanism for minimizing motor effort. Furthermore, our simulation results indicate that the formation of an optimal PD distribution pattern for M1 neurons was not necessarily accompanied by the realization of a nearly optimal muscle activation pattern.

Although we referred to the “feedback-with-decay” algorithm as biologically plausible, it should be noted that our simulation algorithm is not fully biologically plausible because it still depends on an artificial calculation (i.e., error back-propagation). Although it is well established that error information is available to the cerebellum, how such error signals could drive precise gradient-based modification of each synapse remains unclear.

The important point of the present study is that we theoretically proved that the “feedback-with-decay” rule consistently leads the PDs of M1 neurons to converge at a distribution that is orthogonal to the MD distribution. Although Guigon et al.

Importantly, the non-linear model combined with the realistic musculoskeletal parameters can reproduce the non-uniform PD distribution of M1 neurons observed during various motor tasks. The origin of the PD bias has been a hotly debated topic in neurophysiology.

Another interesting finding is that even the “feedback-only” rule approximately predicts the skewed PD distribution of M1 neurons if the two following conditions are satisfied: a large number of neurons participate in the task (condition #2), and the initial synaptic weight is considerably smaller than the pseudo-inverse matrix M^T(MM^T)^−1 (condition #4). This finding indicates that the PD bias itself is not direct evidence of the minimization of effort, as has been thought previously.

According to our mathematical consideration, the weight decay rate must be substantially lower than the learning rate (see Supporting Information).

The present scheme also implies that motor learning has two different time scales: a fast process associated with error correction and a slow process associated with optimizing efficiency through weight decay.

Due to its simplicity, our model provided clear insights into the role of weight decay on optimization; however, of course, it has several limitations. First, the model considered only corticospinal neurons, although M1 also includes inhibitory interneurons. However, it is noteworthy that our model could predict the PD distribution of M1 neurons recorded from non-human primates, suggesting that most of the neurons recorded in previous experiments were corticospinal neurons. Indeed, considering the large size of corticospinal pyramidal neurons, it is likely that the chance of recording these neurons is relatively high because stable isolation over an extended period of time is required in such experiments.

Second, a uniform distribution was assumed for the neuron-muscle connectivity.

Third, the model only considered static tasks (i.e., isometric force production) and an instantaneous ballistic task (i.e., the initial phase of the reaching movement). Such a single-time-point model is unrealistic for reaching movements in that it ignores the change of limb posture, posture-dependent changes in the muscle moment arms, multi-joint dynamics during motion, and the deceleration phase. This limitation prevents us from predicting essential features of movement such as trajectory formation and online trajectory correction.

First, we used a linear neural network to transform the desired torque (input layer) into the actual torque (output layer) through an intermediate layer that consisted of 1000 neurons. Each neuron receives the input x ∈ R^2 from the input layer through a synaptic weight w_i ∈ R^2 that could be modified with learning. The activation level of each neuron is determined by r_i = w_i^T x, so that r = (r_1 … r_n)^T ∈ R^n is given by r = Wx, where W = (w_1 … w_n)^T ∈ R^{n×2} is the synaptic weight matrix for all neurons. The output of each neuron (∈ R^2) is determined by its activation level r_i and its MDV m_i ∈ R^2 as r_i m_i, and the output torque (∈ R^2) is expressed as the vector sum of the output of all neurons: τ = Σ_i r_i m_i = Mr = MWx, where M = (m_1 … m_n) ∈ R^{2×n} is the matrix of MDVs for all neurons. The direction of each MDV was drawn from a uniform distribution over 8 equally spaced directions.
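As a concrete sketch of this architecture (our own minimal Python version, with n = 8 neurons instead of 1000):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                             # 1000 in the paper; 8 keeps the sketch small
angles = 2 * np.pi * np.arange(n) / n
M = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x n matrix of MDVs (unit vectors)
W = 0.1 * rng.standard_normal((n, 2))             # n x 2 synaptic weight matrix

x = np.array([1.0, 0.5])                          # desired torque (input)
r = W @ x                                         # activation levels r_i = w_i^T x
tau = M @ r                                       # output torque = sum_i r_i m_i
```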

The network was trained to produce the appropriate output torque by randomly presenting 8 target torques.

In the feedback-only rule, the synaptic weight w_ij is modified to reduce the error cost according to the gradient descent rule Δw_ij = −α ∂E_e/∂w_ij, where E_e = 1/2 e^T e.
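For the linear network described above, this gradient works out to Δw_i = α(m_i^T e)x, which a short simulation (ours, with illustrative sizes and learning rate) can apply directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 8, 0.05                                 # network size and learning rate (illustrative)
angles = 2 * np.pi * np.arange(n) / n
M = np.vstack([np.cos(angles), np.sin(angles)])    # 2 x n MDV matrix
W = 0.1 * rng.standard_normal((n, 2))
W0 = W.copy()

targets = [np.array([np.cos(t), np.sin(t)]) for t in angles]   # 8 target torques
for _ in range(2000):
    x = targets[rng.integers(n)]                   # randomly presented target
    e = x - M @ (W @ x)                            # torque error
    W += alpha * np.outer(M.T @ e, x)              # feedback-only update: dW = -a dEe/dW
```

The error vanishes for all targets, but the component of W in the null space of M is never touched, so the converged weights (and hence the effort cost) depend on the initialization.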

The procedures in the feedback-with-noise rule were the same as in the feedback-only rule, except that signal-dependent noise (SDN) was added to the actuator activity and to the synaptic modification; the standard deviation of the noise scaled with the activation level of each actuator.

In the feedback-with-decay rule, the synaptic weight w_ij is modified with a slight decay in addition to the error feedback: w_ij ← (1 − β)w_ij − α ∂E_e/∂w_ij, where the decay rate β = 10^−4 is much smaller than the learning rate α. Note that the decay rate used here (10^−4) is much smaller than that of the slow process (4.0×10^−3) estimated by Smith et al.

The initial synaptic weights w_ij were set to random values. To examine the effects of the initial conditions, the simulation was conducted 5 times with different sets of initial synaptic weights [W_1^init, W_2^init, …, W_5^init].

To confirm the effectiveness of weight decay in a more realistic model, we also considered a neural network model with a muscle layer. Each neuron i innervates the 8 muscles through connection weights z_i = (z_1i z_2i … z_8i)^T, yielding the muscle activity vector u = (u_1 u_2 … u_8)^T. The output torque is given by τ = M^In u, where M^In = (m_1^In m_2^In … m_8^In) ∈ R^{2×8} is a matrix that consists of the MDVs for all of the muscles.

Using realistic muscle data, we modeled a 2D upper limb that had 2 degrees of freedom (DOF; shoulder and elbow joints) with 26 muscle elements. The muscles were grouped into 8 muscle groups, and the moment arm vector of each group, a_i = (a_i,1 a_i,2)^T, formed the columns of M^In ∈ R^{2×8}. We defined the effect of the activation of each neuron on the output torque (its MDV) as M^In z_i, where Z = (z_1 z_2 … z_n) ∈ R^{8×n} is the matrix of neuron-muscle connection weights. To examine the effects of the initial conditions, the simulation was conducted 4 times with different sets of initial synaptic weights.

The network model can also be applied to the task of producing the linear acceleration of the fingertip (i.e., the initial phase of the reaching movement) by replacing the torque with the linear acceleration, which at movement onset is given by ẍ = J I^−1 τ, where J ∈ R^{2×2} is the Jacobian matrix, I ∈ R^{2×2} is the system inertia matrix of the two-joint system, and θ ∈ R^2 is a joint angle vector that consists of the shoulder and elbow angles. To calculate the Jacobian and inertia matrices, we used previously published morphological data.
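The mapping from joint torque to initial fingertip acceleration can be sketched as follows (the link lengths and inertia values are placeholders of ours, not the morphological data used in the paper):

```python
import numpy as np

l1, l2 = 0.30, 0.35                     # upper arm / forearm lengths (m), illustrative

def jacobian(q):
    """Jacobian of the fingertip position with respect to (shoulder, elbow) angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def initial_acceleration(q, tau, inertia):
    """xdd = J I^-1 tau: at movement onset the velocity-dependent terms vanish."""
    return jacobian(q) @ np.linalg.solve(inertia, tau)

q = np.array([np.pi / 4, np.pi / 2])    # shoulder, elbow angles (rad)
inertia = np.array([[0.30, 0.10],
                    [0.10, 0.12]])      # placeholder 2 x 2 inertia matrix (kg m^2)
acc = initial_acceleration(q, np.array([1.0, 0.5]), inertia)
```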

We further extended the model to 3D reaching movements. We modeled a 3D upper limb with 4 DOF (3 DOF for the shoulder and 1 DOF for the elbow) with 26 muscle elements. The moment arm vector of each muscle group, a_i = (a_i,1 a_i,2 a_i,3 a_i,4)^T, is defined with respect to the three shoulder rotation axes (x_U, y_U, z_U) and the elbow flexion axis (x_F). The Jacobian matrix J ∈ R^{3×4} and the system inertia matrix I ∈ R^{4×4} for the 3D limb model were calculated using previously described methods.

For the 3D simulation, 14 equally spaced targets were used, and the decay rate was set to 10^−4. To examine the effects of the initial conditions, the simulation was conducted 5 times with different sets of initial synaptic weights, each under the following four target-presentation probability conditions:

the probability of appearance was equal for all 14 targets;

the probability for targets #1 and #3 was 8/28 (1/28 for the other targets);

the probability for targets #2 and #4 was 8/28 (1/28 for the other targets);

the probability for targets #5 and #6 was 8/28 (1/28 for the other targets).

In total, we conducted 20 (5 initial weights×4 probability conditions) simulations.

To examine the significance of the bimodal distribution obtained from the simulation, we performed the Rayleigh test for uniformity against a bimodal alternative.
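A Rayleigh test against a bimodal alternative is commonly computed by doubling the angles so that opposite modes coincide, then applying the standard Rayleigh statistic to the doubled sample (the angle-doubling approach for axial data; the p-value approximation below is the usual large-sample one, and the example data are ours):

```python
import numpy as np

def rayleigh_bimodal(angles):
    """Rayleigh test against a bimodal alternative: double the angles so that
    opposite modes coincide, then test the resultant length of the doubled data."""
    a = 2.0 * np.asarray(angles)
    n = len(a)
    R = np.hypot(np.cos(a).sum(), np.sin(a).sum()) / n       # mean resultant length
    z = n * R ** 2
    p = np.exp(-z) * (1.0 + (2.0 * z - z ** 2) / (4.0 * n))  # finite-n correction
    return R, float(np.clip(p, 0.0, 1.0))

# strongly bimodal sample: two opposite clusters, as in the biased PD distributions
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(np.pi / 4, 0.2, 50),
                         rng.normal(np.pi / 4 + np.pi, 0.2, 50)])
R, p = rayleigh_bimodal(sample)
```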

A toy model with two synaptic weights w_1 and w_2 that fulfill the equation w_2 − w_1 = 1. The color gradations indicate the error cost as a function of the synaptic weights w_1 and w_2. The white dashed line indicates the minimum at which the error is zero. The circles indicate the contours of the sum of the squared weights (i.e., w_1^2 + w_2^2). Simulations were conducted with w_1 = 0 and w_2 = −2 as the initial values.

(TIF)

Simulation of the toy model with two synaptic weights w_1 and w_2 that fulfill the equation w_2 − w_1 = 1, using the feedback-with-noise rule.

(TIF)

(TIF)

Definitions of the upper arm coordinate axes (x_U, y_U, z_U) and the forearm coordinate axes (x_F, y_F, z_F).

(TIF)

(DOC)

a_s and a_e are the moment arms for shoulder flexion(+)/extension(−) and elbow flexion(+)/extension(−), respectively.

(DOC)

a_1, a_2, and a_3 are the shoulder joint moment arms about the x_U, y_U, and z_U axes, respectively, and a_4 is the elbow joint moment arm about the x_F axis.

(DOC)

(PDF)