Hierarchical motor adaptations negotiate failures during force field learning

  • Tsuyoshi Ikegami ,

    Contributed equally to this work with: Tsuyoshi Ikegami, Gowrishankar Ganesh

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    ikegami244@gmail.com

    Affiliations Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology (NICT), Osaka, Japan, Brain Information Communication Research Laboratory Group, ATR, Kyoto, Japan, Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan

  • Gowrishankar Ganesh ,

    Contributed equally to this work with: Tsuyoshi Ikegami, Gowrishankar Ganesh

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Writing – original draft, Writing – review & editing

    Affiliations Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology (NICT), Osaka, Japan, Brain Information Communication Research Laboratory Group, ATR, Kyoto, Japan, Centre National de la Recherche Scientifique (CNRS), Universite Montpellier (UM), Laboratoire d’Informatique, de Robotique et de Microelectronique de Montpellier (LIRMM), Montpellier, France

  • Tricia L. Gibo,

    Roles Investigation, Methodology, Writing – review & editing

    Affiliations Brain Information Communication Research Laboratory Group, ATR, Kyoto, Japan, Emergo by UL, Utrecht, The Netherlands

  • Toshinori Yoshioka,

    Roles Investigation, Methodology, Software, Writing – review & editing

    Affiliation Brain Information Communication Research Laboratory Group, ATR, Kyoto, Japan

  • Rieko Osu,

    Roles Funding acquisition, Resources, Supervision, Writing – review & editing

    Affiliations Brain Information Communication Research Laboratory Group, ATR, Kyoto, Japan, Faculty of Human Sciences, Waseda University, Saitama, Japan

  • Mitsuo Kawato

    Roles Funding acquisition, Methodology, Resources, Supervision, Writing – review & editing

    Affiliation Brain Information Communication Research Laboratory Group, ATR, Kyoto, Japan

Abstract

Humans have a remarkable ability to learn the dynamics of the body and environment in order to develop motor skills. Traditional motor studies using arm reaching paradigms have viewed this ability as a process of ‘internal model adaptation’. However, adaptation has not been fully explored for the case in which reaches fail to attain the intended target. Here we examined human reaching under two types of force field: one that induces failures (i.e., target errors) and one that does not. Our results show the presence of a distinct failure-driven adaptation process that enables quick task success after failures, before internal model adaptation is complete, but that can result in persistent changes to the undisturbed trajectory. These behaviors can be explained by considering a hierarchical interaction between internal model adaptation and the failure-driven adaptation of reach direction. Our findings suggest that humans negotiate movement failure using hierarchical motor adaptations.

Author summary

How do we improve actions after a movement failure? Although negotiating movement failures is obviously crucial, previous motor-control studies have predominantly examined human movement adaptations in the absence of failures, and it remains unclear how failures affect subsequent movement adaptations. Here we examined this issue by developing a novel force field adaptation task in which the hand movement during arm reaching is perturbed by novel forces that induce a large target error, that is, a failure. Our experimental observations and computational modeling show that, in addition to the popular ‘internal model learning’ process of motor adaptation, humans also use a ‘failure-negotiating’ process that enables them to quickly improve movements in the presence of failure, even at the expense of increased arm trajectory deflections, which are then gradually reduced with training once task success has been achieved. Our results suggest that a hierarchical interaction between these two processes is key to how humans negotiate movement failures.

Introduction

Imagine you are practicing golf shots at a driving range, aiming to land the ball on the green with a pre-planned ball trajectory. When the ball follows a different, unintended trajectory but still lands on the green, you will almost automatically correct your next hitting action by accounting for the error in the ball trajectory. However, the correction you make will be very different if the ball goes out of bounds of the green. In that case, you would not just make a large correction to the hitting action but might also change your planned trajectory. Going out of bounds is considered a failure in golf, penalized by an extra shot, and human movement adaptation in the presence of failure is intuitively very different from adaptation when a movement has achieved its target.

Failure-driven adaptations by humans have been extensively studied in decision making and cognitive control [1,2], but it has remained unclear how such failure-driven adaptations affect human motor adaptation. Previous studies on motor adaptation have mainly focused on internal model adaptation, which is driven by sensory prediction error (SPE), the difference between sensory feedback and the sensory prediction of a movement [3–5], and/or by motor command error [6]. Over the last decade, however, evidence has mounted that failure, or target error (TE), the difference between the sensory feedback of the movement endpoint and the target position, makes a distinct and important contribution to motor adaptation [7,8]. The most popular TE-driven (or failure-driven) motor adaptation process is explicit strategy learning [7,9,10], which has mostly been examined during arm reaching adaptation to visuomotor rotations and is often quantified by explicit reports of the reaching aiming point [9]. Explicit strategy learning is thought to modify motor performance to reduce TE, independently of SPE [7].

However, the relation between TE-driven motor adaptation and SPE-driven motor adaptation (i.e., internal model adaptation) remains unclear. The interaction between explicit strategy learning and internal model adaptation is popularly explained by a two-state model of sensorimotor learning with a different time scale for each state [11], in which the two states operate in a ‘flat’ (non-hierarchical) manner and the net adaptation is defined as the sum of the two [9,12]. The fast component of the model has often been linked to explicit strategy learning in visuomotor rotation tasks [9,10] as well as force field tasks [10,13,14]. On the other hand, recent studies have shown that the TE modulates the adaptation rate of SPE-driven internal model adaptation [15] or savings [16]. This role of the TE as a modulator of internal model adaptation may suggest a hierarchical interaction between the TE-driven and the SPE-driven motor adaptations.
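To make the ‘flat’ structure concrete, the following is a minimal Python sketch of a generic two-state model in which a fast and a slow state learn from the same error and the net adaptation is simply their sum. The retention factors and learning rates are illustrative values chosen for this sketch, not parameters taken from this paper or from the cited studies.

```python
def two_state(perturbation, a_fast=0.6, b_fast=0.2, a_slow=0.99, b_slow=0.02):
    """Simulate a generic two-state learner (all parameter values are illustrative)."""
    x_fast, x_slow, net = 0.0, 0.0, []
    for p in perturbation:
        x = x_fast + x_slow            # 'flat': net adaptation is the plain sum
        net.append(x)
        error = p - x                  # error experienced on this trial
        x_fast = a_fast * x_fast + b_fast * error   # learns fast, forgets fast
        x_slow = a_slow * x_slow + b_slow * error   # learns slowly, retains well
    return net

# Net adaptation to a constant unit perturbation over 150 trials
net = two_state([1.0] * 150)
```

In this formulation neither state gates or re-plans the other, which is the sense in which the interaction is non-hierarchical.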

Here, using a force field adaptation paradigm with new TE-inducing force fields that perturb the participant’s hand with large forces near the target (Fig 1C), we show that the two adaptation processes in fact interact hierarchically. The development of these new fields was crucial, as the force fields used in most reaching adaptation studies induce minimal TE or failure. For example, during reaching movements with a bell-shaped velocity profile [19], the popular velocity-dependent curl force field (VDCF) [17,18] exerts the largest force perturbation on the participant’s hand in the middle of the reach and minimal perturbation near the target. This force field thus results in large lateral deviations (LDs) mid-reach in the early adaptation trials, but allows the participant to reach the target even after this large LD (see Fig 2A).
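The contrast between the two kinds of field can be illustrated with a short Python sketch. It assumes a straight minimum-jerk reach of 150 mm (the profile used for illustration in Fig 1), a hypothetical movement duration of 1 s, and hypothetical unit gains; the actual field definitions are given in the Methods and are not reproduced here.

```python
def min_jerk(tau, distance=150.0, duration=1.0):
    """Position (mm) and speed (mm/s) of a minimum-jerk reach at normalized time tau in [0, 1]."""
    pos = distance * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = distance / duration * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return pos, vel

B_VEL = 1.0  # hypothetical gain of a velocity-dependent field
K_POS = 1.0  # hypothetical gain of a linearly increasing position-dependent field

taus = [i / 100 for i in range(101)]
vel_force = [B_VEL * min_jerk(t)[1] for t in taus]  # force magnitude scales with speed
pos_force = [K_POS * min_jerk(t)[0] for t in taus]  # force magnitude scales with distance travelled

peak_vel_idx = max(range(101), key=lambda i: vel_force[i])
peak_pos_idx = max(range(101), key=lambda i: pos_force[i])
```

Under these assumptions the velocity-dependent force peaks at mid-movement and vanishes at the target, whereas the position-dependent force is largest at the target, where it can produce a TE.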

Fig 1.

Experiment and force fields: A) Participants made a reaching movement from a start point to a target point while holding a handle of a robot manipulandum. The direct vision of the participant’s hand was occluded by a table while they received visual feedback of their hand position during each trial by a cursor projected on the table. B) A very stiff two-dimensional spring, which was activated when the hand velocity decreased below a threshold of 20 mm/s, ensured that the participant could not make a second corrective movement to reach the target. C) The reaching task was performed in two force fields in Experiment-1 (VDCF and LIPF) and two force fields in Experiment-2 (PSPF and CPVF). The hand force profiles in these force fields are shown as shaded regions while assuming a straight minimum-jerk hand trajectory along x = 0. VDCF is a velocity-dependent force field, while LIPF and PSPF are position-dependent force fields. CPVF is a linear combination of VDCF and LIPF. Please refer to the methods for the mathematical definitions of the fields.

https://doi.org/10.1371/journal.pcbi.1008481.g001

Fig 2.

Trajectory adaptation in Experiment-1: (A, C) The hand trajectories of two representative participants and learning curves in VDCF (A) and LIPF (C) averaged across all participants. Note that the scales differ between x and y axes to clearly show trajectory changes along the x-axis. The light gray shades behind some trajectories represent a schematic image of the force field. The adaptation of the TE and LD are shown by traces with open circles and filled circles, respectively. The first 15 TE and LD values are plotted for every single trial, while the subsequent trials (indicated by thick gray lines at the bottom of the figure) are plotted for every five trials. The shaded gray areas around the lines represent standard errors. The light green zones represent the target width (radius: 7.5 mm). (B, D) The TEs and baseline-subtracted LDs in six trial epochs (1st, 3rd-5th, 136th-155th adaptation trials, and 1st, 3rd-5th, 131st-150th de-adaptation trials) in VDCF (B) and LIPF (D). Gray dots represent data from individual participants. The error bars indicate standard errors. The light green zone in the TE plots represents the target width.

https://doi.org/10.1371/journal.pcbi.1008481.g002

In our study with the novel TE-inducing force fields, we observed, first, that TE-driven motor adaptation occurs faster than internal model adaptation. Second, and importantly, TE-driven motor adaptation can result in persistent after-effects that are distinct from the after-effects of internal model adaptation. Third, these adaptive behaviors can be well explained by previous models of internal model adaptation only if they incorporate a hierarchical interaction between TE-driven adaptation of the kinematic plan and internal model adaptation. The relation between TE-driven adaptation and internal model adaptation is consistent with the traditional view of hierarchical motor planning of kinematics and dynamics [6,20].

Results

Experiment-1

In Experiment-1, thirty participants were asked to make arm reaching movements to a target 150 mm from the start position (Fig 1A) and to adapt to one of two force fields (Fig 1C): the popular velocity-dependent curl field (VDCF), which does not induce TEs, and the novel, TE-inducing linearly increasing position-dependent (orthogonal) field (LIPF) (see Methods for details). The adaptation phase (155 trials) was followed by the de-adaptation phase (150 trials), in which the participants performed the same task in the null field, as in the baseline session. We randomly assigned the participants to one of the two force fields (n = 15 for each). Their movements were quantified by two variables: TE and LD. The TE was defined as the x-deviation of the endpoint hand position from the target, and the LD was defined as the x-deviation of the hand at the mid-point (y = 75 mm) of the straight line connecting the start and the target (Fig 1A and 1B, and see Methods).
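As an illustration of these two measures, the Python sketch below computes TE and LD from a sampled hand path. It assumes a start at (0, 0) mm and a target at (0, 150) mm, so that the straight line lies along x = 0; the sampled points and the linear interpolation at y = 75 mm are assumptions of this sketch rather than the procedure described in the Methods.

```python
def target_error(traj, target_x=0.0):
    """TE: x-deviation (mm) of the final hand position from the target."""
    return traj[-1][0] - target_x

def lateral_deviation(traj, y_mid=75.0):
    """LD: x-position (mm) where the hand first crosses y = y_mid, by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        if y0 <= y_mid <= y1 and y1 > y0:
            w = (y_mid - y0) / (y1 - y0)
            return x0 + w * (x1 - x0)
    raise ValueError("trajectory never crosses y_mid")

# Hypothetical hand path as (x, y) samples in mm
traj = [(0.0, 0.0), (-4.0, 50.0), (-10.0, 100.0), (-2.0, 150.0)]
te = target_error(traj)       # -2.0 mm
ld = lateral_deviation(traj)  # -7.0 mm
```

In this hypothetical path the hand bows 7 mm towards ‘–x’ at mid-reach but ends 2 mm from the target, within the 7.5 mm target radius, which is the qualitative pattern described for a field that does not induce TEs.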

TE changes the trajectory adaptation pattern.

Fig 2 shows the time development of hand trajectories, TE (open circle), and LD (filled circle) in the two force fields and subsequent null field. To show immediate and later effects of the initial TE on the adaptation and de-adaptation phases, we analyzed the data in six trial epochs: 1st, 3rd-5th, 136th-155th (i.e., last 20) adaptation trials and the 1st, 3rd-5th, 131st-150th (i.e., last 20) de-adaptation trials (Fig 2B and 2D).

In the VDCF, the trajectory adaptation pattern was similar to those reported in previous studies. The force field perturbed the participants’ hand trajectories considerably in the first adaptation trial (Fig 2A), but their hands could still reach the target, as we expected. After the adaptation phase, the participants could fully compensate for the perturbation, and their trajectories became straighter, curving towards the opposite direction by the 155th adaptation trial. In the first de-adaptation trial, their hand trajectories exhibited a large after-effect, deviating in the direction opposite to the force field. By the end of the de-adaptation phase, their trajectories had returned to the straight baseline, or null, trajectories (see 148th de-adaptation trial). These results were consistent with what has been observed in previous studies [18,21]. The across-participant average adaptation of the TE (open circle) and LD (filled circle) is shown in the bottom panels of Fig 2A. The large LD induced at the beginning of the adaptation and de-adaptation phases quickly decreased to within the target size (radius = 7.5 mm, light green zone) within the first 10 adaptation and de-adaptation trials, respectively. Importantly, TEs remained relatively small, around or within the target, from the very first adaptation trial and throughout the following adaptation and de-adaptation phases. In fact, the magnitude of TE was not significantly larger than the target radius in the first adaptation trial (t(14) = 0.284, p = 0.780) or the first de-adaptation trial (t(14) = 0.131, p = 0.897).

On the other hand, the TE-inducing LIPF showed a dramatically different adaptation pattern from the VDCF. In the LIPF, the participants’ hand trajectories in the first adaptation trial (Fig 2C) were perturbed the most around the target, resulting in a large TE (the across-participant average TE in the 1st trial was 112.6 ± 38.0 (mean ± s.d.) mm) that was significantly larger than the target (t(14) = 10.700, p = 4.016×10^−8). In the subsequent adaptation trials (see 4th adaptation trial in Fig 2C), the participants’ hand trajectories jumped opposite to the force direction, which ensured that the target was reached, even with a curved trajectory. It is important to note that the magnitude of the LD increased (between the 2nd and 7th adaptation trials) before it gradually decayed after the 7th adaptation trial. Furthermore, the decay was opposite in sign to that in the VDCF. That is, while the LD in the VDCF decays from an initial negative value (i.e., from ‘–x’ towards zero), the decay in the LIPF is from a positive deviation (i.e., from ‘+x’ towards zero), even though the LIPF pushes the hand in the same direction as the VDCF (i.e., towards ‘–x’). Consequently, the decays of the TE and LD are of the same sign in the VDCF, but of opposite signs in the LIPF.
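The reported statistic can be checked approximately from the summary values given above. The Python sketch below assumes that the comparison was a one-sample t-test of the TE magnitude against the 7.5 mm target radius with n = 15, which is what the degrees of freedom suggest; this is an inference from the text, not a description taken from the Methods.

```python
import math

def one_sample_t(mean, sd, n, reference):
    """t statistic for testing a sample mean against a fixed reference value."""
    return (mean - reference) / (sd / math.sqrt(n))

# First LIPF adaptation trial: TE = 112.6 +/- 38.0 mm (mean +/- s.d.), n = 15
t_first = one_sample_t(112.6, 38.0, 15, 7.5)
print(round(t_first, 2))
```

The value obtained is close to the reported t(14) = 10.700; a small discrepancy is expected because the mean and standard deviation are rounded in the text.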

The trajectory change in the de-adaptation phase (1st, 4th, and 147th de-adaptation trials in Fig 2C) was almost a mirror image of that in the adaptation phase. A distinctly large TE (44.3 ± 27.7 mm) was induced in the first de-adaptation trial; it was again significantly larger than the target (t(14) = 5.140, p = 1.503×10^−4) and reduced monotonically to within the target size by the 10th trial. In contrast, the LD did not show a monotonic decrease. Unlike in the VDCF, the magnitude of the LD first increased and then decreased. Again, in the de-adaptation phase, the changes in TE and LD were of opposite sign.

To quantify the trajectory adaptation pattern of each group, we performed one-way ANOVAs on the TE and LD values across the trial epochs. The VDCF group showed a significant main effect in LD (F(2.546, 35.649) = 175.179, p = 3.165×10^−5, ηp² = 0.926) but not in TE (F(2.152, 30.134) = 2.284, p = 0.116, ηp² = 0.140). Post-hoc Tukey’s tests confirmed that the magnitude of LD changed monotonically during the adaptation (1st vs 136th-155th: p<0.001) and de-adaptation (1st vs 131st-150th: p<0.001) phases.

The LIPF group showed a significant main effect in both TE (F(1.686, 23.600) = 84.204, p = 6.404×10^−11, ηp² = 0.857) and LD (F(2.601, 36.412) = 73.312, p = 8.660×10^−15, ηp² = 0.840). The magnitude of TE decreased monotonically during the adaptation (1st vs 136th-155th: p<0.001) and de-adaptation (1st vs 131st-150th: p<0.001) phases. In contrast, the LD showed a non-monotonic change during both phases. The LD increased from the 1st to the 3rd-5th adaptation trials (p<0.001) and then decreased from the 3rd-5th to the 136th-155th adaptation trials (p<0.001). Similarly, the LD decreased from the 1st to the 3rd-5th de-adaptation trials (p<0.001), and then increased from the 3rd-5th to the 131st-150th de-adaptation trials (p = 0.008).
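The effect sizes reported alongside each F statistic above are numerically consistent with partial eta squared, which can be recovered from an F value and its degrees of freedom. The Python sketch below uses the standard identity; that the reported effect size is partial eta squared is an inference from this agreement, not something stated in this section.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

# (F, df_effect, df_error) as reported in the text
reported = {
    "VDCF LD": (175.179, 2.546, 35.649),
    "VDCF TE": (2.284, 2.152, 30.134),
    "LIPF TE": (84.204, 1.686, 23.600),
    "LIPF LD": (73.312, 2.601, 36.412),
}
effect_sizes = {k: round(partial_eta_squared(*v), 3) for k, v in reported.items()}
```

The four values computed this way match the reported 0.926, 0.140, 0.857 and 0.840 to three decimal places.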

The appearance of a new, curved null trajectory after de-adaptation of LIPF.

Furthermore, we observed an intriguing phenomenon in the de-adaptation phase of the LIPF. In the case of the VDCF, upon returning to the null field after the adaptation phase, the participants readily lost their adapted trajectories within the first 10 de-adaptation (null) trials (Fig 2A); their trajectories returned to their original null trajectories (observed in the baseline session), as previously reported [21,22]. This was, however, not the case after the LIPF (see 150th de-adaptation trial in Fig 2C). After the de-adaptation phase, the participants’ trajectories remained marginally, yet consistently, deviated from their original null trajectories, even after as many as 150 null trials (~20 min). Fig 3A compares the participant-averaged null trajectories before (blue traces) and after (red traces) exposure to the VDCF or LIPF (first and second plots from left). The LD in the null trajectory differed significantly between before and after exposure to the LIPF (t(14) = 4.224, p = 8.494×10^−4), but not the VDCF (t(14) = 0.774, p = 0.452) (Fig 3B).

Fig 3.

Null trajectories before and after adaptation in the four force fields in Experiment-1 (VDCF and LIPF) and Experiment-2 (PSPF and CPVF). A) The null trajectories averaged across the last 20 trials were compared between the baseline (cyan lines) and de-adaptation (magenta lines) phases. The color shades indicate standard errors. Note that the scales differ between x and y axes to clearly show trajectory differences along the x-axis. B) The baseline-subtracted LDs in the trial epoch from the last 20 (131st-150th) de-adaptation trials in the four force fields. Gray dots represent data from individual participants. The error bars indicate standard errors. * indicates p < 0.05.

https://doi.org/10.1371/journal.pcbi.1008481.g003

Crucially, the new null trajectory deviated in the direction in which the force field perturbed the hand, and not in the direction opposite to the force field, as would generally be expected after exposure to the VDCF. These observations suggest that the new null trajectory may not simply be an after-effect due to slow de-adaptation to the force field, but rather a consequence of the TEs induced in the first few null (de-adaptation) trials after exposure to the LIPF. To further investigate the cause of the new null trajectory, we next conducted two control experiments.

Experiment-2

In Experiment-2, we considered the possibility that the new null trajectory was not a consequence of the TE but was instead induced because the LIPF is a position-dependent field. To rule out this possibility, we examined trajectory adaptation by fifteen participants in the positively skewed position-dependent field (PSPF) (Fig 1C), a position-dependent force field that does not induce TEs.

We observed that the magnitude of TE in the first adaptation trial (t(14) = 0.261, p = 0.798) and the first de-adaptation trial (t(14) = 0.097, p = 0.924) in the PSPF was not significantly larger than the target radius, while the LD showed a monotonic change through the adaptation and de-adaptation phases (see S1 Fig and S1 Text). Importantly, the null trajectory in the de-adaptation phase of the PSPF returned to the baseline null trajectory (t(14) = 0.659, p = 0.520) (Fig 3A and 3B). These observations were similar to the behaviors observed during exposure to the VDCF.

Next, to confirm that the new null trajectory is also observed in TE-inducing force fields other than the LIPF, we examined the trajectory adaptation in the combined position- and velocity-dependent field (CPVF) (Fig 1C). We observed that, similar to the LIPF, the CPVF induced a large TE both in the first adaptation trial (73.2 ± 50.0 (mean ± s.d.) mm, t(13) = 4.905, p = 2.874×10^−4) and in the first de-adaptation trial (26.0 ± 22.8 mm, t(12) = 5.211, p = 2.178×10^−4). The TEs reduced monotonically until the participant’s hand could reach the target. In contrast, as with the LIPF, the LD clearly decreased only after substantial reductions in the TE during the adaptation and de-adaptation phases (see S2 Fig and S1 Text). Crucially, the participants exhibited a new hand trajectory that was significantly different from their initial null trajectory (t(13) = 3.386, p = 0.0049) even after 150 trials in the de-adaptation phase (Fig 3A and 3B). This result provides further support for the possibility that the new null trajectory is a consequence of the TEs induced at the beginning of the de-adaptation phase.

Experiment-3

Finally, to establish more firmly that the TEs at the beginning of the de-adaptation phase cause the new null trajectory, in Experiment-3 we examined the hand trajectories when the TEs were eliminated in the de-adaptation phase of the LIPF (Experiment-1). Thirty participants took part in Experiment-3. Half (15) of these participants had previously participated in Experiment-1. As in Experiment-1, these participants first trained in the LIPF and then performed the de-adaptation phase. However, in the de-adaptation phase of Experiment-3, they made reaches in the null field in the presence of a partial error clamp (PEC). We refer to this condition as the LIPF-PEC condition, and to the LIPF condition of Experiment-1 (the LIPF followed by the Null) as the LIPF-Null condition. The PEC was implemented as a strong spring that acted over the second half of the movement (y > 75 mm) and pulled the participant’s hand to the target along the x-axis (Fig 4A; see Methods for details). Note that the first half of the movement (y ≤ 75 mm), where the LD is measured, remained unaffected by the PEC. The other half of the participants, who were newly recruited, experienced the LIPF-PEC condition first and then the LIPF-Null condition, to cancel out the order effects of the two conditions. We compared the LIPF-PEC condition (Fig 4B, right) with the LIPF-Null condition (Fig 4B, left). Because half of the participants had also taken part in Experiment-1, statistical significance for the data of Experiment-3 was assessed with Bonferroni correction for multiple comparisons.
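The spatial structure of the PEC described above can be sketched as a piecewise lateral force. In the Python sketch below, the function name, the unit stiffness and the force units are hypothetical; only the onset at y = 75 mm and the pull towards the target along the x-axis are taken from the text.

```python
def pec_force_x(x, y, target_x=0.0, y_onset=75.0, stiffness=1.0):
    """Lateral PEC force (arbitrary units; stiffness is a placeholder value).

    No force over the first half of the reach (y <= y_onset), where the LD is
    measured; a spring pulling x towards the target over the second half.
    """
    if y <= y_onset:
        return 0.0
    return -stiffness * (x - target_x)

f_first_half = pec_force_x(x=10.0, y=50.0)    # 0.0: LD region is unaffected
f_second_half = pec_force_x(x=10.0, y=100.0)  # negative: pulls the hand back towards x = 0
```

Because the force is zero up to y = 75 mm, any difference in LD between the LIPF-PEC and LIPF-Null conditions cannot be a direct mechanical effect of the clamp at the point where the LD is measured.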

Fig 4. Effect of attenuation of TE on the de-adaptation trajectory.

(A) After exposure to LIPF, the participants in the LIPF-PEC condition of Experiment-3 were exposed to the PEC where a force channel was applied over the second half of the reaching movement to attenuate TEs. (B) The hand trajectories and learning curves of both TE (open circle) and LD (filled circle) are compared between the LIPF-Null (left panel) and LIPF-PEC conditions (right panel). (C) The TE in the first de-adaptation trial (left panel) and the baseline-subtracted LD averaged across the last 20 (131st-150th) de-adaptation trials (right panel) were compared between the two conditions. Gray dots represent data from individual participants. The error bars indicate standard errors. * P < 0.05.

https://doi.org/10.1371/journal.pcbi.1008481.g004

Although the trajectory adaptation to the LIPF was similar between the LIPF-Null (left panel in Fig 4B) and LIPF-PEC conditions (right panel in Fig 4B), a stark difference was observed in the de-adaptation phase in the presence of the PEC. As expected, the TE in the first PEC trial was substantially attenuated compared to the first trial in a normal Null field (left panel in Fig 4C; PEC: 8.6 ± 1.9 (mean ± s.d.) mm, Null: 39.9 ± 26.5 mm; t(29) = 6.550, corrected p = 7.133×10^−7). On the other hand, the LD in the first de-adaptation trial did not differ between the PEC field and the Null field (t(29) = 0.732, uncorrected p = 0.470). However, a difference in LD appeared after the 1st de-adaptation trial: while the LD in the LIPF-Null condition showed large jumps from ‘+x’ to ‘–x’ before decaying to the new null trajectory (similar to Experiment-1), the LD in the LIPF-PEC condition behaved as in the VDCF condition. In the presence of the PEC, the LD converged monotonically from ‘+x’ through the de-adaptation phase. More importantly, the magnitude of the LD in the last twenty de-adaptation trials in the LIPF-PEC condition was significantly smaller than in the LIPF-Null condition (t(29) = 2.851, corrected p = 0.016; right panel in Fig 4C). Furthermore, the participants’ hand trajectories returned to their initial null trajectories on the application of the PEC (t(29) = 0.283, uncorrected p = 0.779). Overall, the behaviors in the PEC were the same as in the force fields that do not induce TEs, specifically the VDCF and PSPF (compare the right panel of Fig 4B with Fig 2A).

Moreover, when we analyzed only the half of the participants who took part only in Experiment-3 (so that no correction for multiple comparisons was needed), we confirmed the same results. The TE in the first de-adaptation trial was substantially smaller in the LIPF-PEC condition than in the LIPF-Null condition (t(14) = 4.193, p = 9.020×10^−4). The LD in the last twenty de-adaptation trials was significantly smaller in the LIPF-PEC condition than in the LIPF-Null condition (t(14) = 2.183, p = 0.047), and the hand trajectories in the PEC returned to the original null trajectories (t(14) = 0.637, p = 0.534).

Furthermore, the results of Experiment-3 suggest that muscle fatigue is unlikely to account for the formation of the new null trajectory, because we observed new null trajectories in the LIPF-Null but not in the LIPF-PEC condition, even though the participants trained in the same LIPF before the de-adaptation phase in both conditions. Overall, these results strongly suggest that the TEs after exposure to TE-inducing force fields caused the new null trajectories observed in Experiment-1 and -2.

Hierarchy and model simulation

Our results show that in the presence of failure (TE > target size), the evolution of the trajectories is very different from when there are no TEs (compare Fig 2A and 2C). The reduction of TE is consistently given priority over the reduction of LD (Fig 2C), with the TE decreasing monotonically, even at the cost of a temporary increase of LD over several trials. Finally, adaptation in the presence of failure can induce changes in the undisturbed (null) trajectories (Fig 3).

First, these observations suggest the presence of a TE-driven adaptation process, in addition to the SPE-driven internal model adaptation. Furthermore, the distinct adaptation of the TE and LD in the LIPF, one of which is monotonic while the other is not (Fig 2C), led us to hypothesize a hierarchical interaction between the two processes. To evaluate this hypothesis, we simulated the trajectory adaptation in the VDCF, LIPF-Null, and LIPF-PEC using two sensorimotor adaptation models that consider only internal model adaptation, with and without the addition of a hierarchical TE-driven adaptation process.

First, we started with the ‘flat’ optimal feedback control model (or flat OFC model), proposed by Izawa et al. [23] to explain trajectory adaptation in a velocity-dependent force field by combining internal model learning of the force field with optimal feedback control [24]. Second, we used the ‘flat’ V-shaped model (or flat VS model) proposed by Franklin et al. [25], which uses a different algorithm, similar to feedback error learning [6], in which muscle activation changes across trials are determined by a V-shaped learning function under the assumption of a pre-planned desired trajectory. We refer to both models with the prefix ‘flat’ because each considers a single SPE-driven internal model adaptation process to explain motor adaptation. We will show that these models can explain our experimental observations once a ‘hierarchical’ TE-driven adaptation process is appended to their existing structure. Please see Methods for details of the implementation.

Fig 5 shows simulations of the VDCF, LIPF-Null, and LIPF-PEC adaptations by the flat OFC and flat VS models. Although the flat OFC model (Fig 5A) and the flat VS model (Fig 5B) qualitatively reproduced the trajectory adaptation in the VDCF well, they were unable to reproduce either the non-monotonic change in LD or the persistent curved null trajectory observed in the LIPF-Null and LIPF-PEC (Fig 5C and 5D).

Fig 5.

Flat models cannot reproduce LIPF and PEC behaviors: Simulation for trajectory adaptation in the VDCF (A, B), LIPF-Null (C, D), and LIPF-PEC (E, F) conditions, represented by TE (open circle) and LD (filled circle) by the flat OFC (upper panels) and VS models (lower panels). The flat learning models (only internal model adaptation) were unable to reproduce either the non-monotonic change in LD (C, D, E, F) or the curved null trajectory with a persistent deviation after exposure to the LIPF (C, D).

https://doi.org/10.1371/journal.pcbi.1008481.g005

Next, we introduced an additional TE-driven adaptation process to these models. We assumed that this process represents a modification of the kinematic plan when there is a failure (i.e., TE > target size), and added the kinematic plan adaptation process on top of the flat learning models (Fig 6A). We thus refer to these two models as the ‘hierarchical’ OFC model and the ‘hierarchical’ VS model, respectively. The kinematic plan adaptation process was assumed to be activated only in the presence of failure and to be modulated by the TE so that the trajectory is adjusted in the direction opposite to the TE. In the absence of failure (i.e., TE < target size), the kinematic plan slowly decays across trials towards the original plan (i.e., the straight direction towards the target). We assumed that the decay stops when the motor cost of the generated reach falls below a small threshold value (see Methods for details of the implementation). This assumption was made to reproduce the persistent curved null trajectory.
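The trial-by-trial logic of this kinematic plan adaptation can be summarized in a short Python sketch. The proportional update, the exponential decay, and all numerical values (gain, decay factor, cost threshold) are assumptions made for illustration; the actual update rules and parameters are those given in the Methods.

```python
def update_plan(bias, te, motor_cost, target_radius=7.5,
                gain=0.5, decay=0.98, cost_threshold=1.0):
    """One trial of a hypothetical kinematic plan update (bias = 0 is the straight plan)."""
    if abs(te) > target_radius:
        # Failure: shift the planned direction opposite to the target error.
        return bias - gain * te
    if motor_cost < cost_threshold:
        # Success and the reach is already cheap: decay stops, bias persists.
        return bias
    # Success but the reach is still costly: decay slowly towards the straight plan.
    return decay * bias

after_failure = update_plan(bias=0.0, te=-100.0, motor_cost=5.0)  # plan shifts towards +x
after_success = update_plan(bias=10.0, te=2.0, motor_cost=5.0)    # plan decays towards 0
frozen = update_plan(bias=10.0, te=2.0, motor_cost=0.5)           # residual bias is kept
```

The third case is the one that allows a residual bias, and hence a curved null trajectory, to persist after the task has become successful again.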

Fig 6. Hierarchical motor adaptation model.

(A) Schematic diagram of the model. The model consists of two adaptation components: the kinematic plan adaptation (magenta box) as a higher component, driven by TE, and the internal model adaptation (light blue box) as a lower component, driven by SPE. In the presence of failure (i.e., TE > target size), the kinematic plan adaptation process becomes active and modifies the planned direction of the hand motion. When the task is successful, the planned direction slowly decays to the original movement direction. (B) The planned direction of the hand motion is implemented as a directional bias (magenta arrow) in the hierarchical OFC model and a desired trajectory (magenta line) in the hierarchical VS model (see Methods for details).

https://doi.org/10.1371/journal.pcbi.1008481.g006

In the hierarchical OFC model, this process was implemented by a directional bias [26] (Fig 6B), which was incorporated into the cost function of the flat OFC model (see Methods for details). In the hierarchical VS model, the initial direction of the desired trajectory (Fig 6B) was modified in the same way as in the hierarchical OFC model (see Methods). By including this TE-driven adaptation process, both models (Fig 7C, 7D, 7E and 7F) could explain all the features of the trajectory adaptation in LIPF-Null and LIPF-PEC, including the non-monotonic change of the LD during the adaptation phase, the appearance of the new null trajectory after de-adaptation in LIPF-Null, and its disappearance in LIPF-PEC. In the absence of failure, as in VDCF, both models predict the same results as their flat counterparts (Fig 5A and 5B).

Fig 7.

Hierarchical models’ simulation of trajectory adaptation in the VDCF (A, B), LIPF-Null (C, D), and LIPF-PEC (E, F) conditions, represented by TE (open circles) and LD (filled circles), for the hierarchical OFC (upper panels) and VS (lower panels) models. The simulated hand trajectories are shown at the top of each panel. The hierarchical learning models (kinematic plan adaptation and internal model adaptation) successfully reproduced the behaviors in all three conditions.

https://doi.org/10.1371/journal.pcbi.1008481.g007

Discussion

We examined the motor adaptation of arm reaching trajectories in force fields that induce failure (TE > target size) at the beginning of the adaptation and de-adaptation phases. First, our results showed that the human motor learning system puts a higher priority on the reduction of TE than LD. In the presence of failure, the LDs did not follow a typical monotonic decrease as reported in previous studies [21,22,27,28]. TE is reduced first, even at the expense of an increased LD (Fig 2C). A monotonic decrease in LD took place only after the TE was reduced to around the target size. Second, the presence of failure in the de-adaptation phase caused the appearance of a new null trajectory that was distinct from the null trajectory observed in the baseline period and persisted even after 150 de-adaptation trials. These observations were successfully reproduced by the hierarchical motor adaptation models that combine a TE-driven kinematic plan adaptation with the internal model adaptation.

The prioritized reduction in TE over LD (Fig 2C and 2D) cannot be explained by internal model adaptation alone, even when considering multiple-time-scale adaptations such as a two-state model [10–12], because these models predict similar monotonic changes in both TE and LD (as in Fig 5). It is important to note that this is also the case when considering the spatiotemporal difference in the error information. If the errors early in a trajectory are less important than those at the end for updating the internal model of the force field, the difference may affect the adaptation rate (i.e., TE may lead to a faster internal model adaptation) but would still not change the adaptation pattern to which the internal model adaptation leads (i.e., monotonic decay of the trajectory). In contrast, the non-monotonic trajectory changes in the presence of TEs suggest the presence of an additional TE-driven kinematic plan adaptation. In our hierarchical motor adaptation models (Fig 6), the kinematic plan adaptation changes the reaching direction in the direction opposite to the TE, which enables a quick reduction in TE, even when it sometimes leads to an increase in LD (Fig 7C, 7D, 7E and 7F). After the TE reduction, we assume that the kinematic plan slowly returns towards the original movement direction (i.e., towards the target). The hierarchical addition of this TE-driven (or failure-driven) process enables the models to explain the TE and LD adaptation processes in both no-TE-inducing and TE-inducing force fields.

The appearance of the new null trajectory in the de-adaptation phase can be also explained by the hierarchical dominance of kinematic plan adaptation over internal model adaptation. In our hierarchical models, we assumed that after the motor cost of arm reaching falls below a small threshold value, the decay of the kinematic plan toward the baseline plan stops. This assumption could reproduce the persistent curved null trajectory after de-adaptation in the presence of failure. The models thus suggest that the TE-driven kinematic plan adaptation may determine the steady-state null trajectory to which the internal model adaptation converges. This possibility is strongly supported by Experiment-3 (Fig 4) where the suppression of TE enabled the participants to converge back to their baseline null trajectory. This observation was also successfully reproduced by the hierarchical learning models (Fig 7E and 7F). Our assumption, that the TE-driven kinematic plan adaptation is also affected by the motor cost, is similar to the idea that a desired trajectory of movement may be modified according to the level of interaction force with the environment [29]. It has however not yet been empirically examined and remains an interesting question for future studies.

Motor learning processes like motor memory [30–32] or use-dependent learning [33] make one’s movement similar to the last performed movement. Operant reinforcement learning [34] causes people to select movements for which the task had previously been successfully achieved. These processes may be seen as likely candidates to explain the persistent curved trajectories. However, these processes alone cannot explain why the persistent curved null trajectories do not appear in the no-TE-inducing force fields (VDCF or PSPF) (Fig 3B) or during PEC in Experiment-3 (Fig 4B), in which the participants successfully reached the target with curved null trajectories in the first de-adaptation trials. Our results thus suggest that even if motor memory, use-dependent learning, or operant reinforcement learning is indeed active during force field adaptation, these processes, unlike kinematic plan adaptation, do not hierarchically interact with internal model adaptation but instead work in a non-hierarchical manner. Likewise, other possible causes like perceptual bias [35,36] or perceptual recalibration [37] of the hand position cannot readily explain why the persistent curved null trajectories appear only in the TE-inducing but not the no-TE-inducing force fields. If our model included these learning processes or perceptual adaptation processes, it might be able to better explain the behavior. We note, however, that the main purpose of our model simulation is to demonstrate the necessity of an additional TE-driven adaptation process hierarchically interacting with the internal model adaptation, rather than to develop a new model. For this purpose, we chose the two most popular learning models in the current literature and demonstrated the effect of adding the TE-driven process.

A priori, the new null trajectory in the de-adaptation phase shown here is different from the persistent retention of learned movements that has recently been reported to occur after a reinforcement period in which only binary success or failure feedback is provided [38–41]. There, the retention was measured in no-feedback periods subsequent to reaches, in which movement-related feedback was not available, and once the feedback became available, a typical washout process took place with the movement quickly returning to the baseline level [38]. In contrast, in our study the new null trajectory persists for at least ~20 min of the de-adaptation period even when movement feedback is available and without a reinforcement period. Future work is needed to determine how long the new null trajectory persists or whether it decays very slowly.

Recent studies have identified the presence of distinct explicit and implicit components of adaptation to novel visuomotor rotations [7,9,10]. The explicit components, called explicit strategy learning, have been proposed to be sensitive to task performance or TE, and faster than implicit components represented by internal model adaptation. We believe the TE-driven adaptation process we observed here may (at least partially) be an explicit strategy learning, as it was active only in the presence of failures and fast [10,14] but insensitive to LD (i.e. SPE). However, the key difference between this TE-driven adaptation and the explicit strategy learning previously identified lies in the way the two processes interact with the internal model adaptation. Previous visuomotor rotation studies have often utilized a two-state model to explain the interaction between the explicit strategy adaptation and internal model adaptation [10,42] by assuming that these two adaptation processes interact in a non-hierarchical manner where the net reach trajectory is defined to be the sum of the two. However, in the case of visuomotor rotation tasks, the parameter to be learned by the two adaptation processes is the same–the rotation angle (or its equivalent). In fact, previous force-field studies have similarly looked at the adaptation of a single parameter–the trajectory (quantified by its curvature, deviation, or encompassed area relative to the straight line). The adaptation of a single parameter is well explained by ‘flat’ models, including the “non-hierarchical” two-state model. On the other hand, this is not the case in our force field task, where the two adaptation processes represent changes in distinct parameters (the target and trajectory). The net adaptation behavior in our experiment cannot be explained by the flat models, including the two-state model, in its current formulation. 
Rather, the TE-driven adaptation and the internal model adaptation we observe here seem to be more consistent with the traditional view of hierarchical motor planning of kinematics and dynamics [6,20].

Our hierarchical models are different from the Adaptation Modulation model, a hierarchical motor adaptation model proposed by Kim et al. [15] that could explain the interaction of TE-driven adaptation and SPE-driven adaptation in their visuomotor rotation paradigm. The Adaptation Modulation model increases the adaptation rate of the SPE-driven adaptation process in the presence of failure (TE > target size). In the end, as with the two-state model, this model also considers only adaptation of the internal model (i.e., of the novel visuomotor rotation), although it is modulated by the presence of TE. Thus, the TE-driven process of the Adaptation Modulation model hierarchically determines a temporal feature of the SPE-driven adaptation (i.e., how fast arm trajectories adapt to the novel environment) but not a spatial feature as in our models (i.e., where the adapted trajectories converge). Accordingly, the Adaptation Modulation model can explain our observations in the no-TE-inducing fields but not those in the TE-inducing fields (i.e., the presence of the non-monotonic trajectory change and the new null trajectory). This is also true for two other models proposed by Kim et al. [15]: the Movement Reinforcement and Dual Error models. Both models implement an interaction of SPE-driven and TE-driven processes, but again consider the adaptation of the internal model alone. On the other hand, our model may partially explain their results. Specifically, the TE-driven kinematic plan (by limiting the range of the plan change) can explain the facilitated adaptation observed in Kim et al. [15], although some constraints are necessary. However, as the TE-driven process in Kim et al. [15] modulates adaptive behavior in a completely implicit manner while our TE-driven process may, we believe, work in an explicit manner, these two may be distinct in nature.

Studies have regularly found hierarchical behaviors during cognitive learning and decision making in humans. The brain activations during these hierarchical behaviors have been well explained by hierarchical reinforcement learning (HRL) algorithms [43–48]. The typical role of the higher component in an HRL system is to select a task-goal-oriented sub-goal or option, while the lower component typically selects an action to achieve this goal or sub-goal [44,49–51]. This structure is very similar to the hierarchical motor learning models we suggest here. However, while the previous theoretical and imaging studies have exhibited a hierarchy at the level of cognitive learning in low degrees-of-freedom tasks, our study suggests the presence of similar hierarchical structures for solving large degrees-of-freedom motor learning problems. The higher components active during cognitive learning have been linked to neural systems in the dorsolateral striatum, the dorsolateral prefrontal cortex, the supplementary motor area, the pre-supplementary motor area, and the premotor cortex [44]. On the other hand, the lower components have been related to the ventral striatum and the orbitofrontal cortex, which has strong connections to both the ventral striatum and the dorsolateral prefrontal cortex [44]. Interestingly, most of these areas have been observed to be active during motor learning of point-to-point arm or finger movements as well [52–55], suggesting that the cognitive learning processes and the hierarchical motor learning may operate as subsets of a common HRL structure. However, further studies are required to test this speculation by concretely examining the sharing of neural structures between the two processes.

Before the conclusion, we note two limitations of this study. First, while we manipulated the presence or absence of TE across the force fields, the current experimental design could not control several movement features like the velocity profile, stiffness profile [56], online feedback gain [57], posture at the final position [58] or adaptive movement changes [59,60]. Although we believe it is unlikely that any of these factors alone can consistently explain our two key observations in the TE-inducing force fields (the non-monotonic trajectory change and the new null trajectory), they may partially contribute to these observations. For example, one possibility is that a change in feedback gain induced by a large TE may contribute to shaping the new null trajectory, because feedback control has been suggested to share the internal model used for ‘feedforward’ control [61–63]. Another possibility is that the faster reduction of TE than LD may be boosted by adaptive control that involves online update of the control policy within individual movements [59,60]. Adaptive control may update not only the control policy but also the kinematic plan in the presence of TE. If the update rate of the kinematic plan is greater than that of the control policy, this may result in different trial-by-trial adaptation of TE and LD with faster and slower time scales, respectively. Future studies are needed to examine these possibilities.

Second, the current experimental design cannot determine whether the TE-driven kinematic plan adaptation is an implicit process that automatically compensates for TE or an explicit process that intentionally changes the strategy or the initial reach direction, although we believe the latter to be the case. One promising way to address this question may be to manipulate the participants’ psychological sensitivity to the TE of the same movements, as employed by Kim et al. [15]. Changing the target size or the monetary reward for task success, with other motor factors kept constant, would be useful to examine whether or not the TE-induced adaptive behaviors observed in our study are explicitly modulated.

The failure (i.e., TE) driven adaptation of the kinematic plan leads to large and fast movement changes that are arguably costly in terms of control and energy [24,64]. It is, therefore, possible that in our daily lives, to reduce the control cost, kinematic plan adaptation remains inactive during the performance of most movements, as they are overlearned and rarely lead to failure. This plan adaptation is likely activated only when there is a (probably unexpected) failure. When a failure is experienced, the kinematic plan adaptation process helps the brain to quickly acquire success or reward, even at the expense of large, high-energy movement changes, after which it is again left to the internal model adaptation to optimize the movement relative to this new movement plan. Furthermore, task success or failure depends on the task requirements. In our study, as TE determines whether the task is successful or not, the participants prioritized TE over LD. However, if participants were instructed that the task goal is to make a reaching trajectory with a certain magnitude of LD, they would prioritize LD over TE. Moreover, when the failure is indicated by binary (success or failure) feedback rather than signed error feedback like TE, LD may be more prioritized, as suggested in a previous study [65]. Importantly, whatever the task goal or the feedback type is, our results suggest that the presence of failure may activate the kinematic plan adaptation to quickly achieve the goal. In conclusion, our study provides behavioral evidence that human motor learning is shaped by hierarchical interactions between two learning processes: a higher kinematic plan adaptation driven by failure, and a lower internal model adaptation. This hierarchical motor adaptation structure may allow the brain to negotiate unexpected behavioral failures in an ever-changing and diverse environment around us.

Methods

Ethics statement

All experiments involved human participants and were approved by both the ethics committees of Advanced Telecommunication Research Institute (approval numbers: 15–722, 16–722) and National Institute of Information and Communications Technology. All participants signed an institutionally approved consent form.

Participants

A total of seventy-five neurologically normal volunteers (fourteen females and sixty-one males; age 22.70 ± 2.06, mean ± s.d.) participated in the experiments. All participants were right-handed as assessed by the Edinburgh Handedness Inventory [66]. All participants were naïve to the purpose of the experiments. No statistical methods were used to determine sample sizes although the sample sizes used in this study were similar to those in previous studies using similar reaching tasks [9,14,15,23,42].

Apparatus

The participants sat on an adjustable chair while using their right hand to grasp a robotic handle of the twin visuomotor and haptic interface system (TVINS) used to generate the environmental dynamics [67]. Their forearm was secured to a support beam in the horizontal plane and the beam was coupled to the handle. Since the TVINS has two parallel-link direct drive air magnet floating manipulandums, we performed the experiments with two participants at a time. Each manipulandum was powered by two DC direct-drive motors controlled at 2,000 Hz and the participants’ hand position and velocity were measured using optical joint position sensors (4800,000 pulses/rev). The handle was supported by a frictionless air magnet floating mechanism.

A projector was used to display the position of the handle with an open circle cursor (diameter 4 mm) on a horizontal screen board placed above the participant’s arm. The screen board prevented the participants from directly seeing their arm and handle. The participants controlled the cursor representing the hand position by making forward reaching movements (the details will be shown in the next section) from a start circle (10 mm diameter) to a target circle (15 mm diameter), which were displayed on the screen throughout all of the experiments. The start circle was located approximately 350 mm in front of the shoulder joint, and the target was 150 mm away from it.

Task

The participants were instructed to move the cursor from the start circle to the target circle in a period of 400 ± 50 ms. No instructions were given about the trajectory of the reaching movement. Each movement was initiated by audio beeps. Participants were instructed to begin movement on the second beep, 1 s after the first beep. The second beep lasted for 400 ms and could be used as a reference for the instructed movement duration. The cursor was visible only during each trial. After each trial, the participants were provided information about their movement duration and final hand position. Movement duration was defined as the period between the time the cursor exited the start circle and the time it entered the target circle, and feedback on it was given as “SHORT”, “LONG” or “OK”. The final hand position was defined as the position at the moment when the hand velocity fell below 20 mm/s. If the final hand position was within the target circle, the inside of the circle turned blue. A third beep, 3 s after the first beep, indicated the termination of the trial; the TVINS then brought the participant’s hand back to the start circle, and the next trial started after a period of 1 s. The inter-trial interval was 8 s.
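The duration feedback described above can be sketched as a simple classifier. This is an illustrative sketch only: the function name `duration_feedback` is hypothetical, and treating the window boundaries (350 ms and 450 ms) as “OK” is an assumption, since the text does not state how boundary values were handled.

```python
def duration_feedback(duration_ms, target_ms=400.0, tolerance_ms=50.0):
    """Classify a movement duration against the instructed 400 +/- 50 ms window.

    Assumption: durations exactly on a boundary count as "OK".
    """
    if duration_ms < target_ms - tolerance_ms:
        return "SHORT"
    if duration_ms > target_ms + tolerance_ms:
        return "LONG"
    return "OK"
```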

Force fields

This study used four different force fields: Velocity-dependent curl field (VDCF), Linearly increasing position-dependent (orthogonal) field (LIPF), Positive skew position-dependent (orthogonal) field (PSPF), and Combination of position- and velocity-dependent field (CPVF). There are two TE-inducing force fields (LIPF and CPVF) and two no-TE-inducing force fields (VDCF and PSPF). They are illustrated in Fig 1C and computed using the following equations.

where (Fx, Fy)T represents the force in newtons exerted on the hand, (x, y) is the hand position relative to the center of the start circle in meters, (ẋ, ẏ) is the hand velocity in meters per second, B1 is 14 Ns/m, and K1 and K2 are 60 and 20868 N/m, respectively.

Importantly, the hand motion is momentarily constrained to the final hand position where the velocity fell below a low threshold of 20 mm/s by applying a strong stiff two-dimensional spring force (500 N/m) and damper (50 Ns/m). The constraint force is active until the trial ends (lasting for around 1600 ms). This was designed such that participants did not need to continue resisting large force at the movement end (as in LIPF and CPVF) and it prevents them from reaching the target by sub-movements [68,69].

Partial error clamp

This study developed a new error clamp method and used it in Experiment-3. Previous motor learning studies have extensively utilized error clamp methods to assess motor adaptation performance [70]. When the error clamp was active, the trajectory of the hand was attracted to a straight line joining the start circle to the target by a virtual “channel” (see Fig 4A) in which any motion perpendicular to the straight line was pulled back by a one-dimensional spring (800 N/m) and damper (45 Ns/m). However, in contrast to the previous experiments, the error clamp was applied only over the last part of the hand movement (y > 75 mm), such that the first part of the movement, where the LD is measured (see Data analysis), was unaffected by the clamp. Furthermore, the spring was set weaker than that in the previous studies, which allowed the hand trajectory to change smoothly (see the hand trajectories for the LIPF-PEC condition in Fig 4B). We call this a partial error clamp (PEC).
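As an illustration of the channel described above, the lateral clamp force can be sketched as a spring-damper that switches on only past the midpoint of the reach. This is a minimal sketch under stated assumptions: the start-to-target line is taken as the y-axis (x = 0), positions are in meters, and the function name `partial_error_clamp_force` is hypothetical; it is not the controller code used in the experiments.

```python
def partial_error_clamp_force(x, y, vx, stiffness=800.0, damping=45.0, onset_y=0.075):
    """Lateral channel force (N) for hand position (x, y) in m and lateral velocity vx in m/s.

    Assumption: the straight start-to-target line lies on x = 0, so x is the
    perpendicular deviation. The clamp is inactive until y exceeds onset_y.
    """
    if y <= onset_y:
        return 0.0  # first part of the reach is left free, so LD is unaffected
    # One-dimensional spring and damper pulling the hand back toward the line
    return -stiffness * x - damping * vx
```

For example, a hand 10 mm to the right of the line at y = 100 mm and at rest laterally would experience a restoring force of -8 N under these assumptions.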

Experiment procedure

Experiment-1.

Thirty participants who passed initial screening (see Participant screening) were randomly assigned to one of two groups (n = 15 each): the VDCF group and the LIPF group (Fig 1C). First, the participants in both groups were given a practice period to acclimatize themselves to the apparatus and task. They were allowed to take their time but were asked to make at least 50 reaching movements in the no-force-field environment (null field). All participants finished practice in fewer than 100 trials. This was followed by two experimental sessions: the baseline and adaptation sessions. In the baseline session, the participants performed 50 trials of reaching movements in the null field. In the adaptation session, after 5 trials in the null field, the participants in the VDCF and LIPF groups performed 155 (adaptation) trials in VDCF and LIPF, respectively, followed by 150 (de-adaptation) trials in the null field. Two-minute rests were taken three times, after the 50th, 100th, and 150th adaptation trials.

Experiment-2.

Thirty participants who passed initial screening were randomly assigned to each of the two groups (n = 15 for each): the PSPF group and the CPVF group (Fig 1C). The experimental procedure is the same as Experiment-1.

Experiment-3.

Thirty participants took part in Experiment-3. Half of them were the participants assigned to the LIPF group of Experiment-1, who returned to our laboratory at least one week after Experiment-1 and performed Experiment-3. In Experiment-3, unlike Experiment-1, they performed 155 adaptation trials in the LIPF followed by 150 de-adaptation trials in the PEC. This experimental condition is referred to as the LIPF-PEC condition, while the condition they performed in Experiment-1 is referred to as the LIPF-Null condition. To compare these two conditions, we needed to cancel out the order effects of the two experimental conditions. We thus recruited another fifteen participants. Those who passed initial screening experienced the LIPF-PEC condition first and then the LIPF-Null condition. The experiments in the two conditions were again separated by at least one week. The experimental procedure in Experiment-3 was otherwise the same as in Experiment-1, except that in the LIPF-PEC condition the participants performed the de-adaptation trials in the PEC.

Data analysis

Target error (TE) and lateral deviation (LD) were used to evaluate motor adaptation. The TE was defined as x-deviation of the final hand position from the straight line joining the start circle to the target (Fig 1B). The final hand position was defined as the position at the moment when the hand velocity fell below 20 mm/s. The LD was defined as the x-deviations midway (at 75 mm from the start circle) from the straight line joining the start circle to the target.
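The two measures defined above can be sketched for a single sampled trajectory as follows. This is a minimal sketch, assuming the start-to-target line is the y-axis (so the x-coordinate is the deviation from the line), that LD is obtained by linear interpolation at the first crossing of y = 75 mm, and that movement onset is the first sample at or above the speed threshold; the function names are hypothetical and these choices are not taken from the original analysis code.

```python
def lateral_deviation(xs, ys, y_mid=75.0):
    """x-deviation (mm) at the first crossing of y = y_mid, by linear interpolation."""
    for i in range(1, len(ys)):
        y0, y1 = ys[i - 1], ys[i]
        if y0 < y_mid <= y1:
            frac = (y_mid - y0) / (y1 - y0)
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    return None  # reach shorter than y_mid: LD cannot be evaluated


def target_error(xs, speeds, speed_threshold=20.0):
    """x-deviation (mm) of the final hand position.

    The final position is taken as the first sample, after the hand has
    started moving, at which hand speed (mm/s) falls below the threshold.
    """
    moving = False
    for x, speed in zip(xs, speeds):
        if speed >= speed_threshold:
            moving = True
        elif moving:
            return x
    return None  # the hand never stopped within the recorded samples
```

Returning `None` for reaches shorter than 75 mm mirrors the trial exclusion criterion, under which such trials cannot be evaluated for LD.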

To draw the participant-averaged trajectories for each of the VDCF (Experiment-1), LIPF (Experiment-1), PSPF (Experiment-2), and CPVF (Experiment-2) conditions, we sampled the x-axis data at the following y positions: 7.5 (target radius size), 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, and 150 (target position) mm for each participant. These sampled data were averaged across participants for each y position and plotted in Fig 3A.

All statistical tests conducted in this study were two-tailed with a significance level of 0.05. To examine changes in each of TE and LD during motor adaptation, we separately performed one-way ANOVAs across trial epochs (6 epochs: 1st, 3rd-5th, and 136th-155th adaptation trials and 1st, 3rd-5th, and 131st-150th de-adaptation trials). When the assumption of homogeneity of covariance (sphericity) was violated, the number of degrees of freedom was corrected with the Greenhouse-Geisser procedure. Post-hoc pairwise comparisons were performed using Tukey’s method. For other tests, we performed paired or unpaired t-tests. The ANOVAs were performed using SPSS Statistics ver. 25 (IBM) and the t-tests were performed using MATLAB version R2018b (Mathworks).

Data exclusion

Trials were excluded from the analysis when the reach distance was less than 75 mm as the LD could not be evaluated (Fig 1B). 34 trials (0.098% of the total number of trials) were excluded. Only one participant in the CPVF group was excluded from the analysis because the participant showed unstable trajectory changes over the last 100 de-adaptation trials with at least three large jumps (> 20 mm) across the x-axis as well as an outlying value of the LD over the last 20-de-adaptation trials (outside of 3 s.d. from the mean). 14 participants were thus analyzed for the CPVF group (Experiment-2). Note that for the t-test on the first de-adaptation trial of the CPVF group, the statistical degree of freedom was 12 since the first de-adaptation trial of a participant was excluded due to the trial exclusion criterion.

Participant screening

We screened participants in all the experiments based on trajectory deviation in the baseline sessions. From pilot experiments, we anticipated that the persistent curved null trajectory would appear after adaptation to TE-inducing force fields, as seen in Fig 3. To assess this phenomenon, we wanted to examine how much the curved trajectories differ from the null trajectories in the baseline. However, in our pilot experiments some participants showed considerably curved null trajectories (LD of ∼10 mm) in the baseline session, because we did not give participants any instruction on the reaching trajectory, for the sake of the research question. Thus, to ensure that baseline null trajectories were the same across all the participant groups, only the participants whose LD averaged over the last 20 trials of the baseline session was less than 4.5 mm proceeded to the learning session. Indeed, there were no significant differences in the LD in the baseline session across the groups: the VDCF, LIPF, PSPF, and CPVF groups and the participant group who performed the LIPF-PEC condition first (one-way ANOVA: F(4, 73) = 1.430, p = 0.223, ηp2 = 0.077). The screened-out participants afterwards performed similar reaching experiments that are not related to this study, and thus their data were not further analyzed for this study.

Simulation

To explain adaptive behaviors in the VDCF and the LIPF of the Experiment-1, we utilized two motor learning models: one is proposed by Izawa et al. [23], which we refer to as the flat OFC model, and the other is proposed by Franklin et al. [25], which we refer to as the flat VS model. These original models implement only the internal model learning and can explain monotonic trajectory adaptation as observed in the VDCF. However, they cannot explain non-monotonic trajectory adaptation, nor a persistent change in the null trajectory in the LIPF. We thus extended the two models by introducing a TE-driven kinematic plan adaptation that hierarchically interacts with the internal model adaptation (Fig 6A). We referred to the extended models as the hierarchical OFC model and the hierarchical VS model, respectively.

OFC model.

The original model (i.e., flat OFC model) utilizes optimal feedback control (OFC) theory [24,71] to simulate reaching trajectories during adaptation to a state-dependent novel force field, based on the concept that motor learning is a process of acquiring a model of the novel environment and using the model to re-optimize movements. Accordingly, in this framework, motor adaptation is characterized by the knowledge of the environment (the novel force field) that the motor system gradually acquires. The external force imposed on the arm is written in the form:

$$F_t = D X_t \tag{1}$$

where $F_t$ and $X_t$ are the external force vector and the current state vector of the plant (arm and environment) at time $t$, respectively. $D$ is the force matrix (e.g., for VDCF, $D = B_1 [0\ {-1};\ 1\ 0]$). What the motor system needs to perform the optimal movement in the force field is full knowledge of $D$, which is assumed to be gradually acquired. The knowledge of $D$ during adaptation is represented in the form:

$$\hat{D} = \alpha D \tag{2}$$

where $\hat{D}$ is the estimated force matrix and $\alpha$ is the learning parameter, which is assumed to gradually increase from 0 to 1 with adaptation. During adaptation, the motor system predicts the external force using $\hat{D}$ as follows:

$$\hat{F}_t = \hat{D} \hat{X}_t \tag{3}$$

where $\hat{F}_t$ is the predicted external force vector at time $t$, and $\hat{X}_t$ is the estimated state vector of the plant, obtained through the optimal state estimator (see [71]). Accordingly, the motor system produces the motor command optimized for the environment in which the predicted external force would be imposed on the arm. Only when $\alpha = 1$ does the system have full knowledge of $D$ and produce the optimal motor commands for the actual environment. When $0 < \alpha < 1$, the system has incomplete knowledge of $D$ and would produce a sub-optimal movement for the actual environment. Thus, by changing the value of $\alpha$, Izawa et al. simulated reaching trajectories in several phases of motor adaptation using OFC.
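The role of the learning parameter α can be illustrated with a small numerical sketch for the VDCF. The sketch assumes that the estimated force matrix is the true matrix scaled by α, reduces the plant state to the two-dimensional hand velocity, and takes the estimated state to equal the true velocity; the helper names `vdcf_matrix` and `predicted_force` are hypothetical, and the full model instead uses the complete state vector and an optimal state estimator.

```python
B1 = 14.0  # Ns/m, VDCF gain given in the Force fields section


def vdcf_matrix(gain=B1):
    """Velocity-dependent curl field matrix D = B1 * [0 -1; 1 0]."""
    return [[0.0, -gain], [gain, 0.0]]


def predicted_force(alpha, velocity, D=None):
    """Force (N) predicted with partial knowledge alpha of the field.

    Assumptions: the state is reduced to hand velocity (m/s) and the
    estimated force matrix is alpha * D.
    """
    D = vdcf_matrix() if D is None else D
    vx, vy = velocity
    fx = alpha * (D[0][0] * vx + D[0][1] * vy)
    fy = alpha * (D[1][0] * vx + D[1][1] * vy)
    return (fx, fy)
```

With α = 0 the predicted force is zero (no knowledge of the field), and with α = 1 the prediction matches the actual field; intermediate values correspond to the sub-optimal movements during adaptation.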

For the hierarchical OFC model, we borrowed the idea of a kinematic bias of movement direction proposed by Mistry et al. [26], which we refer to as directional bias. Mistry et al. extended the cost function of OFC by including a directional bias to explain a directional preference of reaching trajectories observed during motor adaptation to an acceleration-based force field. The directional bias represents the desired direction of movement, which is represented by the form: (4)

where Qd is the directional bias matrix and d = [dx dy]^T is the desired directional vector represented as a unit vector. While the original cost function consists of the error term (first term) and the motor cost term (second term) (Eq 5), the cost function of the hierarchical OFC model (Eq 6) has an additional term related to the directional bias (third term in Eq 6) so that any position or velocity perpendicular to the desired direction is penalized as follows: (5) (6) where xt is the current state vector of the plant (the arm and environment) at time t, ut is the motor command vector, pt and vt are the position and velocity vectors, respectively, and kp and kv are the weights of the bias for position and velocity, respectively. Qt is the weight matrix of the state cost, and R is the weight matrix of the motor cost. The exponential decay term is included because the directional bias need not exist for the entire motion. In our simulation, these parameters were set as follows: kp = kv = 0.5, τ = 130 (ms). The cost parameters included in Qt and R were determined to produce trajectories similar to those in the experiments (see S2 Text). The reaching movement was simulated for 0 ≤ t ≤ T + TH, where T is the maximum movement completion time and TH is the time for which the hand was supposed to hold a position at the target after movement completion (see [23]). T and TH were set to 400 (ms) and 50 (ms), respectively.

Here, we further extended this idea by introducing a directional bias modulated by the trial-by-trial TE (upper panel, Fig 6B). The directional bias is inclined in the direction opposite to the TE to reduce it. The direction of the directional bias in the i-th trial is represented by φi, the angle from the target direction (clockwise as positive). The TE is equivalent to the directional error represented by θi, defined as the angle between the target direction from the start position and the direction from the start position to the endpoint of the reach. In the presence of TE (i.e., TE > target size), the directional bias is updated according to the directional error as follows: (7) where the constant b is the forgetting rate and is set to 0.95. The constant r is the sensitivity of the directional bias update to the directional error and is set to 0.85. The initial value of the directional bias is 0 (i.e., φ1 = 0).

In the absence of TE (i.e., TE < target size), we assumed that the directional bias subtly decays across trials to the original direction towards the target as follows: (8)

Additionally, we assume that the kinematic plan adaptation is also affected by the motor cost (Eq 6, and see S2 Text) of the generated reach, and that the decay of the directional bias stops (i.e., b = 1) when the cost falls below 0.01. The threshold value was chosen arbitrarily to produce curved null trajectories similar to those in the experiments.

Once a TE greater than the target size occurs, the kinematic bias becomes active. In contrast, if the TEs remain within the target size throughout the experiment, the kinematic bias remains inactive.
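The trial-by-trial rule described above can be sketched as follows. Because the exact update equations are not reproduced here, the sketch assumes a linear form: the bias is retained with factor b and shifted against the directional error with gain r when a TE occurs, and it simply decays with factor b otherwise. Applying the cost threshold only on trials without TE is also an assumption. The function name and argument names are hypothetical.

```python
def update_directional_bias(phi, theta, te_exceeds_target, motor_cost,
                            b=0.95, r=0.85, cost_threshold=0.01):
    """Return the directional bias for the next trial (same angular unit as phi).

    phi   : current directional bias angle
    theta : directional error of the current reach
    Assumed form: phi_next = b * phi - r * theta when a target error occurs,
    phi_next = b * phi otherwise, with b replaced by 1 once the motor cost
    falls below the threshold.
    """
    if te_exceeds_target:
        # Shift the bias against the error, with partial retention.
        return b * phi - r * theta
    # No target error: decay towards the target direction unless the
    # reach is already cheap enough, in which case the bias is frozen.
    retention = 1.0 if motor_cost < cost_threshold else b
    return retention * phi


phi_after_miss = update_directional_bias(0.0, 10.0, True, motor_cost=1.0)
phi_decayed = update_directional_bias(phi_after_miss, 0.0, False, motor_cost=1.0)
phi_frozen = update_directional_bias(phi_after_miss, 0.0, False, motor_cost=0.005)
```

Under these assumptions a bias that has become cheap to execute stops decaying, which is the mechanism the text invokes for the persistent curved null trajectories.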

Next, to simulate the internal model adaptation in novel force fields, we changed the value of the learning parameter α. In the adaptation phase, α is increased from 0 to 0.8 such that αi = 0.8·log(log(i)+1)/log(log(155)+1) for 1≤i≤155. In the de-adaptation phase, α is decreased from 0.8 to 0 in the first 30 de-adaptation trials because the de-adaptation process is well known to be much faster than the adaptation process [72]. This was given by αi = 0.8·log(log(i−155)+1)/log(log(30)+1) for 156≤i≤185; αi = 0 for 186≤i≤305. The update rule for α was chosen to reproduce the experimental observations well.
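The schedule for α can be written as a short function. The adaptation branch follows the printed expression directly. Evaluated literally, the printed de-adaptation expression gives 0 at i = 156 and 0.8 at i = 185, which rises rather than falls, so the sketch assumes the complement of that ramp in order to match the stated decrease from 0.8 to 0. That choice, and all names below, are assumptions for illustration.

```python
import math

N_ADAPT = 155     # number of adaptation trials (from the text)
N_WASHOUT = 30    # de-adaptation trials over which alpha returns to 0
N_TOTAL = 305     # last simulated trial
ALPHA_MAX = 0.8   # value reached at the end of adaptation


def _log_ramp(k, n):
    """Double-log ramp that equals 0 at k = 1 and 1 at k = n."""
    return math.log(math.log(k) + 1.0) / math.log(math.log(n) + 1.0)


def alpha_for_trial(i):
    """Learning parameter alpha for trial i (1-based)."""
    if i < 1 or i > N_TOTAL:
        raise ValueError("trial index out of range")
    if i <= N_ADAPT:
        return ALPHA_MAX * _log_ramp(i, N_ADAPT)
    if i <= N_ADAPT + N_WASHOUT:
        # Assumption: complement of the ramp so that alpha falls from 0.8 to 0.
        return max(0.0, ALPHA_MAX * (1.0 - _log_ramp(i - N_ADAPT, N_WASHOUT)))
    return 0.0


schedule = [alpha_for_trial(i) for i in range(1, N_TOTAL + 1)]
```

The double logarithm makes α rise steeply over the first few trials and then flatten, which mimics a fast early phase followed by slow late learning.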

We simulated the reaching trajectory of the arm modeled as a point mass in Cartesian coordinates. The movement distance was 150 mm. B1 and K1 were set to 7 Ns/m and 120 N/m for the simulation of VDCF and LIPF, respectively, to produce trajectories similar to those in the experiments. PEC was applied over the second half of the movement (y > 75 mm) as a one-dimensional spring force (1500 N/m) and damper (100 Ns/m) along the x-axis. We discretized the system dynamics with a time step of Δt = 10 ms and performed the model simulation in a similar way to that introduced by Izawa et al. [23], except for the directional bias modulated by the history of TE. Please see S2 Text for further details of the model (section on the OFC model).
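As a rough illustration of the discretized point-mass plant and the PEC, the sketch below uses an explicit Euler step with Δt = 10 ms and a spring-damper force along x that switches on beyond y = 75 mm. The mass of 1 kg, the explicit Euler scheme, and the convention that the clamp pulls towards x = 0 are assumptions made for this example; the original simulation follows the linear discrete-time formulation of Izawa et al. [23].

```python
DT = 0.010  # time step in seconds (10 ms, from the text)


def pec_force(x, vx, y, stiffness=1500.0, damping=100.0, onset_y=0.075):
    """Lateral clamp force (N) along x; zero during the first half of the reach.

    x, y in metres, vx in m/s. Assumes the clamp is centred on x = 0.
    """
    if y <= onset_y:
        return 0.0
    return -stiffness * x - damping * vx


def euler_step(state, force, mass=1.0, dt=DT):
    """Advance a point mass one step. state = (px, py, vx, vy), force = (fx, fy)."""
    px, py, vx, vy = state
    fx, fy = force
    return (px + vx * dt,
            py + vy * dt,
            vx + fx / mass * dt,
            vy + fy / mass * dt)


# One step of a hand moving forward at 0.5 m/s under a 3.5 N leftward force.
next_state = euler_step((0.0, 0.0, 0.0, 0.5), (-3.5, 0.0))
clamp_early = pec_force(0.01, 0.0, 0.050)   # before the clamp region
clamp_late = pec_force(0.01, 0.0, 0.100)    # inside the clamp region
```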

V-shaped model.

The original model (i.e., the flat VS model) assumes that the desired trajectory, which the motor system should trace, is a fixed straight line joining the start and the target, and that the motor command is gradually corrected to reduce the difference between the actual and desired trajectories, which is defined as the movement error. In simulations with the model, the error is represented in coordinates of muscle length and written in the form: $E = \lambda - \lambda_0$ (9) where E is the movement error, which is the difference between the actual muscle length, λ, and the desired muscle length, λ0. This error is used to update the feedforward command to the individual muscles of the arm on a trial-by-trial basis, based on a simple V-shaped learning function (see S2 Text). The feedforward command for each muscle k is updated from trial i to trial i+1 according to the following learning law: (10) where the error term is the stretching/shortening in muscle k at time t for trial i, and Δu is phase advanced by ϕ>0, which is the feedback delay. α and β are the learning parameters (α>β>0) and γ (>0) is a constant de-activation parameter. The term gd (>0) indicates the relative level of velocity error to length error. By applying this learning law to a 2-joint 6-muscle arm model, Franklin et al. [25] and Tee et al. [73] simulated reaching trajectories in a broad range of novel force field environments.
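A V-shaped learning function of the kind described above can be sketched as follows. The sketch assumes the commonly cited form in which muscle stretch is weighted by α, muscle shortening by β, and a constant γ is subtracted, with the error combining length and velocity terms through gd. The parameter values are arbitrary illustrations, the phase advance ϕ is omitted, and the function names are hypothetical.

```python
def combined_error(length_error, velocity_error, g_d):
    """Stretch/shortening signal: length error plus g_d times velocity error."""
    return length_error + g_d * velocity_error


def v_shaped_update(error, alpha, beta, gamma):
    """Change in feedforward command for one muscle at one time point.

    Assumed form: alpha * max(error, 0) + beta * max(-error, 0) - gamma,
    with alpha > beta > 0 and gamma > 0.
    """
    stretch = max(error, 0.0)       # muscle longer than desired
    shortening = max(-error, 0.0)   # muscle shorter than desired
    return alpha * stretch + beta * shortening - gamma


# Illustrative values only; they are not the parameters used in the paper.
ALPHA, BETA, GAMMA = 2.0, 1.0, 0.5
du_stretch = v_shaped_update(1.0, ALPHA, BETA, GAMMA)
du_shorten = v_shaped_update(-1.0, ALPHA, BETA, GAMMA)
du_no_error = v_shaped_update(0.0, ALPHA, BETA, GAMMA)
```

Because both branches increase activation, errors of either sign raise co-contraction, while the constant γ term lowers activation whenever the error is small.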

Here, we extend the flat model by introducing the idea that the desired trajectory (lower panel in Fig 6B), which is represented in Cartesian coordinates, is updated according to the trial-by-trial TE in a similar way to the hierarchical OFC model. The desired trajectory is described as a curved line with a deflection, dx, 120 mm away from the start position along the y-axis (Fig 6B). Before adaptation, the desired trajectory is the straight line towards the target, that is, dx = 0. In the presence of TE (i.e., TE > target size), dx is updated as follows: (11) where the constant b represents the retention of motor learning and is set to 0.95. The constant r is the sensitivity of the desired trajectory update to the TE in the previous trial and is set to 0.45. In the presence of TE, dx is modulated such that the desired trajectory is deflected in the opposite direction to the trial-by-trial TE. The desired trajectory with dx was calculated as the minimum jerk trajectory with the via-point at [dx 120] (mm) from the start position [74].

In the absence of TE (i.e., TE < target size), we assumed that the desired trajectory subtly decays across trials to the original direction towards the target as follows: (12)

We again assume that the kinematic plan adaptation is affected by the motor cost of the generated reach, which is calculated as the average muscle tension across all 6 muscles during movement (see S2 Text). When the cost falls below 350, the decay of the desired trajectory stops, i.e., b = 1. The threshold value was again chosen arbitrarily to produce curved null trajectories similar to those in the experiments.

In the simulation, the desired trajectory was converted from Cartesian to muscle space to apply it to the learning law (Eq 10). The start and target positions were at [0, 350] and [0, 500] (mm) in Cartesian coordinates (where [0, 0] is at the shoulder joint), respectively. The reach duration was 400 ms. For simplicity, all noise parameters were set to zero. B1 and K1 (see the section on force fields) were set to 20 Ns/m and 120 N/m, respectively, to produce trajectories similar to those in the experiments. PEC was applied over the second half of the movement (y > 75 mm) as a one-dimensional spring force (2500 N/m) and damper (1000 Ns/m) along the x-axis. We performed the model simulation in the same way as that introduced by Franklin et al. [25], except that the desired trajectory is modulated by the history of endpoint error. Please see S2 Text for further details of the model (section on the V-shaped model).

Supporting information

S1 Text. Trajectory adaptation in Experiment-2.

https://doi.org/10.1371/journal.pcbi.1008481.s001

(DOCX)

S1 Fig.

Trajectory adaptation in Experiment-2: (A, C) The hand trajectories of two representative participants and learning curves in PSPF (A) and CPVF (C) averaged across all participants. Note that the scales differ between x and y axes to clearly show trajectory changes along the x-axis. The light gray shades behind some trajectories represent a schematic image of the force field. The adaptation of the TE and LD are shown by traces with open circles and filled circles, respectively. The first 15 TE and LD values are plotted for every single trial, while the subsequent trials (indicated by thick gray lines at the bottom of the figure) are plotted for every five trials. The shaded gray areas around the lines represent standard errors. The light green zones represent the target width (radius: 7.5 mm). (B, D) The TEs and baseline-subtracted LDs in six trial epochs (1st, 3rd-5th, 136th-155th adaptation trials, and 1st, 3rd-5th, 131st-150th de-adaptation trials) in PSPF (B) and CPVF (D). Gray dots represent data from individual participants. The error bars indicate standard errors. The light green zone in the TE plots represents the target width.

https://doi.org/10.1371/journal.pcbi.1008481.s003

(TIF)

S2 Fig.

Simulation results for trajectory adaptation in CPVF, represented by TE (open circle) and LD (filled circle) by the flat (A, B)/hierarchical (C, D) OFC (upper panels) and VS models (lower panels). The flat learning models (only internal model adaptation) were unable to reproduce either the non-monotonic change in LD or the curved null trajectory with a persistent deviation after exposure to CPVF. However, the hierarchical OFC models (kinematic plan learning and internal model adaptation) successfully reproduced both.

https://doi.org/10.1371/journal.pcbi.1008481.s004

(TIF)

Acknowledgments

We thank Ms. Yuka Furukawa and Ms. Naoko Katagiri for help in recruiting the participants. We thank Dr. Jun Izawa and Dr. Tee for providing the codes used in their studies.

References

  1. 1. Botvinick MM. Hierarchical reinforcement learning and decision making. Curr Opin Neurobiol. 2012;22(6):956–62. Epub 2012/06/15. S0959-4388(12)00087-6 [pii] 10.1016/j.conb.2012.05.008. pmid:22695048.
  2. 2. Sugrue LP, Corrado GS, Newsome WT. Choosing the greater of two goods: neural currencies for valuation and decision making. Nat Rev Neurosci. 2005;6(5):363–75. Epub 2005/04/16. nrn1666 [pii] 10.1038/nrn1666. pmid:15832198.
  3. 3. Shadmehr R, Smith MA, Krakauer JW. Error correction, sensory prediction, and adaptation in motor control. Annu Rev Neurosci. 2010;33:89–108. Epub 2010/04/07. pmid:20367317.
  4. 4. Tseng YW, Diedrichsen J, Krakauer JW, Shadmehr R, Bastian AJ. Sensory prediction errors drive cerebellum-dependent adaptation of reaching. J Neurophysiol. 2007;98(1):54–62. Epub 2007/05/18. 00266.2007 [pii] pmid:17507504.
  5. 5. Wolpert DM, Diedrichsen J, Flanagan JR. Principles of sensorimotor learning. Nature Reviews Neuroscience. 2011;12(12):739–51. pmid:22033537
  6. 6. Kawato M, Furukawa K, Suzuki R. A hierarchical neural-network model for control and learning of voluntary movement. Biol Cybern. 1987;57(3):169–85. Epub 1987/01/01. pmid:3676355.
  7. 7. McDougle SD, Ivry RB, Taylor JA. Taking Aim at the Cognitive Side of Learning in Sensorimotor Adaptation Tasks. Trends Cogn Sci. 2016;20(7):535–44. Epub 2016/06/05. pmid:27261056; PubMed Central PMCID: PMC4912867.
  8. 8. Krakauer JW, Hadjiosif AM, Xu J, Wong AL, Haith AM. Motor Learning. Compr Physiol. 2019;9(2):613–63. Epub 2019/03/16. pmid:30873583.
  9. 9. Taylor JA, Krakauer JW, Ivry RB. Explicit and Implicit Contributions to Learning in a Sensorimotor Adaptation Task. The Journal of Neuroscience. 2014;34(8):3023–32. pmid:24553942
  10. 10. McDougle SD, Bond KM, Taylor JA. Explicit and Implicit Processes Constitute the Fast and Slow Processes of Sensorimotor Learning. J Neurosci. 2015;35(26):9568–79. Epub 2015/07/03. pmid:26134640; PubMed Central PMCID: PMC4571499.
  11. 11. Smith MA, Ghazizadeh A, Shadmehr R. Interacting adaptive processes with different timescales underlie short-term motor learning. PLoS Biol. 2006;4(6):e179. Epub 2006/05/17. 05-PLBI-RA-0791R2 [pii] 10.1371/journal.pbio.0040179. pmid:16700627; PubMed Central PMCID: PMC1463025.
  12. 12. Lee JY, Schweighofer N. Dual adaptation supports a parallel architecture of motor memory. J Neurosci. 2009;29(33):10396–404. Epub 2009/08/21. 29/33/10396 [pii] 10.1523/JNEUROSCI.1294-09.2009. pmid:19692614; PubMed Central PMCID: PMC2789989.
  13. 13. Keisler A, Shadmehr R. A shared resource between declarative memory and motor memory. J Neurosci. 2010;30(44):14817–23. Epub 2010/11/05. 30/44/14817 [pii] 10.1523/JNEUROSCI.4160-10.2010. pmid:21048140.
  14. 14. Schween R, McDougle SD, Hegele M, Taylor JA. Explicit strategies in force field adaptation. bioRxiv. 2019:694430.
  15. 15. Kim HE, Parvin DE, Ivry RB. The influence of task outcome on implicit motor learning. Elife. 2019;8. Epub 2019/04/30. pmid:31033439; PubMed Central PMCID: PMC6488295.
  16. 16. Leow LA, Marinovic W, de Rugy A, Carroll TJ. Task errors drive memories that improve sensorimotor adaptation. J Neurosci. 2020. Epub 2020/02/08. pmid:32029533.
  17. 17. Osu R, Hirai S, Yoshioka T, Kawato M. Random presentation enables subjects to adapt to two opposing forces on the hand. Nat Neurosci. 2004;7(2):111–2. Epub 2004/01/28. [pii]. pmid:14745452.
  18. 18. Shadmehr R, Mussa-Ivaldi FA. Adaptive representation of dynamics during learning of a motor task. J Neurosci. 1994;14(5 Pt 2):3208–24. Epub 1994/05/01. pmid:8182467.
  19. 19. Morasso P. Spatial control of arm movements. Exp Brain Res. 1981;42(2):223–7. Epub 1981/01/01. pmid:7262217.
  20. 20. Hollerbach MJ, Flash T. Dynamic interactions between limb segments during planar arm movement. Biol Cybern. 1982;44(1):67–77. pmid:7093370.
  21. 21. Lackner JR, Dizio P. Rapid adaptation to Coriolis force perturbations of arm trajectory. J Neurophysiol. 1994;72(1):299–313. Epub 1994/07/01. pmid:7965013.
  22. 22. DiZio P, Lackner JR. Congenitally blind individuals rapidly adapt to coriolis force perturbations of their reaching movements. J Neurophysiol. 2000;84(4):2175–80. Epub 2000/10/12. pmid:11024106.
  23. 23. Izawa J, Rane T, Donchin O, Shadmehr R. Motor adaptation as a process of reoptimization. J Neurosci. 2008;28(11):2883–91. Epub 2008/03/14. 28/11/2883 [pii] 10.1523/JNEUROSCI.5359-07.2008. pmid:18337419; PubMed Central PMCID: PMC2752329.
  24. 24. Todorov E, Jordan MI. Optimal feedback control as a theory of motor coordination. Nat Neurosci. 2002;5(11):1226–35. Epub 2002/10/31. [pii]. pmid:12404008.
  25. 25. Franklin DW, Burdet E, Tee KP, Osu R, Chew CM, Milner TE, et al. CNS learns stable, accurate, and efficient movements using a simple algorithm. J Neurosci. 2008;28(44):11165–73. Epub 2008/10/31. 28/44/11165 [pii] 10.1523/JNEUROSCI.3099-08.2008. pmid:18971459.
  26. 26. Mistry M, Theodorou E, Schaal S, Kawato M. Optimal control of reaching includes kinematic constraints. J Neurophysiol. 2013;110(1):1–11. Epub 2013/04/05. jn.00794.2011 [pii] 10.1152/jn.00794.2011. pmid:23554437.
  27. 27. Schmidt RA, Lee TD. Motor control and learning: a behavioral emphasis. 4th ed. Champaign, IL: Human Kinetics; 2005. vi, 537 p. p.
  28. 28. Krakauer JW, Pine ZM, Ghilardi MF, Ghez C. Learning of visuomotor transformations for vectorial planning of reaching trajectories. J Neurosci. 2000;20(23):8916–24. Epub 2000/01/11. 20/23/8916 [pii]. pmid:11102502.
  29. 29. Chib VS, Patton JL, Lynch KM, Mussa-Ivaldi FA. Haptic identification of surfaces as fields of force. J Neurophysiol. 2006;95(2):1068–77. Epub 2005/10/07. 00610.2005 [pii] 10.1152/jn.00610.2005. pmid:16207784.
  30. 30. Ganesh G, Haruno M, Kawato M, Burdet E. Motor memory and local minimization of error and effort, not global optimization, determine motor behavior. J Neurophysiol. 2010;104(1):382–90. Epub 2010/05/21. jn.01058.2009 [pii] 10.1152/jn.01058.2009. pmid:20484533.
  31. 31. Kodl J, Ganesh G, Burdet E. The CNS stochastically selects motor plan utilizing extrinsic and intrinsic representations. PLoS One. 2011;6(9):e24229. Epub 2011/09/14. [pii]. pmid:21912679; PubMed Central PMCID: PMC3166292.
  32. 32. Ganesh G, Burdet E. Motor planning explains human behaviour in tasks with multiple solutions. Robotics and Autonomous Systems. 2013;61(4):362–8.
  33. 33. Diedrichsen J, White O, Newman D, Lally N. Use-dependent and error-based learning of motor behaviors. J Neurosci. 2010;30(15):5159–66. Epub 2010/04/16. 30/15/5159 [pii] 10.1523/JNEUROSCI.5406-09.2010. pmid:20392938.
  34. 34. Huang VS, Haith A, Mazzoni P, Krakauer JW. Rethinking motor learning and savings in adaptation paradigms: model-free memory for successful actions combines with internal models. Neuron. 2011;70(4):787–801. 10.1016/j.neuron.2011.04.012. pmid:21609832; PubMed Central PMCID: PMC3134523.
  35. 35. Vindras P, Desmurget M, Prablanc C, Viviani P. Pointing errors reflect biases in the perception of the initial hand position. J Neurophysiol. 1998;79(6):3290–4. Epub 1998/06/26. pmid:9636129.
  36. 36. Ostry DJ, Darainy M, Mattar AA, Wong J, Gribble PL. Somatosensory plasticity and motor learning. J Neurosci. 2010;30(15):5384–93. Epub 2010/04/16. pmid:20392960; PubMed Central PMCID: PMC2858322.
  37. 37. Modchalingam S, Vachon CM, t Hart BM, Henriques DYP. The effects of awareness of the perturbation during motor adaptation on hand localization. PLoS One. 2019;14(8):e0220884. Epub 2019/08/10. pmid:31398227; PubMed Central PMCID: PMC6688819.
  38. 38. Shmuelof L, Huang VS, Haith AM, Delnicki RJ, Mazzoni P, Krakauer JW. Overcoming motor "forgetting" through reinforcement of learned actions. J Neurosci. 2012;32(42):14617–21. Epub 2012/10/19. pmid:23077047; PubMed Central PMCID: PMC3525880.
  39. 39. Galea JM, Mallia E, Rothwell J, Diedrichsen J. The dissociable effects of punishment and reward on motor learning. Nat Neurosci. 2015;18(4):597–602. Epub 2015/02/24. pmid:25706473.
  40. 40. Codol O, Holland PJ, Galea JM. The relationship between reinforcement and explicit control during visuomotor adaptation. Scientific reports. 2018;8(1):9121. Epub 2018/06/16. pmid:29904096; PubMed Central PMCID: PMC6002524.
  41. 41. Holland P, Codol O, Oxley E, Taylor M, Hamshere E, Joseph S, et al. Domain-Specific Working Memory, But Not Dopamine-Related Genetic Variability, Shapes Reward-Based Motor Learning. J Neurosci. 2019;39(47):9383–96. Epub 2019/10/13. pmid:31604835; PubMed Central PMCID: PMC6867814.
  42. 42. Miyamoto YR, Wang S, Smith MA. Implicit adaptation compensates for erratic explicit strategy in human motor learning. Nat Neurosci. 2020;23(3):443–55. Epub 2020/03/01. pmid:32112061.
  43. 43. Botvinick MM. Hierarchical models of behavior and prefrontal function. Trends Cogn Sci. 2008;12(5):201–8. Epub 2008/04/19. S1364-6613(08)00088-0 [pii] 10.1016/j.tics.2008.02.009. pmid:18420448; PubMed Central PMCID: PMC2957875.
  44. 44. Botvinick MM, Niv Y, Barto AC. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition. 2009;113(3):262–80. Epub 2008/10/18. S0010-0277(08)00205-9 [pii] pmid:18926527; PubMed Central PMCID: PMC2783353.
  45. 45. Ribas-Fernandes JJ, Solway A, Diuk C, McGuire JT, Barto AG, Niv Y, et al. A neural signature of hierarchical reinforcement learning. Neuron. 2011;71(2):370–9. Epub 2011/07/28. S0896-6273(11)00499-5 [pii] 10.1016/j.neuron.2011.05.042. pmid:21791294; PubMed Central PMCID: PMC3145918.
  46. 46. Badre D, Doll BB, Long NM, Frank MJ. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron. 2012;73(3):595–607. Epub 2012/02/14. S0896-6273(12)00075-X [pii] 10.1016/j.neuron.2011.12.025. pmid:22325209; PubMed Central PMCID: PMC3285405.
  47. 47. Badre D, Frank MJ. Mechanisms of hierarchical reinforcement learning in cortico-striatal circuits 2: evidence from FMRI. Cereb Cortex. 2012;22(3):527–36. Epub 2011/06/23. bhr117 [pii] 10.1093/cercor/bhr117. pmid:21693491; PubMed Central PMCID: PMC3278316.
  48. 48. Kawato M, Samejima K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr Opin Neurobiol. 2007;17(2):205–12. Epub 2007/03/22. [pii] pmid:17374483.
  49. 49. Merel J, Botvinick M, Wayne G. Hierarchical motor control in mammals and machines. Nat Commun. 2019;10(1):5489. Epub 2019/12/04. pmid:31792198; PubMed Central PMCID: PMC6889345.
  50. 50. Barto AG, Sutton RS. Reinforcement learning: The MIT Press; 1998.
  51. 51. Barto AG, Mahadevan S. Recent advances in hierarchical reinforcement learning. Discrete Event Systems journal. 2003;13:44–77.
  52. 52. Diedrichsen J, Hashambhoy Y, Rane T, Shadmehr R. Neural correlates of reach errors. J Neurosci. 2005;25(43):9919–31. Epub 2005/10/28. 25/43/9919 [pii] 10.1523/JNEUROSCI.1874-05.2005. pmid:16251440; PubMed Central PMCID: PMC1479774.
  53. 53. Shadmehr R, Holcomb HH. Neural correlates of motor memory consolidation. Science. 1997;277(5327):821–5. pmid:9242612.
  54. 54. Diedrichsen J, Criscimagna-Hemminger SE, Shadmehr R. Dissociating timing and coordination as functions of the cerebellum. J Neurosci. 2007;27(23):6291–301. Epub 2007/06/08. 27/23/6291 [pii] 10.1523/JNEUROSCI.0061-07.2007. pmid:17554003; PubMed Central PMCID: PMC2216743.
  55. 55. Imamizu H, Kawato M. Neural correlates of predictive and postdictive switching mechanisms for internal models. J Neurosci. 2008;28(42):10751–65. Epub 2008/10/17. 28/42/10751 [pii] 10.1523/JNEUROSCI.1106-08.2008. pmid:18923050.
  56. 56. Burdet E, Osu R, Franklin DW, Milner TE, Kawato M. The central nervous system stabilizes unstable dynamics by learning optimal impedance. Nature. 2001;414(6862):446–9. Epub 2001/11/24. [pii]. pmid:11719805.
  57. 57. Cluff T, Scott SH. Rapid feedback responses correlate with reach adaptation and properties of novel upper limb loads. J Neurosci. 2013;33(40):15903–14. Epub 2013/10/04. pmid:24089496; PubMed Central PMCID: PMC6618484.
  58. 58. Scheidt RA, Ghez C. Separate adaptive mechanisms for controlling trajectory and final position in reaching. J Neurophysiol. 2007;98(6):3600–13. Epub 2007/10/05. 00121.2007 [pii] 10.1152/jn.00121.2007. pmid:17913996.
  59. 59. Crevecoeur F, Thonnard JL, Lefevre P. A Very Fast Time Scale of Human Motor Adaptation: Within Movement Adjustments of Internal Representations during Reaching. eNeuro. 2020;7(1). Epub 2020/01/18. pmid:31949026; PubMed Central PMCID: PMC7004489.
  60. 60. Braun DA, Aertsen A, Wolpert DM, Mehring C. Learning optimal adaptation strategies in unpredictable motor tasks. J Neurosci. 2009;29(20):6472–8. Epub 2009/05/22. 29/20/6472 [pii] 10.1523/JNEUROSCI.3075-08.2009. pmid:19458218; PubMed Central PMCID: PMC2692080.
  61. 61. Maeda RS, Cluff T, Gribble PL, Pruszynski JA. Feedforward and Feedback Control Share an Internal Model of the Arm’s Dynamics. J Neurosci. 2018;38(49):10505–14. Epub 2018/10/26. pmid:30355628; PubMed Central PMCID: PMC6596259.
  62. 62. Hayashi T, Yokoi A, Hirashima M, Nozaki D. Visuomotor Map Determines How Visually Guided Reaching Movements are Corrected Within and Across Trials. eNeuro. 2016;3(3). Epub 2016/06/09. pmid:27275006; PubMed Central PMCID: PMC4891765.
  63. 63. Wagner MJ, Smith MA. Shared internal models for feedforward and feedback control. J Neurosci. 2008;28(42):10663–73. Epub 2008/10/17. pmid:18923042; PubMed Central PMCID: PMC6671341.
  64. 64. Scott SH. Optimal feedback control and the neural basis of volitional motor control. Nat Rev Neurosci. 2004;5(7):532–46. pmid:15208695.
  65. 65. Cashaback JGA, McGregor HR, Mohatarem A, Gribble PL. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning. PLoS Comput Biol. 2017;13(7):e1005623. Epub 2017/07/29. pmid:28753634; PubMed Central PMCID: PMC5550011.
  66. 66. Oldfield RC. The assessment and analysis of handedness: the Edinburgh inventory. 1971.
  67. 67. Ganesh G, Takagi A, Osu R, Yoshioka T, Kawato M, Burdet E. Two is better than one: Physical interactions improve motor performance in humans. Scientific reports. 2014;4. ARTN 3824 10.1038/srep03824. WOS:000330045000001. pmid:24452767
  68. 68. Elliott D, Helsen WF, Chua R. A century later: Woodworth’s (1899) two-component model of goal-directed aiming. Psychol Bull. 2001;127(3):342–57. Epub 2001/06/08. pmid:11393300.
  69. 69. Novak KE, Miller LE, Houk JC. Kinematic properties of rapid hand movements in a knob turning task. Experimental Brain Research. 2000;132(4):419–33. pmid:10912823
  70. 70. Scheidt RA, Reinkensmeyer DJ, Conditt MA, Rymer WZ, Mussa-Ivaldi FA. Persistence of motor adaptation during constrained, multi-joint, arm movements. J Neurophysiol. 2000;84(2):853–62. Epub 2000/08/12. pmid:10938312.
  71. 71. Todorov E. Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Comput. 2005;17(5):1084–108. Epub 2005/04/15. pmid:15829101; PubMed Central PMCID: PMC1550971.
  72. 72. Shadmehr R, Wise SP. The computational neurobiology of reaching and pointing. Cambridge, Massachusetts: The MIT Press; 2005.
  73. 73. Tee KP, Franklin DW, Kawato M, Milner TE, Burdet E. Concurrent adaptation of force and impedance in the redundant muscle system. Biol Cybern. 2010;102(1):31–44. Epub 2009/11/26. pmid:19936778.
  74. 74. Flash T, Hogan N. The coordination of arm movements: an experimentally confirmed mathematical model. J Neurosci. 1985;5(7):1688–703. Epub 1985/07/01. pmid:4020415.
  75. 75. Ikegami T, Ganesh G, Gibo LT, Yoshioka T, Osu R, Kawato M. Data for: Hierarchial motor adaptations negotiate failures during force field learning [Internet]. Dryad. 2021. Available from: https://doi.org/10.5061/dryad.5x69p8d2f