
Between-Trial Forgetting Due to Interference and Time in Motor Adaptation

  • Sungshin Kim,

    Affiliation Neuroscience Graduate Program, University of Southern California, Los Angeles, 90089, United States of America

  • Youngmin Oh,

    Affiliation Neuroscience Graduate Program, University of Southern California, Los Angeles, 90089, United States of America

  • Nicolas Schweighofer

    Affiliations Biokinesiology and Physical Therapy, University of Southern California, Los Angeles, 90089, United States of America, M2H Laboratory, Euromov, University of Montpellier I, Montpellier, France


Learning a motor task with temporally spaced presentations or with other tasks intermixed between presentations reduces performance during training, but can enhance retention post training. These two effects are known as the spacing effect and the contextual interference effect, respectively. Here, we aimed at testing a unifying hypothesis of the spacing and contextual interference effects in visuomotor adaptation, according to which either spaced presentations or interference by another task will promote between-trial forgetting, which will depress performance during acquisition but will promote retention. We first performed an experiment with three visuomotor adaptation conditions: a short inter-trial-interval (ITI) condition (SHORT-ITI); a long ITI condition (LONG-ITI); and an alternating condition with two alternated opposite tasks (ALT), with the same single-task ITI as in LONG-ITI. In the SHORT-ITI condition, there was the fastest increase in performance during training and the largest immediate forgetting in the retention tests. In contrast, in the ALT condition, there was the slowest increase in performance during training and little immediate forgetting in the retention tests. Compared to these two conditions, in the LONG-ITI condition we found an intermediate increase in performance during training and intermediate immediate forgetting. To account for these results, we fitted six possible adaptation models to the data, with one or two time scales, and with interference in the fast, the slow, or both time scales. Model comparison confirmed that two time scales and some degree of interference in either time scale are needed to account for our experimental results. In summary, our results suggest that retention following adaptation is modulated by the degree of between-trial forgetting, which is due to time-based decay in single-task adaptation and to interference in multi-task adaptation.


It has been known for more than a century that manipulating the schedules of motor training affects performance during training and retention [1–3]. In particular, practicing a task with temporally spaced presentations or with other tasks intermixed between presentations reduces performance during training compared to blocked presentations, but can lead to superior retention. These phenomena are known as the spacing effect and the contextual interference effect, respectively [4–7].

According to the “forgetting-reconstruction” theory of the contextual interference effect, short-term forgetting between presentations of the same task results in stronger memories [8]. Such a forgetting view had previously been advanced to explain the spacing effect in verbal learning [9]. Lee and Magill then proposed a unifying mechanism of the spacing and contextual interference effects: both spacing between presentations and interference by another task promote forgetting in working memory, which will depress performance during acquisition, but will promote retention [5].

However, to the best of our knowledge, no study has directly compared spaced and intermixed practice in motor adaptation. Our goal here was therefore two-fold. First, we aimed at directly testing the effects of temporally spaced presentations and task-intermixed presentations in visuomotor adaptation. Specifically, we compared performance during and after training in three visuomotor adaptation conditions: a short inter-trial-interval (ITI) condition with a single task (SHORT-ITI); a long ITI condition with a single task (LONG-ITI); and an alternating condition with two alternated opposite tasks (ALT), with the same single-task ITI as in LONG-ITI. In this design, the only difference between LONG-ITI and ALT schedules was the intercalation of a secondary task in ALT. We could therefore estimate the effect of forgetting due to time by comparing LONG-ITI to SHORT-ITI, and the effect of forgetting due to interference by comparing LONG-ITI to ALT.

Second, using a combined approach of computational modeling and behavioral experiments, we aimed at testing a unifying mechanism of the spacing and contextual interference effects, akin to that proposed by Lee and Magill but for visuomotor adaptation. One important advantage of computational models is the ability to estimate latent variables underlying adaptation and to make predictions based on the estimated variables [10–13]. In particular, in the two-state model [12], a fast learning process contributes to fast initial learning but forgets quickly, and a slow learning process contributes to long-term retention but learns slowly. It has been proposed that rapid forgetting in the fast process can produce the spacing effect [14]: in conditions with short ITIs, there is little forgetting between consecutive presentations of the task, and performance improves quickly because of large updates of the fast process. In contrast, in conditions with long ITIs, the fast process forgets substantially between trials: performance improves slowly, and the larger errors result in greater updates of the slow process, leading to increased long-term retention.

The two-state model cannot, however, explain data on multiple-task adaptation, and thus the effect of interfering tasks, because adaptation to a new task overrides previous adaptation in this model. When given appropriate contextual cues, humans can simultaneously adapt to two visuomotor rotations [15–17], two saccadic gains [18], and even, in some cases, two opposite force fields [19, 20]. To account for such data, we previously proposed an updated model with a “common fast process”, which is highly prone to interference and competes for errors, and with multiple slow processes protected from interference [21]. In simulations of this model in an alternating schedule with two adaptation tasks of opposite signs, the fast process suffered interference from the second, alternating adaptation task. This interference induced trial-by-trial forgetting during training and thus reduced the overall adaptation rate, thereby increasing errors, which led to greater updates of the slow process and thus to increased retention, as in the contextual interference effect [21].

We analyzed patterns of visuomotor adaptation and forgetting in the three experimental conditions, SHORT-ITI, LONG-ITI, and ALT. Guided by the “forgetting-reconstruction” theory, and based on our previous simulations [22], we then proposed to test the “between-trial forgetting, long-term retention hypothesis” (BTF-LTR), which makes the following predictions. The amount of between-trial forgetting during training is predicted to be smallest in SHORT-ITI, largest in ALT, and intermediate in LONG-ITI. Therefore, in SHORT-ITI, we predicted the fastest increase in performance during training and the largest short-term forgetting in the retention tests. In contrast, in ALT, we predicted the slowest increase in performance during training and the smallest short-term forgetting in the retention tests. Compared to these two conditions, in LONG-ITI, we predicted an intermediate increase in performance during training and intermediate short-term forgetting. We then studied which models of adaptation can account for our results by fitting and comparing six possible adaptation models with one or two time scales, and with interference in the fast time scale, the slow time scale, or both time scales.



Forty-six neurologically intact right-handed subjects (10 men and 36 women, 21–32 years old) participated in the study. We randomly assigned the subjects to one of three different experimental conditions: SHORT-ITI, LONG-ITI, and ALT, with a predefined goal of 15 subjects per group. Participants were excluded from the study if the standard deviation of directional error between the target and the final cursor position in the first 80 trials of the familiarization session was greater than 10 degrees (see below). One participant was excluded according to this criterion. The subjects, who were naïve to the purpose of the study, signed an informed consent form prior to participation in this study, which was approved by the Institutional Review Board at the University of Southern California.

Experimental procedures

Design: Subjects sat facing a computer monitor with the right arm supported by a JAECO/Rancho arm support. They controlled a cursor shown on a screen by moving a pen on the surface of a digitizing tablet (sampling rate: 200 Hz, Wacom Tech Corp). An opaque shield blocked vision of their hand and arm. On each trial, a target appeared and subjects were instructed to make a straight and uncorrected out-and-back movement to hit the target. The ITI was defined as the time interval between the onsets of two consecutive targets. An initial familiarization session (140 trials with full feedback, 20 trials without feedback) was followed by a block of 30 baseline trials. In both familiarization and baseline, the mapping between the hand position and the cursor was unaltered, and the ITI was 5.2 s. In the following training trials, we altered the mapping between the hand position and the cursor position via either a counterclockwise (CCW: +45°) or a clockwise (CW: -45°) visuomotor rotation. The schedules of the training trials varied according to the experimental conditions (Fig 1A). In the SHORT-ITI and LONG-ITI conditions, 60 trials of either a CCW or a CW rotation (counterbalanced across subjects) were presented, with ITIs of 5.2 s and 18.4 s, respectively. In the ALT condition, CCW and CW rotations were presented alternately with an ITI of 9.2 s, with 60 trials per task (120 trials in total). Thus, the ITI for each task in ALT was 18.4 s, the same as the ITI in the LONG-ITI condition.
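The ITI arithmetic of the three schedules can be checked by writing them out as trial lists. The sketch below uses hypothetical helper names (`training_schedule`, `same_task_intervals`) that are not from the study, assumes that ALT alternates strictly starting with the first task, and ignores the separate 5.2 s interval before the retention tests.

```python
def training_schedule(condition, n_per_task=60, first_task="CCW"):
    """Hypothetical helper: list of (task, iti_s) training trials, where iti_s
    is the time from this target onset to the next target onset."""
    second_task = "CW" if first_task == "CCW" else "CCW"
    if condition == "SHORT-ITI":
        return [(first_task, 5.2)] * n_per_task
    if condition == "LONG-ITI":
        return [(first_task, 18.4)] * n_per_task
    if condition == "ALT":
        # Assumed strict alternation, 9.2 s between consecutive targets.
        return [(first_task, 9.2), (second_task, 9.2)] * n_per_task
    raise ValueError(f"unknown condition: {condition}")


def same_task_intervals(schedule, task):
    """Intervals (s) between consecutive onsets of the same task."""
    onsets, t = [], 0.0
    for name, iti_s in schedule:
        if name == task:
            onsets.append(t)
        t += iti_s
    # Round away floating-point accumulation error.
    return [round(later - earlier, 6) for earlier, later in zip(onsets, onsets[1:])]
```

For example, `set(same_task_intervals(training_schedule("ALT"), "CCW"))` yields a single value of 18.4 s, matching the single-task interval of LONG-ITI.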

Fig 1. Experiment design.

A: Training schedules of the three visuomotor adaptation experimental groups. Note that the schedule for each task in ALT is the same as the single-task schedule in LONG-ITI. The only difference between ALT and LONG-ITI is that the second task (T2) is intercalated between two presentations of the first task (T1). B: The experiment consisted of three blocks: 30 baseline trials, 60 training trials per task (scheduled according to the condition, as shown in A), and retention test trials, given in blocks of 5 trials immediately (0 min) and at 2 min, 5 min, and 10 min after the end of training. C: Distribution of targets: targets appeared at random positions either upward or leftward depending on the task, which was signaled by a color cue (green or blue). To hit a target, subjects had to adapt to the altered mapping between the hand position and the cursor position, either a counterclockwise (CCW: +45°) or a clockwise (CW: -45°) visuomotor rotation, depending on the task.

Then, four blocks of retention tests, each consisting of five trials (ITI = 9.2 s) without feedback, were given at 0 min, 2 min, 5 min, and 10 min after the end of training (Fig 1B). For SHORT-ITI and LONG-ITI, the practiced task (CCW or CW) was presented in the retention tests. For ALT, the first practiced task was presented in the retention tests. To measure immediate retention, the ITI between the last training trial and the first retention trial was 5.2 s for all three groups.

Single trial: At the beginning of each trial, a white cross appeared at the center of the screen. After a delay of a few seconds, which depended on the condition, a target (a colored disk of 0.7 cm radius) appeared 10 cm from the center, either upward or leftward, and signaled the start of the movement. Subjects were instructed to move the white, 1-cm cross-shaped cursor to the target within 1.5 s. Feedback was presented in two forms. First, the cursor was displayed early in the movement, while it was inside an invisible disk of 3.3 cm radius centered at the home position (defined as the position of the cursor at target onset). Second, 1.5 s after movement onset, the cursor reappeared at the movement end point for 0.5 s. To encourage faster reactions after target appearance, the cursor turned yellow if the subject did not move within 1.0 s. If the subject did not move within 1.5 s, the trial was considered a missed trial. After feedback presentation, a white circle with a diameter proportional to the distance between the cursor and the home position helped the subject move back to the home position.

The targets appeared at a random position (uniform distribution) either upward (CCW rotation, green in Fig 1C), along an arc ranging from 60° to 120°, or leftward (CW rotation, blue in Fig 1C), along an arc ranging from 150° to 210°. Thus, after complete adaptation, the hand positions required to hit the targets (goal positions in Fig 1C) for the two tasks were in two opposite ranges centered on 45° and 225°, respectively. Separating the workspaces of the two tasks (i.e., the ranges of target and goal positions, see Fig 1C) allowed all subjects to succeed in dual adaptation [16, 23]. Variability in the target position across trials was designed to make it more difficult for subjects to use a cognitive planning strategy, which can occur with a fixed target position [24, 25]. In baseline and retention trials, targets were always red and appeared at random positions along either the leftward or the upward arc.
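The target and goal geometry can be sketched as follows. This assumes angles in degrees measured counterclockwise from rightward, and a cursor direction equal to the hand direction plus the rotation (+45° for CCW, -45° for CW); the names are hypothetical and this is not the task code used in the study.

```python
import random

# Assumed convention: degrees, 0 = rightward, counterclockwise positive.
TARGET_ARC = {"CCW": (60.0, 120.0), "CW": (150.0, 210.0)}  # upward / leftward arcs
ROTATION = {"CCW": 45.0, "CW": -45.0}  # cursor direction = hand direction + rotation


def sample_target(task, rng=random):
    """Draw a target direction uniformly along the task's arc."""
    lo, hi = TARGET_ARC[task]
    return rng.uniform(lo, hi)


def goal_direction(target_deg, task):
    """Hand direction that brings the rotated cursor onto the target."""
    return target_deg - ROTATION[task]
```

Applying `goal_direction` to the arc centers (90° and 180°) gives the centers of the two goal ranges.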

Data analysis

On each trial, we measured the directional error between the target and the final cursor position, calculated as the angle between the line connecting the center to the target position and the line connecting the center to the final cursor position. Data analysis, as well as model selection and parameter estimation (see below), were based on the median of the bias-corrected movements of the 15 subjects, for each trial of each experimental group. Unlike the mean, the median reduces the influence of outliers such as large overshoots or undershoots. We estimated the directional bias of each subject as the mean of the movement directions in the upward or leftward baseline trials, depending on the task, and subtracted this bias from the actual movement directions.
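A minimal sketch of the per-trial measures, assuming coordinates relative to the home position, errors wrapped to [-180°, 180°), and a per-subject bias estimated as the mean baseline error in the relevant direction (a simplification of the bias described above). All function names are hypothetical.

```python
import math
from statistics import mean, median


def directional_error(target_xy, cursor_xy, center_xy=(0.0, 0.0)):
    """Signed angle (deg, counterclockwise positive) from the center->target
    line to the center->cursor line, wrapped to [-180, 180)."""
    tx, ty = target_xy[0] - center_xy[0], target_xy[1] - center_xy[1]
    cx, cy = cursor_xy[0] - center_xy[0], cursor_xy[1] - center_xy[1]
    angle = math.degrees(math.atan2(cy, cx) - math.atan2(ty, tx))
    return (angle + 180.0) % 360.0 - 180.0


def bias_corrected(errors, baseline_errors):
    """Subtract one subject's mean baseline error (assumed bias estimate)."""
    bias = mean(baseline_errors)
    return [e - bias for e in errors]


def median_per_trial(subject_series):
    """Median across subjects at each trial; all series must have equal length."""
    return [median(values) for values in zip(*subject_series)]
```

The wrap step matters for leftward targets near 180°, where a raw difference of atan2 angles can jump by 360°.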

To compute initial learning rates, we fitted an exponential function to the directional errors of the first 10 trials of one task in the training block for each condition. We then estimated the initial learning rate as the slope of the tangent line of the fitted function at the first trial (in degrees per trial).
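A minimal sketch of this computation, assuming the exponential form y(n) = c + a·exp(-b(n - 1)). The fitting routine is not specified in the text, so here b is found by a simple grid search (an assumption) and a and c by closed-form linear least squares; the tangent slope at trial 1 is then -a·b. Function names are hypothetical.

```python
import math


def fit_exponential(y, b_grid=None):
    """Least-squares fit of y(n) = c + a * exp(-b * (n - 1)) for n = 1..len(y).
    b: grid search (assumption); a, c: closed-form linear least squares."""
    if b_grid is None:
        b_grid = [0.01 * k for k in range(1, 301)]  # 0.01 to 3.0 per trial
    n = len(y)
    y_mean = sum(y) / n
    best = None
    for b in b_grid:
        g = [math.exp(-b * i) for i in range(n)]  # i = n - 1, so g[0] = 1
        g_mean = sum(g) / n
        s_gg = sum((gi - g_mean) ** 2 for gi in g)
        s_gy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, y))
        a = s_gy / s_gg
        c = y_mean - a * g_mean
        sse = sum((yi - c - a * gi) ** 2 for gi, yi in zip(g, y))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def initial_learning_rate(y, n_trials=10):
    """Slope (deg/trial) of the tangent to the fitted curve at trial 1."""
    a, b, _ = fit_exponential(y[:n_trials])
    return -a * b  # derivative of c + a*exp(-b(n - 1)) at n = 1
```

Whether the slope comes out positive depends on the sign convention of the fitted series; for a rising adaptation curve it is positive.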

To estimate retention, we calculated the median of the 5 trials in each retention test for each subject. Differences in overall performance in the retention tests over the 10 minutes after training were tested with a one-way ANOVA. We then compared short-term forgetting, defined as the difference in performance between the immediate retention test and the 2-minute post-training test, across training conditions with a linear mixed model. We included random intercepts to account for inter-subject variability in performance (random intercept, P < 0.0001). We selected the mixed model with the smallest BIC and fitted it with the Restricted Maximum Likelihood method in SPSS 18. Our criterion for significance was P < 0.05.
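The per-subject retention scores and the short-term forgetting measure defined above reduce to a few lines. This sketch assumes each subject's retention data are stored as a mapping from test time in minutes to the five trial values; the names are hypothetical, and the mixed-model statistics are not reproduced.

```python
from statistics import median


def retention_scores(blocks):
    """Median of the five trials in each retention test for one subject.
    blocks maps test time in minutes (0, 2, 5, 10) to the five trial values."""
    return {t: median(trials) for t, trials in blocks.items()}


def short_term_forgetting(blocks):
    """Immediate (0 min) retention score minus the 2-minute score."""
    scores = retention_scores(blocks)
    return scores[0] - scores[2]
```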

The effects of time and/or interference on between-trial forgetting during training were calculated by taking the differences in performance between experimental groups that differ in ITI (SHORT-ITI vs. LONG-ITI, effect of time), between experimental groups that differ in the presence of a second task (LONG-ITI vs. ALT, effect of interference), or both (SHORT-ITI vs. ALT, both). Total between-trial forgetting was then assessed by the area under the resulting difference curve. To perform these subtractions, we used a bootstrap analysis: for each experimental condition, we generated 10,000 bootstrapped data sets, each consisting of a random selection, with replacement, of 15 subjects from that condition.
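The bootstrap subtraction can be sketched as below, assuming each group is a list of per-subject training series of equal length (for ALT, the series of one task with the same sign convention), and assuming the area under the curve is the sum over trials (rectangle rule), since the integration rule is not stated. The function name is hypothetical.

```python
import random
from statistics import median


def bootstrap_forgetting(group_a, group_b, n_boot=10_000, seed=0):
    """Between-trial forgetting as median(group_a) - median(group_b), per trial.
    group_a, group_b: lists of per-subject training series of equal length.
    Returns the bootstrapped difference curves and their areas."""
    rng = random.Random(seed)
    curves, areas = [], []
    for _ in range(n_boot):
        # Resample subjects with replacement, keeping each group's size.
        boot_a = [rng.choice(group_a) for _ in group_a]
        boot_b = [rng.choice(group_b) for _ in group_b]
        med_a = [median(trial) for trial in zip(*boot_a)]
        med_b = [median(trial) for trial in zip(*boot_b)]
        diff = [ma - mb for ma, mb in zip(med_a, med_b)]
        curves.append(diff)
        areas.append(sum(diff))  # area under the curve with unit trial spacing
    return curves, areas
```

For the effect of time, `group_a` would hold the SHORT-ITI subjects and `group_b` the LONG-ITI subjects; the mean and spread of `areas` then summarize total between-trial forgetting.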

Computational model

We previously proposed a multiple adaptation model with a parallel structure of fast and slow motor memories [21]. That model contains one common fast-updating, fast-decaying process and multiple separate slow-updating, slow-decaying processes. Here, we extended the previous model to include multiple fast processes. The motor output at each trial n is given by:

$$y(n) = \mathbf{c}_f(n)^{T}\,\mathbf{x}_f(n) + \mathbf{c}_s(n)^{T}\,\mathbf{x}_s(n) \qquad (1)$$

where xf is a vector of fast learning processes and xs is a vector of slow learning processes. In the case of two tasks, there are two fast and two slow processes, xf = (xf1 xf2)T and xs = (xs1 xs2)T. In the original model, the contextual cue acted only on the slow processes: cs selected one of the two slow processes, such that cs = (1 0)T for task 1 and cs = (0 1)T for task 2, where we assumed perfect switching, i.e., no interference or transfer between the slow processes. In addition, the original model assumed full interference in the fast process. In the extended model, we generalized the degree of interference in both the fast and the slow processes. The contextual cue vectors of Eq 1 are defined as cf = (1 qf)T and cs = (1 qs)T for task 1, and cf = (qf 1)T and cs = (qs 1)T for task 2, where the free parameters qf and qs modulate the degree of interference and range from 0 (no interference) to 1 (full interference). Note that in our experimental design with opposite rotations, +45° and -45°, positive values of qf and qs represent interference because the states of the fast and slow processes for task 2, xf2 and xs2, have the opposite sign from those for task 1, xf1 and xs1.

To estimate the amount of memory decay as a function of time independently of trials, we updated the model by replacing the forgetting-rate parameters with exponential decay terms [10, 26]. The update equations from trial n to trial n+1 for the fast and the slow processes are then:

$$\mathbf{x}_f(n+1) = \exp\!\left(-\frac{T(n)}{\tau_f}\right)\mathbf{x}_f(n) + \beta_f\, e(n)\, \mathbf{c}_f(n) \qquad (2)$$

$$\mathbf{x}_s(n+1) = \exp\!\left(-\frac{T(n)}{\tau_s}\right)\mathbf{x}_s(n) + \beta_s\, e(n)\, \mathbf{c}_s(n) \qquad (3)$$

$$e(n) = f(n) - y(n) \qquad (4)$$

where T(n) is the inter-trial interval following trial n (in seconds), βf and βs are the learning rates, and τf and τs are the time constants of the fast and slow learning processes, respectively; these four free parameters were constrained during fitting. The motor error e is the difference between the external perturbation f and the motor output y. We assumed no difference in task difficulty between the two tasks, CCW and CW, in ALT. To test this assumption, we calculated the initial learning rate from the first 15 trials, as well as the final performance as the median of the last 15 trials, for the two tasks. We found no significant difference between the two tasks in either the initial learning rate (P = 0.093) or the final performance (P = 0.308). Additionally, we found no significant difference in performance between the two tasks during training trials (paired two-tailed t-test, P = 0.216). We thus used the same learning rates (βf, βs) and time constants (τf, τs) for the two tasks. Note that in one-task conditions, such as SHORT-ITI or LONG-ITI, the model reduces to the 1-fast-1-slow model of [12] with exponential decay terms.

Model comparison and parameter fitting

We conducted a model comparison analysis with six models, which are summarized in Table 1. We fitted the listed free parameters of each model to data from the three experimental conditions. Note that model 6 is the generalized model described in Eqs 1–4, and all the other models are variants of this model with fewer free parameters. Also note that all models can account for adaptation to two perturbations, but that two data sets (SHORT-ITI and LONG-ITI) contain data for only one task. We therefore estimated the interference parameters from the ALT data set only. We did not include models without interference parameters (qf = qs = 0), because we found a significant effect of interference between the two tasks during training (see Results). For model parameter estimation, we used the MATLAB fmincon function to minimize the root mean squared error (RMSE) between the observed median data, y(n), and the model prediction, ŷ(n), at trial n. Model comparison was based on the Bayesian Information Criterion (BIC):

$$\mathrm{BIC} = N \ln\left[\frac{1}{N}\sum_{n=1}^{N}\left(y(n) - \hat{y}(n)\right)^{2}\right] + k \ln N \qquad (5)$$

where k is the number of parameters in the model, and N is the number of trials used for model fitting. The first term of the BIC reflects the fitting error (in brackets), which decreases with more free parameters. However, a complex model with too many parameters can lead to over-fitting and poor generalization. By penalizing the number of free parameters in the second term, the BIC allows us to find a parsimonious model, with a good balance between fitting error and model complexity.
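A sketch of the BIC computation, assuming the common Gaussian-error form N·ln(MSE) + k·ln(N), where the MSE is the squared RMSE minimized during fitting. Only the criterion is shown; the fmincon fit itself is not reproduced. Function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between median data and model prediction."""
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def bic(observed, predicted, k):
    """BIC = N * ln(MSE) + k * ln(N) for N trials and k free parameters.
    A perfect fit (MSE = 0) is undefined here and raises a ValueError."""
    n = len(observed)
    return n * math.log(rmse(observed, predicted) ** 2) + k * math.log(n)
```

With the same fit, each extra parameter adds ln(N) to the BIC; for N = 300 trials that is about 5.7, so an extra parameter must reduce N·ln(MSE) by more than this to be preferred.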

Table 1. Description of tested models.

Each candidate model was fitted to all the data from the three groups (N = 300 trials: 240 training trials and 60 retention trials across the three groups). For the groups with one task, SHORT-ITI and LONG-ITI, the interference parameters were ignored.

For each experimental condition, we generated 10,000 bootstrapped data sets, each consisting of a random selection, with replacement, of 15 subjects from that condition. We then fitted the candidate models to the median of the selected (bootstrapped) data, and calculated the BIC of each bootstrapped data set for the candidate models (see Table 1) and the conditions (i.e., SHORT-ITI, LONG-ITI, and ALT). We selected the model with the lowest BIC by performing paired bootstrap t-tests [27] (see details in [21]), and reported the bootstrap P-value as the proportion of bootstrapped data sets preferring the compared model over the selected model (i.e., yielding a lower BIC for the compared model). To select the model(s) with the significantly lowest BIC, we performed a pairwise bootstrap t-test for every pair of models, with a significance threshold of P < 0.001 to account for multiple comparisons.
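The proportion-based bootstrap P-value described above, and a selection step built on it, can be sketched as follows. This assumes BIC lists that are paired by bootstrap data set; it does not reproduce the full paired bootstrap t-test of [27], and the function names are hypothetical.

```python
def bootstrap_p(bic_selected, bic_compared):
    """Fraction of bootstrap data sets on which the compared model has a lower
    BIC than the selected model (the two lists are paired by data set)."""
    if len(bic_selected) != len(bic_compared):
        raise ValueError("BIC lists must be paired by bootstrap data set")
    preferred = sum(c < s for s, c in zip(bic_selected, bic_compared))
    return preferred / len(bic_selected)


def select_models(bics, alpha=0.001):
    """bics: dict model name -> list of bootstrap BICs (same data-set order).
    Returns the model with the lowest mean BIC and the models it does not
    beat at the given threshold."""
    mean_bic = {m: sum(v) / len(v) for m, v in bics.items()}
    best = min(mean_bic, key=mean_bic.get)
    not_beaten = [m for m in bics if m != best and bootstrap_p(bics[best], bics[m]) >= alpha]
    return best, not_beaten
```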


Fig 2 shows the observed adaptation (medians with inter-quartile ranges) in the three experimental groups. Adaptation to task 2 in ALT mirrored adaptation to task 1 across all trials and subjects, with no significant difference between the two tasks, consistent with our assumption of the same learning and forgetting rates for the two tasks (see Methods).

Fig 2. Observed subject behaviors in three experimental conditions.

SHORT-ITI (A), LONG-ITI (B), and ALT (C). Star symbols: median subject performance. Shaded area: inter-quartile (25–75%) ranges.

Initial learning rate analysis

As can be observed in Fig 2, subjects adapted at different rates in the different conditions. A Kruskal-Wallis test confirmed that initial learning rates differed across experimental groups (mean ± SEM; SHORT-ITI: 8.69 ± 1.50°/trial, LONG-ITI: 5.34 ± 1.36°/trial, ALT: 3.91 ± 0.74°/trial, P = 0.027, Fig 3A). A post-hoc test showed that the initial learning rate in SHORT-ITI was larger than that in ALT (rank-sum test, P = 0.038, Bonferroni-corrected). Although the learning rate in LONG-ITI was not significantly different from the learning rates in the other two conditions (SHORT-ITI/LONG-ITI, P = 0.13 and ALT/LONG-ITI, P = 0.92), its mean value was between those of the other two groups. Despite these initial differences in learning rates, there was no difference in final performance (median over the last five training trials) across groups (mean ± SEM; SHORT-ITI: 37.4 ± 2.18°, LONG-ITI: 35.4 ± 1.70°; ALT: 32.9 ± 2.24°, Kruskal-Wallis test, P = 0.45).

Fig 3. Comparison of initial learning and retention.

A: Initial learning rates (performance increase per trial) obtained from the initial 15 trials in the training block. B: Forgetting in the two minutes post-training, measured as the difference between the zero- and two-minute retention tests post-training (see Methods). C: Overall retention at 0, 2, 5, and 10 minutes post-training. In Fig 3A and 3B, circles indicate the performance of individual subjects.

Retention analysis

Mixed model analysis of the retention data from the first two tests (at 0 and 2 minutes) showed that training condition (P = 0.037), time of testing after training (P = 0.001), and their interaction (P = 0.046) contributed significantly to forgetting. Short-term forgetting within 2 minutes was significantly different from 0 in SHORT-ITI (7.21 ± 1.44°, P = 0.001) and LONG-ITI (5.24 ± 2.42°, P = 0.013). Importantly, forgetting was not different from 0 in ALT (0.057 ± 2.11°, P = 0.97) (Fig 3B and 3C). Forgetting was larger in SHORT-ITI than in ALT (P = 0.017), showed a trend toward being greater in LONG-ITI than in ALT (P = 0.080), and was not different between SHORT-ITI and LONG-ITI (P = 0.50) (Fig 3B).

There was an effect of condition on overall performance in the 0, 2, 5, and 10 minute post-tests (one-way ANOVA with condition as factor, P < 10⁻⁴) (Fig 3C). Overall retention performance in LONG-ITI was 31.62 ± 0.82°, greater than in SHORT-ITI at 28.32 ± 0.87° (P = 0.02, Bonferroni-corrected), as predicted. However, contrary to our predictions, overall retention performance in LONG-ITI was also greater than in ALT (26.18 ± 0.89°, P < 0.001), and there was no difference between SHORT-ITI and ALT (P > 0.05). This lower overall performance in ALT was not due to time-based forgetting, because it was already present in the immediate retention test: performance in the immediate retention test (0 minutes) was lower in ALT than in LONG-ITI (Fig 3C, P = 0.01). This lower immediate-test performance in ALT resulted from a (marginally significant) drop in performance from the last 5 training trials, at which point ALT and LONG-ITI did not differ (two-tailed t-test, P = 0.40), to the immediate retention test (P = 0.05).

Dissociation of time and interference-induced between-trial forgetting

Between-trial forgetting during training due to time, interference, or both was assessed by taking the difference in performance between conditions (see Methods). The total amount of between-trial forgetting due to time (Fig 4A), interference (Fig 4B), or both (Fig 4C) was significantly larger than zero (areas under the curve, two-tailed t-test: P < 10⁻¹⁴ for forgetting due to time, P = 0.013 for forgetting due to interference, and P < 10⁻¹⁴ for forgetting due to both time and interference). Note how, in all panels of Fig 4, between-trial forgetting initially increased but then converged to zero toward the end of training, showing that between-trial forgetting occurs at a relatively fast time scale during training.

Fig 4. Estimated effects of time and/or interference on between-trial forgetting with data and predictions from the three selected models.

A: Time effect, as the difference in training performance between SHORT-ITI and LONG-ITI. B: Interference effect, as the difference in training performance between LONG-ITI and ALT. C: Combined effect of time and interference, as the difference in training performance between SHORT-ITI and ALT. Black: data; blue: model 4 (two interfering fast processes and two independent slow processes); green: model 5 (two independent fast processes and two interfering slow processes); red: model 6 (two interfering fast processes and two interfering slow processes). Dots and lines indicate means; shaded areas indicate ± S.D. from 10,000 bootstrapped samples.

Model comparison analysis

Table 2 summarizes the results of the model comparison analysis from 10,000 bootstrapped data sets for the six models, with BICs. The BICs for three models (models 4, 5, and 6, all with some degree of interference in the fast, slow, or both processes) were significantly lower (P < 0.05) than the BICs of model 1 (two interfering slow processes), model 2 (common fast and two independent slow processes), and model 3 (common fast and two interfering slow processes), except that the BIC of model 4 was only marginally lower than that of model 3 (P = 0.081). However, BICs were not significantly different among the best-fitting models 4, 5, and 6. Specifically, the BIC of the model with the lowest mean BIC (model 5, two independent fast processes and two interfering slow processes) was not significantly different from that of model 4 (two interfering fast processes and two independent slow processes; P = 0.258) or from that of model 6 (two interfering fast processes and two interfering slow processes; P = 0.090). Because these three models differ in their interference parameters, we compared how they predict between-trial forgetting (Fig 4). Between-trial forgetting due to interference predicted by model 4 was slightly less than that predicted by models 5 and 6. In addition, between-trial forgetting in model 4 converged to zero towards the end of training because this model assumed no interference between the slow processes. Note, however, that our model comparison analysis showed no significant difference among the three models. Thus, although we can conclude that interference is needed to account for our data, it is unclear from these results whether interference occurs in the fast, the slow, or both processes.

Table 2. Results of the model comparison analysis.

For the tested models described in Table 1, BICs and the confidence intervals of the estimated parameters from 10,000 bootstrapped data sets are presented. The BICs of models 4, 5, and 6, which include interference parameter(s), were significantly lower than those of the other models.

Table 2 also shows parameter estimates with 95% confidence intervals for all models. Note that the estimated time constant of the fast process is roughly between 0.5 and 1.5 minutes for all models, whereas the time constant of the slow process is roughly above 20 minutes. Finally, the estimated interference parameter of the fast process is lower in models 4 to 6 than in models 2 and 3 with full interference: whereas the full-interference models 2 and 3 assume qf = 1, the estimated qf was between 0 and 0.556 for model 4 (interference in the fast process only) and between 0 and 0.449 for model 6 (interference in both processes).
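To put the fast time constants in perspective, the fraction of a process state that survives a same-task interval T under an exponential decay term is exp(-T/τ). The sketch below evaluates it for the 5.2 s and 18.4 s intervals of the design, using τf = 30, 60, and 90 s as illustrative values spanning the 0.5 to 1.5 minute range reported above. It ignores the learning update and, in ALT, the interference from the intervening trial; the function name is hypothetical.

```python
import math


def retained_fraction(interval_s, tau):
    """Fraction of a process state left after interval_s seconds of exponential decay."""
    return math.exp(-interval_s / tau)


# Same-task intervals: 5.2 s (SHORT-ITI) and 18.4 s (LONG-ITI, and per task in ALT).
for tau_f in (30.0, 60.0, 90.0):  # illustrative fast time constants (0.5-1.5 min)
    massed = retained_fraction(5.2, tau_f)
    spaced = retained_fraction(18.4, tau_f)
    print(f"tau_f = {tau_f:.0f} s: {massed:.2f} kept after 5.2 s, {spaced:.2f} after 18.4 s")
```

By contrast, a slow process with a time constant above 20 minutes keeps more than 98% of its state over an 18.4 s interval.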


We studied and compared the effect of temporally spaced presentations and the effect of intercalating a second task during visuomotor adaptation using three conditions: SHORT-ITI, LONG-ITI, and ALT. The only difference between SHORT-ITI and LONG-ITI was the inter-trial interval (ITI), and the only difference between LONG-ITI and ALT was the intercalation of the second task. Thus, this experimental design allowed us to dissociate the effects of forgetting induced by time and by interference.

Discussion of experimental results

Performance changes during training were as predicted by the BTF-LTR hypothesis: the initial adaptation rate was largest in SHORT-ITI, smallest in ALT, and intermediate in LONG-ITI. Similarly, short-term forgetting in the retention tests (i.e., during the first 2 minutes) was as predicted by the BTF-LTR hypothesis, with the notable finding that there was no short-term forgetting in the retention tests in the ALT condition. Overall, short-term forgetting was largest in SHORT-ITI, smallest in ALT, and intermediate in LONG-ITI. In addition, our design allowed us to estimate between-trial forgetting due to time by comparing training performance in SHORT-ITI to LONG-ITI (Fig 4A), and due to interference by comparing training performance in LONG-ITI to ALT (Fig 4B). As predicted by the BTF-LTR hypothesis, between-trial forgetting was significantly greater than zero for both time and interference, although the effect of time appears greater than that of interference.

However, overall retention performance in all retention tests was better in LONG-ITI than in the other two conditions, whereas there was no difference between ALT and SHORT-ITI. At first sight, this result contradicts the prediction of the BTF-LTR hypothesis and results from the contextual interference literature showing greater retention in conditions with alternating tasks [4, 28]. One possibility is that training in our experiment was too short to produce a strong contextual interference effect. The result may also reflect differences in testing conditions between our study and previous studies. In contextual interference studies, there are typically four groups of subjects, two per training condition (alternating and non-alternating): one group is tested with the same schedule as in training, and the other is tested with the schedule of the other training condition [28]. Because we had three different conditions, such exhaustive testing was not practical and would have led to multiple comparisons. We therefore used a single test schedule: the tests had the same ITI as in ALT, but with a single task. Thus, although the ITI in the ALT retention tests was identical to that in ALT training, retention in ALT may have been affected by the change of context in the tests: although both tasks were practiced during training, only a single task was tested following ALT training (in order to keep tests identical across conditions). Such "encoding specificity" is known to affect the retrieval of episodic memory [29]. A similar phenomenon may be at play here, partially accounting for the marginal performance drop from the end of training to the immediate retention test. However, our results showed that once this change had been recognized, the remaining decay was slow. A recent study showed that the context in which learning occurs strongly affects the update and decay of the motor memory associated with that context [30].
Although these results may appear to contradict ours, there are important differences. In that study, the context referred to movement directions, not to the number of tasks practiced during training as in our study. In addition, the drop in performance that we observed following ALT training was almost instantaneous (the ITI between the last training trial and the first retention trial was only 5.2 seconds), rather than the result of a slower decay process as in the previous study.

Interpretation of experimental results in light of the model comparison

Model 1, the model with the worst fit (i.e., the highest BIC), is the only model that contains only slow processes. All better-fitting models contain both fast and slow processes. This points to at least two time scales in motor adaptation, a well-known result [31–34], which accounts for the fast initial adaptation followed by the slow, gradual adaptation observed in Fig 2.

The next two best-fitting models contain a single fast process, thus assuming full interference in the fast process, and either two independent slow processes (model 2) or two interfering slow processes (model 3). These models, which laid the ground for the BTF-LTR hypothesis (reduced adaptation due to both spacing and interference leads to greater retention), were tested because they extend the models that we developed in previous work [21, 23]. However, the BICs of these models are higher than those of the models with incomplete interference in the fast process (models 4, 5, and 6). Models 2 to 6 all support the BTF-LTR hypothesis and our results, because their mean estimated time constants for the fast and slow processes are approximately 1 minute (τf) and 20 minutes (τs), respectively, presumably implying that the fast process has largely decayed within the first 2 minutes. Thus, performance in the 2-minute retention tests is mostly due to the slow process in all conditions. Because of time-based between-trial forgetting in LONG-ITI, performance during training is lower, which produces larger errors to update the slow process, and thus better long-term retention than in SHORT-ITI. Similarly, because ALT adds interference-based forgetting to time-based between-trial forgetting, performance during training is even lower, which produces even larger errors to update the slow process and very little forgetting in the retention tests.
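As a rough check on the claim that the fast process has largely decayed by the 2-minute retention tests, the sketch below computes the fraction of each process that would remain after 2 minutes. It assumes, for illustration only, a pure exponential decay x(t) = x(0)·exp(-t/τ) and plugs in the approximate time constants reported with Table 2 (τf between 0.5 and 1.5 minutes, τs around 20 minutes); the function name `fraction_remaining` is hypothetical.

```python
import math


def fraction_remaining(tau_min: float, delay_min: float = 2.0) -> float:
    """Fraction of a state left after delay_min minutes, assuming pure
    exponential decay with time constant tau_min (both in minutes)."""
    return math.exp(-delay_min / tau_min)


# Approximate time constants taken from the parameter estimates (assumed values).
for label, tau in [("fast", 0.5), ("fast", 1.0), ("fast", 1.5), ("slow", 20.0)]:
    print(f"{label} process, tau = {tau:4.1f} min: "
          f"{fraction_remaining(tau):.3f} remains after 2 min")
# Prints 0.018, 0.135, 0.264 and 0.905 for the four cases.
```

Under these assumptions, roughly 74% to 98% of the fast state is gone by the 2-minute test, whereas about 90% of the slow state remains, which is why the 2-minute tests mainly probe the slow process.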

However, our results show that the degree of interference in these models is not as high as we previously envisioned: according to the parameter estimates, the interference parameter in the fast process lies between 0 and 0.556 (model 4) and between 0 and 0.449 (model 6), whereas models 2 and 3 assumed it was equal to 1. This relatively small effect of interference can also be seen in Fig 4B. The degree of interference depends on the ITI between conflicting tasks, with greater interference for shorter ITIs, but also on the task set-up. In our design, we separated the workspaces of the two tasks to allow all subjects to learn both tasks, as we noticed in piloting that with overlapping workspaces, a number of subjects were unable to learn both tasks (for similar results, see [19]). In addition, the three best-fitting models, models 4, 5, and 6, had interference in the fast, slow, or both processes, respectively. Because there were no significant differences among the BICs of these models, the present results do not allow us to conclude in which process most (or all) of the interference occurs. Nevertheless, our results in the ALT condition, as well as previous results showing concurrent adaptation to two tasks (e.g., [15–17]), show that the slow processes are protected from interference to the extent that dual adaptation is possible, thus pointing to the fast process as the main locus of interference. This is consistent with previous studies linking the fast process to working memory, which is vulnerable to interference. In particular, the early phase of adaptation has been shown to correlate with tests of visuo-spatial working memory [22, 34], and the fast process can be interfered with by a task that engages declarative memory [35]. In contrast, there are data showing that multiple slow processes are protected from interference [21, 36].
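To make the meaning of a partial fast-process interference parameter more concrete, the following sketch shows one possible parameterization. This is our assumption for illustration and not necessarily the formulation used in models 4 to 6. An interference weight q_f interpolates between independent task-specific fast states (q_f = 0) and a single shared fast process (q_f = 1) during an ALT-like schedule with two opposite tasks. Only the fast process is simulated, and the function name, learning rate, ITI, and time constant are hypothetical values.

```python
import math


def simulate_alt_fast(q_f, n_trials=40, iti_min=0.2, tau_f=1.0, b_f=0.2):
    """Fast-process-only sketch of an ALT-like schedule (tasks A, B, A, B, ...).

    Task A requires +1 of adaptation and task B the opposite, -1.
    q_f is a hypothetical interference weight: the update driven by one
    task's error is also applied, scaled by q_f, to the other task's state.
    """
    a_f = math.exp(-iti_min / tau_f)  # decay of the fast states over one ITI
    x = [0.0, 0.0]                    # fast states for task A and task B
    history = []
    for n in range(n_trials):
        k = n % 2                     # index of the task practiced on this trial
        target = 1.0 if k == 0 else -1.0
        dx = b_f * (target - x[k])    # error-driven update of the practiced task
        x[k] += dx
        x[1 - k] += q_f * dx          # interference onto the other task's state
        x = [a_f * v for v in x]      # time-based decay before the next trial
        history.append((x[0], x[1]))
    return history


for q in (0.0, 0.5, 1.0):
    x_a, x_b = simulate_alt_fast(q)[-1]
    print(f"q_f = {q:.1f}: fast state for task A = {x_a:+.3f}, task B = {x_b:+.3f}")
```

With q_f = 1, the two task-specific states receive identical updates and stay equal, which is equivalent to the single fast process of models 2 and 3; with q_f = 0, each task keeps its own fast state, and intermediate values such as those estimated for models 4 and 6 fall in between.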
In any case, future work is needed to better understand under which experimental conditions, and at which time scale, interference occurs during visuomotor adaptation.

Comparison with previous works

Several recent studies have explored the effect of task schedules in motor learning or adaptation and are related to the present study. Two studies reported no significant difference in learning rates during practice when the time between trials was greater than 1 second [37, 38]. Similarly, we found no difference in the initial learning rates between SHORT-ITI and LONG-ITI. Unlike our results, Huang and Shadmehr (2007) showed an increased learning rate during practice in a longer-ITI group compared to a shorter-ITI group [39], which cannot be explained by multi-rate learning models with fixed learning rates. However, they calculated learning performance by averaging over 32 trials, which might not capture the dynamic changes of a fast process with a time constant of less than a minute. Kording et al. (2007) proposed that the spacing effect is due to an increase in the learning rates of the slower processes during spaced adaptation [13]. Although this is one possible explanation, 1-fast-1-slow models do not require modifiable learning rates to explain the spacing effect: a longer ITI induces more forgetting in the fast process, which in turn results in larger errors, and thus a greater update of the slow process. In our previous study [22], we compared the contextual interference effect in learning to generate three specific force profiles in healthy individuals and in individuals with chronic stroke. We showed that individuals with chronic stroke, who had low visuo-spatial working memory, exhibited little long-term forgetting after either blocked or random schedules. This finding had been predicted by simulations of the Lee and Schweighofer model [21] with a reduced fast-process update.
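The argument that a 1-fast-1-slow model with fixed learning rates can produce a spacing effect can be illustrated with a minimal state-space sketch. For illustration only, we assume that both processes decay exponentially during each ITI and use hypothetical parameters: τf = 1 min and τs = 20 min are in line with the estimates discussed above, while b_f = 0.2, b_s = 0.02, 100 trials, and the two ITIs (0.05 and 0.2 min) are arbitrary choices, not the experimental values. The function names are likewise hypothetical.

```python
import math


def simulate_training(iti_min, n_trials=100, tau_f=1.0, tau_s=20.0,
                      b_f=0.2, b_s=0.02, perturbation=1.0):
    """Two-state (fast + slow) adaptation with fixed learning rates b_f, b_s
    and exponential time-based decay during each ITI (hypothetical values)."""
    a_f = math.exp(-iti_min / tau_f)  # fast-process retention over one ITI
    a_s = math.exp(-iti_min / tau_s)  # slow-process retention over one ITI
    x_f = x_s = 0.0
    errors = []
    for _ in range(n_trials):
        e = perturbation - (x_f + x_s)  # error experienced on this trial
        errors.append(e)
        # Error-driven update, then decay over the ITI before the next trial.
        x_f = a_f * (x_f + b_f * e)
        x_s = a_s * (x_s + b_s * e)
    return x_f, x_s, errors


def output_after_delay(x_f, x_s, delay_min=2.0, tau_f=1.0, tau_s=20.0):
    """Predicted adaptation after an additional rest of delay_min minutes."""
    return x_f * math.exp(-delay_min / tau_f) + x_s * math.exp(-delay_min / tau_s)


for name, iti in [("short ITI (0.05 min)", 0.05), ("long ITI (0.2 min)", 0.2)]:
    x_f, x_s, errors = simulate_training(iti)
    mean_err = sum(errors) / len(errors)
    print(f"{name}: end of training {x_f + x_s:.3f}, mean error {mean_err:.3f}, "
          f"slow state {x_s:.3f}, after 2-min rest {output_after_delay(x_f, x_s):.3f}")
```

With these values, the long-ITI run ends training at a lower level of adaptation and with larger trial-by-trial errors, but it accumulates more slow-process learning and therefore retains more adaptation at the simulated 2-minute test, even though the learning rates never change. In this toy model, the ordering depends on the chosen parameters and on the number of training trials, because the slow process also decays more per trial at a longer ITI.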


In sum, our results strongly support the BTF-LTR hypothesis: the spacing and contextual interference effects are largely based on forgetting between presentations of the same task during training. However, the specific mechanisms underlying the enhancement of long-term memory differ between the "forgetting-reconstruction" hypothesis [8] and the BTF-LTR hypothesis. According to the forgetting-reconstruction hypothesis, forgetting in working memory between spaced presentations necessitates retrieval from long-term memory, which increases long-term retention. In the state-space models and the BTF-LTR hypothesis, forgetting between spaced presentations leads to greater errors during training, which increase the update of the slow process and in turn lead to decreased forgetting. Note that our account of the contextual interference effect does not exclude additional explanations, such as the "elaboration-distinctiveness" or "deficient-processing" hypotheses [7, 40]. Further studies are needed to dissociate the possible roles of additional mechanisms in both the spacing and the contextual interference effects.


We thank James Gordon, Carolee Winstein, Denis Mottet, and the members of the CNRL lab at USC for helpful comments. NS acknowledges grant NSF BCS 1031899.

Author Contributions

Conceived and designed the experiments: SK NS. Performed the experiments: SK. Analyzed the data: SK YO NS. Wrote the paper: SK YO NS.


1. Pyle WH. Transfer and interference in card-distributing. J Educ Psychol. 1919;10:107–10.
2. Ebbinghaus H. Memory: A contribution to experimental psychology. Teachers College, Columbia University; 1913.
3. Roediger HL. Memory: A Contribution to Experimental Psychology: Ebbinghaus H. Contemp Psychol. 1985;30(7):519–23.
4. Shea JB, Morgan RL. Contextual Interference Effects on the Acquisition, Retention, and Transfer of a Motor Skill. J Exp Psychol-Hum L. 1979;5(2):179–87.
5. Pauwels L, Swinnen SP, Beets IA. Contextual interference in complex bimanual skill learning leads to better skill persistence. PLoS One. 2014;9(6):e100906. pmid:24960171
6. Schmidt RA, Lee TD. Motor control and learning: A behavioral emphasis. 4th ed. Champaign IL: Human Kinetics; 2005. 544 p.
7. Magill RA, Hall KG. A Review of the Contextual Interference Effect in Motor Skill Acquisition. Hum Movement Sci. 1990;9(3–5):241–89.
8. Lee TD, Magill RA. The Locus of Contextual Interference in Motor-Skill Acquisition. J Exp Psychol Learn. 1983;9(4):730–46.
9. Cuddy LJ, Jacoby LL. When Forgetting Helps Memory: An Analysis of Repetition Effects. J Verb Learn Verb Behav. 1982;21(4):451–67.
10. Ethier V, Zee DS, Shadmehr R. Spontaneous recovery of motor memory during saccade adaptation. J Neurophysiol. 2008;99(5):2577–83. pmid:18353917
11. Zarahn E, Weston GD, Liang J, Mazzoni P, Krakauer JW. Explaining Savings for Visuomotor Adaptation: Linear Time-Invariant State-Space Models Are Not Sufficient. J Neurophysiol. 2008;100(5):2537–48. pmid:18596178
12. Smith MA, Ghazizadeh A, Shadmehr R. Interacting adaptive processes with different timescales underlie short-term motor learning. PLoS Biol. 2006;4(6):e179.
13. Kording KP, Tenenbaum JB, Shadmehr R. The dynamics of memory as a consequence of optimal adaptation to a changing body. Nat Neurosci. 2007;10(6):779–86. pmid:17496891
14. Sing Gary N B, Adewuyi Adenike, Smith Maurice. A mechanism for the spacing effect: Competitive inhibition between adaptive processes explains the increase in motor skill retention associated with prolonged inter-trial spacing. Advances in Computational Motor Control. 2009.
15. Imamizu H, Sugimoto N, Osu R, Tsutsui K, Sugiyama K, Wada Y, et al. Explicit contextual information selectively contributes to predictive switching of internal models. Exp Brain Res. 2007;181(3):395–408. pmid:17437093
16. Woolley DG, Tresilian JR, Carson RG, Riek S. Dual adaptation to two opposing visuomotor rotations when each is associated with different regions of workspace. Exp Brain Res. 2007;179(2):155–65. pmid:17119942
17. Choi Y, Qi F, Gordon J, Schweighofer N. Performance-based adaptive schedules enhance motor learning. J Mot Behav. 2008;40(4):273–80. pmid:18628104
18. Shelhamer M, Aboukhalil A, Clendaniel R. Context-Specific Adaptation of Saccade Gain Is Enhanced with Rest Intervals Between Changes in Context State. Ann N Y Acad Sci. 2005;1039(1):166–75.
19. Osu R, Hirai S, Yoshioka T, Kawato M. Random presentation enables subjects to adapt to two opposing forces on the hand. Nat Neurosci. 2004;7(2):111–2. pmid:14745452
20. Hirashima M, Nozaki D. Distinct motor plans form and retrieve distinct motor memories for physically identical movements. Curr Biol. 2012;22(5):432–6. pmid:22326201
21. Lee JY, Schweighofer N. Dual Adaptation Supports a Parallel Architecture of Motor Memory. J Neurosci. 2009;29(33):10396–404. pmid:19692614
22. Schweighofer N, Lee JY, Goh HT, Choi Y, Kim SS, Stewart JC, et al. Mechanisms of the contextual interference effect in individuals poststroke. J Neurophysiol. 2011;106(5):2632–41. pmid:21832031
23. Woolley DG, de Rugy A, Carson RG, Riek S. Visual target separation determines the extent of generalisation between opposing visuomotor rotations. Exp Brain Res. 2011;212(2):213–24. pmid:21562858
24. Mazzoni P, Krakauer JW. An implicit plan overrides an explicit strategy during visuomotor adaptation. J Neurosci. 2006;26(14):3642–5. pmid:16597717
25. Taylor JA, Ivry RB. Flexible Cognitive Strategies during Motor Learning. PLoS Comput Biol. 2011;7(3):e1001096.
26. Tanaka H, Krakauer JW, Sejnowski TJ. Generalization and Multirate Models of Motor Adaptation. Neural Comput. 2012;24(4):939–66. pmid:22295980
27. DiCiccio TJ, Efron B. Bootstrap confidence intervals. Stat Sci. 1996;11(3):189–212.
28. Schmidt RA, Wrisberg CA. Motor learning and performance. 2004.
29. Tulving E, Thomson DM. Encoding Specificity and Retrieval Processes in Episodic Memory. Psychol Rev. 1973;80(5):352–73.
30. Ingram JN, Flanagan JR, Wolpert DM. Context-dependent decay of motor memories during skill acquisition. Curr Biol. 2013;23(12):1107–12. pmid:23727092
31. Redding GM, Wallace B. Adaptive spatial alignment and strategic perceptual-motor control. J Exp Psychol Hum. 1996;22(2):379–94.
32. Karni A, Meyer G, Rey-Hipolito C, Jezzard P, Adams MM, Turner R, et al. The acquisition of skilled motor performance: Fast and slow experience-driven changes in primary motor cortex. P Natl Acad Sci USA. 1998;95(3):861–8.
33. Della-Maggiore V, McIntosh AR. Time course of changes in brain activity and functional connectivity associated with long-term adaptation to a rotational transformation. J Neurophysiol. 2005;93(4):2254–62. pmid:15574799
34. Anguera JA, Reuter-Lorenz PA, Willingham DT, Seidler RD. Contributions of Spatial Working Memory to Visuomotor Learning. J Cogn Neurosci. 2010;22(9):1917–30. pmid:19803691
35. Keisler A, Shadmehr R. A Shared Resource between Declarative Memory and Motor Memory. J Neurosci. 2010;30(44):14817–23. pmid:21048140
36. Pekny SE, Criscimagna-Hemminger SE, Shadmehr R. Protection and Expression of Human Motor Memories. J Neurosci. 2011;31(39):13829–39. pmid:21957245
37. Francis JT. Influence of the inter-reach-interval on motor learning. Exp Brain Res. 2005;167(1):128–31. pmid:16132970
38. Bock O, Thomas M, Grigorova V. The effect of rest breaks on human sensorimotor adaptation. Exp Brain Res. 2005;163(2):258–60. pmid:15754173
39. Huang VS, Shadmehr R. Evolution of motor memory during the seconds after observation of motor error. J Neurophysiol. 2007;97(6):3976–85. pmid:17428900
40. Callan DE, Schweighofer N. Neural correlates of the spacing effect in explicit verbal semantic encoding support the deficient-processing theory. Hum Brain Mapp. 2010;31(4):645–59. pmid:19882649