
Fig 1.

Designs A and B were developed based on the original FACTS architecture in Parrell et al. [38].

Design A updates the articulatory state prediction based on the final state estimate, which is determined by the original state prediction and by state correction signals derived from the auditory prediction errors (κtΔyt). Design B updates the auditory forward model (i.e., the auditory prediction) directly from the auditory feedback. Note that there are a few differences between these models and the original design [38], which are described in detail in the Materials and methods section. η denotes Gaussian noise. For adaptation simulations (which involved a short utterance), we did not allow any online compensation, reflecting the fact that auditory feedback-based online compensation begins 100–200 ms after the onset of the perturbation (e.g., [27, 50–52]) and that somatosensory feedback-based compensation would be nearly negligible (since no somatosensory perturbation was applied). Thus, the default state feedback pathways that allow online compensation were disabled; the prediction-based pathways were used instead.
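The correction step in Design A amounts to adding a gain-weighted auditory prediction error to the state prediction. A one-dimensional toy sketch (all variable names and numbers here are illustrative, not the model's actual parameters):

```python
def correct_state(x_pred, y_obs, y_pred, gain):
    """Design A's correction step as a 1-D toy: the final state
    estimate is the state prediction plus a Kalman-gain-weighted
    auditory prediction error (kappa_t * delta_y_t)."""
    delta_y = y_obs - y_pred       # auditory prediction error
    return x_pred + gain * delta_y

# Perturbed feedback (0.8) is lower than predicted (1.0), so the
# final estimate is pulled below the prediction.
x_final = correct_state(x_pred=1.0, y_obs=0.8, y_pred=1.0, gain=0.5)
print(x_final)  # 0.9
```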


Fig 2.

Design C was developed based on a modified hierarchical FACTS architecture.

Design C uses a modified hierarchical FACTS architecture in which auditory feedback is processed by the task state estimator rather than by the articulatory state estimator. The task state estimator generates a task state prediction based on the previous articulatory state estimate, via an articulatory-to-task transformation LWPR model, and on an efference copy of the task motor commands. This task state prediction can be corrected using auditory prediction errors. During adaptation simulations, when auditory prediction errors are detected, the final (corrected) estimate is used to update the articulatory-to-task transformation model. η denotes Gaussian noise. As with Designs A and B, sensory feedback-based online compensation was disabled.
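Design C's predict-correct-retrain loop can be sketched with the LWPR articulatory-to-task model replaced by a single linear coefficient. A minimal toy (the function name, gains, and learning rate are our illustrative assumptions, not the model's settings):

```python
def design_c_trial(a, artic_est, aud_fb, k_gain=0.5, lr=0.2):
    """One toy adaptation trial: predict the task state from the
    articulatory estimate, correct it with the auditory prediction
    error, then retrain the transform toward the corrected estimate."""
    task_pred = a * artic_est              # articulatory-to-task transform
    err = aud_fb - task_pred               # auditory prediction error
    task_corr = task_pred + k_gain * err   # corrected task state estimate
    # Update the transform so it maps artic_est closer to task_corr
    a = a + lr * (task_corr - task_pred) * artic_est
    return a, task_corr

a = 1.0                             # unadapted transform
for _ in range(40):                 # repeated perturbed trials
    a, _ = design_c_trial(a, artic_est=1.0, aud_fb=1.2)
print(round(a, 3))  # the transform drifts toward the perturbed mapping (1.2)
```

Across trials the corrected estimates pull the transform toward the perturbed feedback mapping, which is the mechanism behind the trial-to-trial adaptation in the simulations.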


Fig 3.

Simulation results for different model designs.

Top row: A 400-cent up-shift in the first formant frequency (F1) was applied from trial 21 to trial 140 (yellow shaded area). To extract F1 for each trial, we averaged F1 over the middle 10 time steps (time steps 11 to 20) of the 30-time-step simulated acoustic data for each production. Experimental data were retrieved from the control group in Kim & Max [19]. Middle row: F1 values produced across time during five early perturbation trials (trials 22, 24, 26, 28, and 30). Lighter shades indicate later trials. Note that the first 10 time steps of each trial are pre-phonatory preparatory movements from the model’s default start position, so no acoustic data are plotted there. Bottom row: The true articulatory state of tongue height (solid green lines) and its state estimate (gray dots), expressed in Maeda Principal Component units (M), plotted across time steps for the early perturbation trials (trials 22, 24, 26, 28, and 30). Black dots indicate the estimate in a baseline trial. Lighter shades indicate later trials. In Design A, the estimate diverged from the true state across time steps, and the amount of divergence gradually increased across trials. Only in Design C did the true articulatory state for tongue height demonstrate noticeable adaptation across trials (green lines); there, the estimates (gray dots) closely tracked the true state.
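The per-trial F1 extraction described above (average of time steps 11 to 20 out of 30) is a simple slice-and-mean; a sketch (the function name is ours):

```python
def trial_f1(f1_track):
    """Collapse one trial's 30-step F1 track to a single value by
    averaging the middle 10 time steps (steps 11-20, i.e. Python
    indices 10:20). The first 10 steps are pre-phonatory, so they
    carry no acoustic data."""
    if len(f1_track) != 30:
        raise ValueError("expected 30 time steps per trial")
    middle = f1_track[10:20]
    return sum(middle) / len(middle)

print(trial_f1(list(range(30))))  # 14.5 (mean of values at steps 11-20)
```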


Fig 4.

Design C implemented with adaptive UKF (AUKF).

A: The task estimator (Design C) implemented with the AUKF. During perturbation trials, if the squared auditory prediction errors, inversely weighted by their noise covariance (ϵ), were larger than a given threshold (γ), larger correction signals were generated by the Kalman filter for the current and following time steps. This in turn allowed faster and larger updates to the articulatory-to-task model. B: Simulations are shown with data from the control group in Kim & Max [19] in the top row, and with data from the native English speakers in Mitsuya et al. [61] in the bottom row. In all three simulations, the AUKF (blue solid line) produced more realistic simulations than the non-adaptive UKF (green dashed line). C: The model also generated F2 changes even when only F1 was perturbed. The simulated adaptation was similar to the healthy control group’s data in [62]. Perturbed trials are indicated by yellow shaded areas in both B and C.
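The AUKF trigger in panel A amounts to a conditional covariance inflation: when the noise-weighted squared error exceeds γ, the filter's covariances are scaled up, which enlarges the subsequent Kalman correction. A sketch under assumed placeholder gain values (the real βxx, βQ, βxy settings are given in Table 2):

```python
def weighted_sq_error(delta_y, noise_var):
    """Squared auditory prediction error, inversely weighted by its
    noise covariance (the quantity compared against gamma)."""
    return (delta_y ** 2) / noise_var

def aukf_gains(eps, gamma, beta_xx=8.0, beta_q=8.0, beta_xy=8.0):
    """Return multiplicative gains for (Pxx, Q, Pxy). When eps exceeds
    the threshold gamma, the covariances are inflated, enlarging the
    Kalman correction on the current and following time steps;
    otherwise the filter behaves as a plain (non-adaptive) UKF.
    The beta values here are illustrative placeholders."""
    if eps > gamma:
        return beta_xx, beta_q, beta_xy
    return 1.0, 1.0, 1.0

print(aukf_gains(weighted_sq_error(0.5, 0.01), gamma=1.0))   # inflated gains
print(aukf_gains(weighted_sq_error(0.05, 0.01), gamma=1.0))  # (1.0, 1.0, 1.0)
```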


Fig 5.

Auditory prediction and articulatory state feedback control law.

Left: Simulations with the auditory prediction fixed to its baseline value. The output showed a greater adaptation response than the default simulation, in which the auditory prediction was allowed to vary. Right: Using updated Jacobian (task-to-articulatory) matrices in the articulatory state feedback control law produced adaptation behavior (blue solid line) similar to that of the default mode, which updates only the articulatory-to-task transformation (scarlet dashed line). When “naïve” Jacobian matrices generated from the unadapted articulatory-to-task model were instead used in the articulatory state feedback control law (green shaded area), the adaptation behavior was still present. In contrast, when the adapted Jacobian matrix was used in the state feedback control law together with the “naïve” articulatory-to-task transformation model, adaptation disappeared (gray shaded area).


Fig 6.

Within-utterance data during online compensation response.

The yellow shaded area indicates the time steps during which the 100 Hz up-shift perturbation in F1 was applied. Left: F1 decreased across time steps in response to the unpredictable perturbation. Middle: The compensatory response was also visible in the articulatory state estimate (gray dots) and its true state (green lines), expressed in the Maeda unit (M). Right: Despite the changes in the articulatory state, the task state estimate remained near the task target.


Fig 7.

The effects of changes in model parameters on adaptation.

The experimental data shown for comparison (black line) are from the control group in Kim & Max [19]. For a comprehensive overview of the parameters used in the figure, see Materials and methods. A: The AUKF gains (βxx, βQ, βxy), applied as multiplicative gains to the prior covariance matrix (Pxx), the process noise matrix (Q), and the cross-covariance matrix (Pxy), respectively, affected the adaptation rate and extent: larger gains produced faster and larger adaptation. B: Increases in the auditory prediction error threshold (γ) reduced the extent of adaptation. C: Reducing the auditory noise scale (ηaud) increased adaptation in FACTS with the UKF (dotted lines). However, changes in the auditory noise scale had a minimal effect on adaptation in FACTS implemented with the AUKF (solid lines). D: Increases in task target noise (ηtarg) did not affect adaptive behavior but did increase inter-trial variability.


Table 1.

Notation for FACTS.

Bold capital notation denotes matrix data structures (e.g., Q) and LWPR models. Regular (unbolded) notation denotes scalars (e.g., βxx) or one-dimensional array data structures (e.g., the task position).


Fig 8.

Task state variables.

Of the five tongue variables, the first two, pertaining to the tongue tip, are the tongue tip dental (TT_Den) and tongue tip alveolar (TT_Alv) constriction degrees, defined at 42.9° and 58.0°, respectively, from the left end of the horizontal axis (0°). The tongue body variables are tongue body palatal (TB_Pal), tongue body velar (TB_Vel), and tongue body pharyngeal (TB_Pha), defined at 92.4°, 121.1°, and 179.8°, respectively. The lip aperture (LA) is defined as the distance between the upper lip and the lower lip. The lip protrusion (LPRO) is the horizontal length of the upper (or lower) lip. The task state of the initial position used for each trial is depicted here as an example.
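The constriction locations listed above can be collected into a small lookup table (a convenience structure of ours, not part of the model code):

```python
# Constriction locations of the tongue task variables, in degrees from
# the left end of the horizontal axis (0 degrees), per the caption.
CONSTRICTION_DEG = {
    "TT_Den": 42.9,   # tongue tip dental
    "TT_Alv": 58.0,   # tongue tip alveolar
    "TB_Pal": 92.4,   # tongue body palatal
    "TB_Vel": 121.1,  # tongue body velar
    "TB_Pha": 179.8,  # tongue body pharyngeal
}

# LA (lip aperture) and LPRO (lip protrusion) are distances, not
# angular constriction locations, so they are not listed here.
print(CONSTRICTION_DEG["TB_Vel"])  # 121.1
```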


Table 2.

The tuning of various parameters for simulations shown in the figures.

The first three parameters listed (βxx, βQ, βxy) were configured for the AUKF, each functioning as a multiplicative gain on the prior covariance of the task state sigma points (Pxx), the process noise (Q), and the cross covariance of the task state and auditory feedback sigma points (Pxy), respectively, when the AUKF is activated. Note that these gains were 1 for the simulations presented in Figs 3 and 6 because the AUKF was not enabled. The other two parameters changed in some simulations were the auditory noise scale (ηaud) and the auditory prediction error threshold (γ).
