The FACTS model of speech motor control: Fusing state estimation and task-based control

doi:10.1371/journal.pcbi.1007321

Fig 1.

Architecture of the FACTS model.

Boxes in blue represent processes outlined in the Task Dynamics model [32, 41]. The articulatory state estimator (shown in red) is implemented as an Unscented Kalman Filter, which estimates the current articulatory state of the plant by combining the predicted state generated by a forward model with auditory and somatosensory feedback. Additive noise is represented by ε. Time-step delays are represented by z⁻¹. Equation numbers correspond to equations found in the methods.
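The predict-then-correct cycle of the state estimator can be illustrated with a much simpler filter. The sketch below is a scalar, linear Kalman filter, not the Unscented Kalman Filter used in FACTS. It assumes a hypothetical 1-D articulatory state, a forward model in which the motor command adds directly to the state, and linear sensory channels `y = h * x + noise`. The names `Estimate`, `predict`, `update`, `estimate_step` and all numeric noise values are illustrative assumptions. The sketch shows a forward-model prediction being corrected by somatosensory and auditory feedback in turn, each weighted by its noise level.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float  # estimated articulatory state (hypothetical 1-D stand-in)
    var: float   # uncertainty (variance) of that estimate


def predict(est: Estimate, motor_command: float, process_var: float) -> Estimate:
    """Forward-model step: in this hypothetical plant the command adds to the state."""
    return Estimate(est.mean + motor_command, est.var + process_var)


def update(est: Estimate, observation: float, h: float, obs_var: float) -> Estimate:
    """Correct the prediction with one sensory channel y = h * x + noise."""
    innovation = observation - h * est.mean  # sensory prediction error
    s = h * h * est.var + obs_var            # innovation variance
    k = est.var * h / s                      # Kalman gain: trust in this channel
    return Estimate(est.mean + k * innovation, (1.0 - k * h) * est.var)


def estimate_step(est, motor_command, somato_obs, audio_obs,
                  process_var=0.01, somato_var=0.05, audio_var=0.2,
                  somato_h=1.0, audio_h=2.0):
    """One cycle: predict with the forward model, then fuse both feedback channels."""
    est = predict(est, motor_command, process_var)
    est = update(est, somato_obs, somato_h, somato_var)
    est = update(est, audio_obs, audio_h, audio_var)
    return est


if __name__ == "__main__":
    print(estimate_step(Estimate(0.0, 1.0), 0.5, 0.6, 1.1))
```

Making either channel's variance very large drives its gain toward zero, so the estimate falls back on the forward-model prediction. This mirrors how the model behaves when a feedback channel is removed.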

Fig 2.

The CASY plant model.

Articulatory variables relevant to the current paper are the Jaw Angle (JA), Condyle Angle (CA), and Condyle Length (CL). See the text for a description of these variables. Diagram after [78].

Fig 3.

Example of task variables and gestural score for the word “mod”.

A gestural score for the word “mod” [mad], which consists of bilabial closure and velum opening gestures for [m], a wide pharyngeal constriction gesture for [a], and a tongue tip closure gesture for [d]. Tasks are shown on the left and a schematic of the gestural score on the right.
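In Task Dynamics, each active gesture drives its task (tract) variable toward a target as a critically damped second-order point attractor. The minimal sketch below integrates one such gesture, for example lip aperture closing for [m], during a single activation interval taken from a hypothetical gestural score. It assumes unit mass, and the function name `simulate_gesture`, the stiffness, timings and millimetre values are illustrative choices rather than FACTS parameters. Outside the activation interval, this sketch lets only damping act.

```python
import math


def simulate_gesture(z0, target, stiffness, onset, offset, dt=0.001, duration=0.3):
    """Integrate one task variable z under a task-dynamic point attractor.

    While the gesture is active (onset <= t < offset):
        z'' = -k (z - target) - b z',  with b = 2 * sqrt(k)  (critical damping, unit mass)
    When inactive, only the damping term acts (a simplifying assumption).
    Returns a list of (time, z) samples.
    """
    damping = 2.0 * math.sqrt(stiffness)
    z, v = z0, 0.0
    trajectory = []
    for step in range(int(round(duration / dt))):
        t = step * dt
        acc = -damping * v
        if onset <= t < offset:  # gesture active in the (hypothetical) score
            acc += -stiffness * (z - target)
        v += acc * dt            # semi-implicit Euler: velocity first,
        z += v * dt              # then position with the new velocity
        trajectory.append((t, z))
    return trajectory


if __name__ == "__main__":
    # Hypothetical lip-aperture closure: 10 mm -> 0 mm, active for 250 ms.
    for t, z in simulate_gesture(10.0, 0.0, 400.0, 0.0, 0.25)[::50]:
        print(f"t={t:.3f} s  z={z:.3f} mm")
```

Critical damping lets the variable approach its target without overshoot. This is why a gestural score can be specified by targets and activation intervals alone, without separate trajectory planning.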

Fig 4.

Learned LWPR transformations from CASY articulatory parameters to acoustics.

Predicted formant values are shown as white circles. The width of each circle is proportional to the absolute error between the actual formant values and those predicted by the LWPR model. The density distributions reflect the number of receptive fields covering each point (shown as colors in A and D and as the height of the line in B, C, E and F). (A-C) show F1 values; (D-F) show F2 values.
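LWPR predicts by blending many local linear models, each gated by a Gaussian receptive field. The sketch below covers only that prediction step, with three fixed, hypothetical 1-D receptive fields mapping an articulatory parameter to F1. It omits LWPR's incremental learning and partial-least-squares projections. The field centers, widths, coefficients, the names `FIELDS`, `activation`, `predict`, `coverage`, and the coverage threshold are all illustrative assumptions. `coverage` counts the fields whose activation exceeds the threshold, which is one way to read the receptive-field density shown in the figure.

```python
import math

# Hypothetical receptive fields: center c, width (distance metric) d, and a local
# linear model y = b0 + b1 * (x - c). Values are illustrative, not learned from CASY.
FIELDS = [
    {"c": 0.0, "d": 4.0, "b0": 300.0, "b1": 200.0},
    {"c": 0.5, "d": 4.0, "b0": 400.0, "b1": 250.0},
    {"c": 1.0, "d": 4.0, "b0": 550.0, "b1": 300.0},
]


def activation(x, field):
    """Gaussian receptive-field weight w = exp(-0.5 * d * (x - c)^2)."""
    return math.exp(-0.5 * field["d"] * (x - field["c"]) ** 2)


def predict(x, fields=FIELDS):
    """Activation-weighted average of the local linear predictions (e.g. F1 in Hz)."""
    weights = [activation(x, f) for f in fields]
    local = [f["b0"] + f["b1"] * (x - f["c"]) for f in fields]
    return sum(w * y for w, y in zip(weights, local)) / sum(weights)


def coverage(x, fields=FIELDS, threshold=0.5):
    """Number of receptive fields whose activation at x exceeds a (hypothetical) threshold."""
    return sum(1 for f in fields if activation(x, f) > threshold)


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, round(predict(x), 1), coverage(x))
```

Regions covered by few receptive fields rely on fewer local models. This is consistent with the caption's use of receptive-field density alongside the prediction error.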

Fig 5.

FACTS simulation of the vowel sequence [ǝ a i], varying the type of sensory feedback available to the model.

(A) shows a sample simulation with movements of the CASY model as well as the trajectory of the tongue center. (B-E) each show tongue center trajectories from 100 simulations (gray) and their mean (blue-green gradient) under a different type of available sensory feedback. Variability is lower when sensory feedback is available, and lower when auditory feedback is absent than when somatosensory feedback is absent. (F) shows the prediction error in each condition. (G-H) show the variability produced in two articulatory parameters of the CASY plant model related to vowel production, and (I) shows the variability of the tongue center at the final simulation sample. (J) and (K) show prediction error and articulatory variability as a function of sensory noise level when only one feedback channel is available. Decreasing sensory noise leads to increased accuracy for somatosensation but decreased accuracy for audition. Colors in (F-K) correspond to the colors in the titles of (B-E).

Fig 6.

FACTS model simulations of mechanical and auditory perturbations.

Times when the perturbations are applied are shown in gray. (A) shows the response of the model to fixing the jaw in place (simulating a downward force applied to the jaw) midway through the closure for the second consonant in [aba] (left) and [ada] (right). Unperturbed trajectories are shown in black and perturbed trajectories in red. The upper and lower lips move more to compensate for the jaw perturbation only when the perturbation occurs during [b], mirroring the task-specific response seen in human speakers. (B) shows the response of the FACTS model to a +100 Hz auditory perturbation of F1 while producing [ǝ]. The produced F1 is shown in black and the perceived F1 in blue; note that the perceived F1 is corrupted by noise. The model responds to the perturbation by lowering F1 despite the lack of an explicit auditory target. The partial compensation produced by the model matches that observed in human speech.

Fig 7.

FACTS model simulations of the response to a +100 Hz perturbation of F1.

(A) and (B) show the effects of altering the amount of noise in the auditory system in tandem with the observer’s estimate of that noise. An increase in auditory noise (A) leads to a smaller perturbation response, while a decrease in auditory noise (B) leads to a larger response. (C) and (D) show the effects of altering the amount of noise in the somatosensory system in tandem with the observer’s estimate of that noise. The pattern is the opposite of that for auditory noise: an increase in somatosensory noise (C) leads to a larger perturbation response, while a decrease in somatosensory noise (D) leads to a smaller response. The baseline perturbation response is shown as a dashed grey line in each plot.
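Why partial compensation, and why opposite effects of the two noise sources? A toy steady-state argument gives some intuition, though it is not the FACTS computation. Assume the forward-model prediction and the somatosensory signal both track the produced F1, the perceived F1 is shifted by the perturbation, and the three are fused by precision weighting. Under these assumptions the estimate equals the produced F1 plus the auditory weight times the shift. A controller that drives the estimate to its target then moves production by that fraction of the shift in the opposite direction. The function names and variance values below are hypothetical.

```python
def auditory_weight(audio_var, somato_var, prior_var):
    """Share of the fused estimate contributed by audition (precision weighting)."""
    pa, ps, pp = 1.0 / audio_var, 1.0 / somato_var, 1.0 / prior_var
    return pa / (pa + ps + pp)


def steady_state_compensation(shift_hz, audio_var, somato_var, prior_var):
    """Change in produced F1 that returns the estimate to target (toy model).

    Estimate = produced + w_a * shift, so holding the estimate at target requires
    produced = target - w_a * shift: a partial, opposing response.
    """
    return -auditory_weight(audio_var, somato_var, prior_var) * shift_hz


if __name__ == "__main__":
    print(steady_state_compensation(100.0, 1.0, 1.0, 2.0))  # baseline
    print(steady_state_compensation(100.0, 2.0, 1.0, 2.0))  # noisier audition
    print(steady_state_compensation(100.0, 1.0, 2.0, 2.0))  # noisier somatosensation
```

Raising auditory noise lowers the auditory weight and shrinks the response, while raising somatosensory noise shifts weight toward audition and enlarges it. This is the qualitative pattern described for panels (A-D).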

Fig 8.

Cartoon showing the unscented transform (UT).

The final estimates of the mean and covariance provide a better fit to the true values than would be obtained by transforming only a single point at the pre-transformation mean.
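A one-dimensional version of the unscented transform makes the comparison in the caption concrete. The sketch draws three sigma points from a Gaussian, passes them through the nonlinearity `f(x) = x**2`, and recombines them with the standard scaled-UT weights. The parameter choice `alpha=1, beta=0, kappa=2` (so that `n + kappa = 3`) is an assumption for this sketch; the FACTS settings are not given here. With this choice the UT recovers the exact mean `mu**2 + sigma**2` and variance `4*mu**2*sigma**2 + 2*sigma**4` of a squared Gaussian, whereas transforming only the mean gives `mu**2`.

```python
import math


def unscented_transform_1d(mean, var, f, alpha=1.0, beta=0.0, kappa=2.0):
    """Propagate a 1-D Gaussian (mean, var) through f using three sigma points."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    # Weights for the mean; the central covariance weight adds (1 - alpha^2 + beta).
    w_mean = [lam / (n + lam)] + [1.0 / (2 * (n + lam))] * 2
    w_cov = [w_mean[0] + (1 - alpha ** 2 + beta)] + w_mean[1:]
    ys = [f(x) for x in sigma_points]
    y_mean = sum(w * y for w, y in zip(w_mean, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(w_cov, ys))
    return y_mean, y_var


if __name__ == "__main__":
    mu, var = 1.0, 0.25
    ut_mean, ut_var = unscented_transform_1d(mu, var, lambda x: x * x)
    print("UT:", ut_mean, ut_var)           # close to the exact 1.25 and 1.125
    print("single point:", (lambda x: x * x)(mu))  # 1.0, ignores the spread
```

For `mu = 1` and `sigma**2 = 0.25`, the single-point transform yields a mean of 1.0. The true mean, which the sigma points recover, is 1.25.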
