Table 1.
Variables used for hierarchies of stable heteroclinic channels (SHCs).
Table 2.
Variables used in Bayesian recognition scheme.
Figure 1.
Schematic of the generative model and recognition system.
This schematic shows the equations that define both the generation of stimuli (left; see Equation 2) and the recognition scheme based on the generative model. There are three levels: the phonemic and syllabic levels employ stable heteroclinic channels, while the acoustic level is implemented by a linear transform, whose causal states correspond to sound file extracts that are mixed into the resulting sound wave. This sound wave is the input to the recognition system, which applies a linear (forward) projection using the pseudo-inverse of the acoustic transform. Recognition at the phonemic and syllabic levels uses bottom-up and top-down message passing between these two levels, following Equation 6.
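The forward projection described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the dimensions (4 phonemic states, 32 sound samples per segment) and the random mixing matrix `W` are assumptions chosen only to show how a pseudo-inverse recovers causal states from the sound wave.

```python
import numpy as np

# Hypothetical sketch of the acoustic level: a linear transform W maps
# phonemic causal states to a sound-wave segment, and recognition applies
# the pseudo-inverse of W as a (forward) projection back to state space.
# Dimensions and the random W are illustrative, not the paper's values.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 4))         # causal states -> sound segment
W_pinv = np.linalg.pinv(W)               # sound segment -> causal states

v_true = np.array([1.0, 0.0, 0.0, 0.0])  # e.g. phoneme 'a' active
sound = W @ v_true                       # generated sound segment
v_rec = W_pinv @ sound                   # projection used for recognition
print(np.allclose(v_rec, v_true))        # exact recovery: W has full column rank
```

Recovery is exact here because the random `W` has full column rank; with noisy or degenerate mixing, the pseudo-inverse gives the least-squares estimate instead.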
Figure 2.
Two-level model to generate phoneme sequences.
Schematic illustration of the phoneme sequence generation process. At the syllabic level, one of three syllables is active and induces a specific lateral connectivity structure at the phonemic level. The transition speed at the phonemic level is four times faster than at the syllabic level. The resulting phoneme and syllable dynamics of the model are shown in Fig. 3A.
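The sequential dynamics sketched in this figure can be illustrated with generalized Lotka-Volterra competition, the standard mechanism behind stable heteroclinic channels. This is a minimal sketch, not the paper's Equation 2: the connectivity values (`self_inh`, `pred`, `succ`), the rate constant `kappa`, and the Euler integration are illustrative assumptions.

```python
import numpy as np

# Winnerless competition among n units: the connectivity rho encodes a
# cyclic sequence 1 -> 2 -> 3 by giving each unit weak inhibition from
# its predecessor and strong inhibition from its successor. kappa sets
# the transition speed (the phonemic level in Fig. 2 would use a kappa
# four times larger than the syllabic level). Parameters are assumptions.
def lv_step(x, rho, kappa, dt=0.01):
    # One Euler step of dx/dt = kappa * x * (1 - rho @ x)
    return x + dt * kappa * x * (1.0 - rho @ x)

def sequence_rho(n=3, self_inh=1.0, pred=0.5, succ=2.0):
    # rho[i, j]: inhibition of unit i by unit j; weak predecessor
    # inhibition (pred < 1 < succ) lets activity flow along the sequence.
    rho = np.full((n, n), succ)
    np.fill_diagonal(rho, self_inh)
    for i in range(n):
        rho[(i + 1) % n, i] = pred
    return rho

rho = sequence_rho()
x = np.array([1.0, 0.02, 0.01])
winners = []
for _ in range(6000):
    x = np.clip(lv_step(x, rho, kappa=2.0), 1e-9, None)  # floor avoids extinction
    winners.append(int(np.argmax(x)))
print(sorted(set(winners)))  # each unit takes its turn as the active one
```

Nesting two such systems, with the slower one selecting which `rho` the faster one uses, yields the two-level sequence-of-sequences generator shown in the figure.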
Table 3.
Default parameters used for simulations with Equations 2 and 3.
Figure 3.
Recognition of a sequence of sequences.
(A): Dynamics of generated causal and hidden states at the phonemic and syllabic level, using Equation 2. At the syllabic level, there are three different syllables (1: blue, 2: green, 3: red), following the syllable sequence 1→2→3. The slowly changing state of syllable 1 causes the faster-moving phoneme sequence a→e→i→o (blue→green→red→cyan); syllable 2 causes o→i→e→a (cyan→red→green→blue); and syllable 3 causes a→i→e→o (blue→red→green→cyan). See Fig. 2 for a schematic description of these sequences. At the beginning and end of the time series (top-left plot), we introduced silence by applying a windowing function that zeroes time points 0 to 50 and 750 to 800. The red arrow indicates the end of the initial silent period. The phonemic states cause sound waves, sampled at 22050 Hz (see Fig. 1); these sound waves are the input to the recognition system. (B): The recognition dynamics after inverting the sound wave. At the phonemic level, the recognized states follow the true states closely. At the syllabic level, the recognized causal state dynamics are rougher than the true states but track the true syllable sequence veridically. The high-amplitude transients at the beginning and end of the time series are due to the silent periods, during which the syllabic recognition states experience high uncertainty (plotted in grey: 95% confidence intervals around the mean). Note that the hidden states, at both levels, experience high uncertainty whenever a phoneme or syllable is inactive. The red arrow indicates an initial, but rapidly corrected, misrecognition of the causing syllable.
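The silencing step mentioned in the caption can be written down directly. A minimal sketch, with assumed shapes: the state array below is a synthetic stand-in for the phonemic causal states, and the window simply zeroes the stated intervals.

```python
import numpy as np

# Windowing function that introduces silence at the start and end of the
# time series, as in Fig. 3A: time points 0-50 and 750-800 are zeroed.
# The constant state array is a stand-in; shapes are illustrative.
T = 800
states = np.ones((T, 4))           # stand-in for phonemic causal states
window = np.ones(T)
window[:50] = 0.0                  # initial silent period
window[750:] = 0.0                 # final silent period
silenced = states * window[:, None]
print(silenced[:50].sum(), silenced[750:].sum())  # both silent periods are zero
```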
Figure 4.
Recognition of sequences with phonotactic violation.
(A): True and recognized syllable dynamics of a two-level model when the syllables are unknown to the recognition system. Left: true syllable dynamics; right: recognized syllable dynamics. (B): Left: prediction error at the phonemic level; right: prediction error at the syllabic level. (C): Zoom of the dynamics shown in A and B from time points 440 to 470. See text for a description of these dynamics.
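The prediction errors plotted in panels B and C can be illustrated schematically. This is a generic precision-weighted prediction error as used in predictive-coding schemes, not the paper's exact quantity; the values and the precision are illustrative assumptions.

```python
import numpy as np

# Sketch of a precision-weighted prediction error at one level: the
# discrepancy between the observed lower-level signal and the top-down
# prediction, scaled by the (assumed) precision. Values are illustrative.
y = np.array([0.9, 0.1, 0.0])     # observed (lower-level) signal
g_v = np.array([1.0, 0.0, 0.0])   # top-down prediction
pi = 4.0                          # precision (inverse variance), assumed
eps = np.sqrt(pi) * (y - g_v)     # precision-weighted prediction error
print(eps)
```

A phonotactic violation, as in Fig. 4, shows up as a transient spike in such errors at both levels, because no syllable-conditioned prediction matches the input.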
Figure 5.
Recognition of unexpectedly fast phoneme sequences.
(A): True and recognized syllable dynamics of a two-level model when the phoneme sequence was generated with a larger rate constant than assumed by the recognition system, i.e. speech was 50% faster than expected. Left: true syllable dynamics; right: recognized syllable dynamics. (B): Prediction error at the syllabic level. (C): Top: prediction error at the phonemic level; bottom: prediction error at the syllabic level.