A Hierarchical Neuronal Model for Generation and Online Recognition of Birdsongs

doi:10.1371/journal.pcbi.1002303

Figure 1.

The schematic diagram of a songbird brain with the motor pathway (red arrows), which is considered in the model of song generation, and the anterior forebrain pathway (AFP, black arrows).

In the motor pathway, RA neurons driven by HVC control the motor area (nXIIts, which innervates the syrinx) and the respiratory area (DM). The anterior forebrain pathway communicates with the motor pathway through LMAN, which provides direct input to RA. Abbreviations: DLM, nucleus dorsolateralis anterior, pars medialis; DM, dorsomedial nucleus; HVC, a letter-based name; LMAN, lateral magnocellular nucleus of the anterior nidopallium; nXIIts, tracheosyringeal portion of the nucleus hypoglossus; RA, robust nucleus of the arcopallium. Adapted from [87].

Figure 2.

The scheme of HVC and RA dynamics.

Five RA ensembles are controlled by eight sequentially activated HVC_(RA) ensembles. The horizontal axis denotes time, and the arrows indicate which HVC ensemble activates the corresponding RA ensembles. The color scheme matches the dynamics shown in Figure 5. The part of the song obtained from the first three RA patterns (i.e., ensemble combinations 2-4, 1-3-5 and 1-4) is shown as a sonogram in Figure 3.
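
As a minimal sketch of this control scheme, the mapping from sequentially active HVC ensembles to RA ensemble combinations can be written as a lookup table. Only the first three combinations (2-4, 1-3-5 and 1-4) come from the caption. The names `RA_PATTERNS`, `ra_activation` and `ra_sequence` are hypothetical, and the remaining five HVC ensembles are omitted because the caption does not specify their RA targets.

```python
# Hypothetical lookup: HVC ensemble index -> RA ensembles it activates.
# Only these first three entries are stated in the caption of Figure 2.
RA_PATTERNS = {
    1: (2, 4),
    2: (1, 3, 5),
    3: (1, 4),
}

N_RA = 5  # five RA ensembles, as in the caption


def ra_activation(hvc_index):
    """Return a 0/1 vector over the five RA ensembles for one HVC ensemble."""
    active = RA_PATTERNS[hvc_index]
    return [1 if i in active else 0 for i in range(1, N_RA + 1)]


def ra_sequence(hvc_order):
    """Map a sequence of active HVC ensembles to RA activation vectors."""
    return [ra_activation(h) for h in hvc_order]
```

Calling `ra_sequence([1, 2, 3])` yields one binary RA vector per active HVC ensemble, which is the discrete skeleton of the pattern sequence drawn in the figure.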

Figure 3.

Motor control signals and resulting power spectra generated by the model.

Left: The motor control signals are obtained by a linear combination of sine waves () with (x-axis) and (y-axis) where in (A), in (B) and in (C). Right: These and dynamics are used in the syrinx equations (4) to obtain sound waves with the corresponding sonograms (time (sec) vs. frequency (kHz)). and control the amplitude and frequency of the sound waves, respectively. When , no phonation is produced (mini-breaths, [44]). The fluctuations in the fundamental frequencies in the sonograms on the right can be traced by moving counterclockwise along the ellipse-like curves on the left, starting from the blue arrows at .
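
To illustrate how two sine-wave control signals trace an ellipse-like curve that is traversed counterclockwise, the sketch below builds two sinusoids with a phase offset and checks the direction of traversal with the shoelace formula. The amplitudes, the phase offset and the function names are hypothetical choices for illustration only and are not the values used in the model.

```python
import math


def control_signals(n_samples, x_amp=1.0, y_amp=0.5, phase=-math.pi / 2):
    """Two illustrative control signals built from sine waves over one period.

    The amplitudes and the phase offset are hypothetical values chosen only
    so that the (x, y) trajectory is an ellipse.
    """
    xs, ys = [], []
    for i in range(n_samples):
        theta = 2.0 * math.pi * i / n_samples
        xs.append(x_amp * math.sin(theta))
        ys.append(y_amp * math.sin(theta + phase))
    return xs, ys


def signed_area(xs, ys):
    """Shoelace formula: positive when the closed curve runs counterclockwise."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return 0.5 * total
```

With the default phase of -pi/2 the signed area is positive, so the curve is traversed counterclockwise; flipping the sign of the phase reverses the direction.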

Figure 4.

Summary of nonlinear differential equations (1), (3), (4) and (5) (see Text S1 for Eq. (5)) that are used in the hierarchical model for birdsong generation.

Note that the output of each level serves as input to the level below it. Typical dynamics of the HVC, RA and oscillator (Osc.) levels are given in Figure 5A, 5B and 5C, respectively. Finally, the output of the oscillator level is used in the syrinx equations to produce the corresponding sound waves (Figure 6). See Table 1 for the parameters.

Figure 5.

Generated dynamics for the first simulation ‘Ideal communication’.

The causal states are shown on the left and hidden states on the right with arbitrary units both in time and neuronal activation. There are three levels: A) HVC (third) level, Eq. (1), B) RA (second) level, Eq. (3) and C) Oscillator (first) level, Eq. (5) in Text S1. At the third level, there are eight HVC ensembles (each represented with a different color) which are activated for a short amount of time to control the dynamics of the five RA ensembles, see also Figure 2. At the second level (left column), the solid lines represent and dashed lines represent , see Eq. (3). At the first level (right), we only show since is a shifted version of , see Eq. (5) in Text S1. At the first level (left), the blue line is and the red line is (which are mostly overlapping because of phase-locking). These output dynamics control the syrinx to obtain synthetic birdsong (Figure 6).

Figure 6.

The sound wave and sonogram of a generated song.

We plugged the air sac pressure () and stiffness () parameters obtained from the first-level output of the generative model (Figure 5C) into the syrinx equations (4). A) The solution of the syrinx equations, i.e. in Eq. (4), arbitrary units. The mini-breaths where no phonation is produced can be clearly observed. B) The sonogram of the sound wave in A (time (sec) vs. frequency (Hz)), computed with a sampling frequency of 12000 Hz. The first ∼3 sec of this sonogram can also be viewed in separate chunks in Figure 3.
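
A sonogram of this kind can be sketched as a short-time Fourier analysis of the sound wave. The example below is a minimal pure-Python version that assumes only the 12000 Hz sampling frequency from the caption. The frame length, hop size, Hann window and all function names are hypothetical choices, not the settings used for the figure.

```python
import cmath
import math

FS = 12000  # sampling frequency in Hz, as stated in the caption


def tone(freq_hz, n_samples):
    """Pure sine test tone sampled at FS (stand-in for a real sound wave)."""
    return [math.sin(2 * math.pi * freq_hz * i / FS) for i in range(n_samples)]


def frame_spectrum(frame):
    """Magnitude spectrum (bins 0..N/2) of one Hann-windowed frame, plain DFT."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                  for i, w in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def sonogram(signal, frame_len=240, hop=120):
    """One magnitude spectrum per frame; bin k sits at k * FS / frame_len Hz."""
    return [frame_spectrum(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def peak_frequency(spectrum, frame_len=240):
    """Frequency in Hz of the strongest bin of one frame."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * FS / frame_len
```

With a frame length of 240 samples the frequency resolution is 50 Hz per bin, and the highest representable frequency is 6000 Hz, half the sampling frequency.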

Table 1.

Variables used in the generative and recognition models.

Figure 7.

First simulation ‘Ideal communication’: The dynamics of song generation (left two columns) and song recognition (right two columns) with arbitrary units.

The format and the generated dynamics are the same as shown in Figure 5. The recognition scheme receives only the output of the first level (bottom left) and reconstructs states at all levels using the online Bayesian inference scheme. It can be seen that the reconstruction is successful as there are only tiny deviations between the true (left) and the reconstructed (right) dynamics.

Figure 8.

Second simulation ‘Deviation from expected song’: For simplicity, we only show the causal states of the generation (left column) and recognition (right column), where the format is the same as in Figure 5 with arbitrary units.

The listening bird (recognition) hears a slightly deviating syllable between the time steps and indicated by the black rectangle. During this period, the third ensemble of the HVC level (red color) in the singing bird (generation) activates the first and the fourth ensembles of the RA level (blue and cyan colors) while the listening bird expects the activation of the first ensemble of the second level (blue) only. This unexpected sensory input continues until the listening bird starts hearing and recognizing the expected syllables again after . See Figure 9 for plots of the associated prediction errors.

Figure 9.

Prediction errors for the ‘Deviation from expected song’ simulation with arbitrary units.

Prediction errors for all causal and hidden states during recognition are plotted using the same format as in Figure 5. Note that prediction errors increase during the unknown syllable (between and ) and are observed at all levels.
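
The idea that prediction errors rise only while the heard song departs from the expected one can be sketched with a toy example. The code below uses a plain pointwise difference, which is a simplification and not the Bayesian inference scheme of the model. The signals, the threshold and the function names are hypothetical.

```python
def prediction_errors(observed, predicted):
    """Pointwise error: what was heard minus what was expected."""
    return [o - p for o, p in zip(observed, predicted)]


def deviation_window(errors, threshold):
    """First and last time step where |error| exceeds threshold, else None."""
    hits = [t for t, e in enumerate(errors) if abs(e) > threshold]
    return (hits[0], hits[-1]) if hits else None


# Toy data (hypothetical): the heard signal departs from the expected one
# only during time steps 4 to 6.
expected = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
heard = [0.0, 0.1, 0.2, 0.3, 0.9, 1.0, 1.1, 0.7, 0.8, 0.9]
errors = prediction_errors(heard, expected)
window = deviation_window(errors, threshold=0.25)
```

Here `window` marks the interval in which the error exceeds the threshold, analogous to the period of the unknown syllable in the figure.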

Figure 10.

Third simulation ‘Differently wired brains’: For simplicity, we only show the causal states of the generation (left column) and recognition (right column), where the format is the same as in Figure 5 with arbitrary units.

The connectivity matrices ('s) at the second level are different in the singing bird (generation) and in the listening bird (recognition). Recognition still works as well as in the first simulation (‘Ideal communication’, Figure 7) because third level ensembles can compensate for this variability by providing different input to the second level (different I vectors).

Figure 11.

Generated dynamics for the fourth simulation ‘Cooling of HVC’: We simulated two cooling experiments, where the format is the same as in Figure 5 with arbitrary units.

Left: The rate constant at the HVC (third) level, , is decreased by half. Right: The rate constant of the RA level (second level), , is decreased by half. The change in slows down the dynamics of the system, while cooling at the RA level does not have any significant effect, compare with the dynamics in Figure 5. The parameters used are , and on the left and , and on the right.
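
Why halving a rate constant slows the dynamics can be seen in a one-dimensional toy system. The sketch assumes that the rate constant multiplies the whole right-hand side of the differential equation, so that halving it stretches time by a factor of two. The equation dx/dt = kappa * (1 - x), the step size and the function name are hypothetical stand-ins, not the model's HVC equations.

```python
def time_to_half(kappa, dt=0.001, t_max=10.0):
    """Euler-integrate dx/dt = kappa * (1 - x) from x = 0.

    Returns the first time at which x >= 0.5, or None if that does not
    happen before t_max. The equation is a toy stand-in for one level.
    """
    x, step = 0.0, 0
    while step * dt < t_max:
        if x >= 0.5:
            return step * dt
        x += dt * kappa * (1.0 - x)
        step += 1
    return None


t_normal = time_to_half(kappa=1.0)
t_cooled = time_to_half(kappa=0.5)  # rate constant decreased by half
slowdown = t_cooled / t_normal
```

In this toy system `slowdown` is close to 2, meaning the same trajectory unfolds at half the speed when the rate constant is halved.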

Figure 12.

Recognition results for the fourth simulation ‘Cooling of HVC’, where the format is the same as in Figure 5 with arbitrary units.

Left: We slowed down the singing bird by decreasing the rate constants by 3%: , and . Middle: The rate constants for the recognition are , and . Right: The listening bird can distinguish this subtle change in song speed as can be seen from the prediction errors of the causal states at all three levels. (The hidden states show similar prediction errors at all levels).
