Learning to synchronize: How biological agents can couple neural task modules for dealing with the stability-plasticity dilemma

doi:10.1371/journal.pcbi.1006604

Fig 1.

Model and task overview.

A: General model overview. The model consists of 3 units. A Processing unit contains a classic neural network that learns the (reversal learning) task. The Control and RL units constitute a hierarchically higher network. Putative brain areas are shown in italic font. The Control unit drives synchronization of oscillations in the Processing unit. The RL unit evaluates current behavior in order to signal to the Control unit what should be synchronized in the Processing unit. B: Reversal learning task. The task alternates between 3 task rules (A, B, C) across 6 task blocks with task sequence ABCABC. Plasticity is measured during the first 5 trial bins of the first task block in which a task rule is presented (green bars). Stability is measured as the difference between the last 5 trial bins of the first task block in which a task rule is presented, and the first 5 trial bins of the second task block in which a task rule is presented.
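
The two measures in panel B can be written down as a short sketch. This is only an illustration under stated assumptions: the function name, the input layout (one list of binned accuracies per task block, in ABCABC order), the averaging over rules, and the sign convention (second-block start minus first-block end, so that forgetting gives a negative stability score) are not specified in the caption.

```python
from statistics import mean

def plasticity_and_stability(acc_by_block, n_bins=5):
    """Compute plasticity and stability from binned accuracy.

    acc_by_block: list of 6 lists of binned accuracy, ordered ABCABC
                  (hypothetical input layout).
    n_bins:       number of trial bins at the block edges (5 in the caption).
    """
    n_rules = len(acc_by_block) // 2  # each rule appears in two blocks
    plasticity, stability = [], []
    for rule in range(n_rules):
        first = acc_by_block[rule]             # first block with this rule
        second = acc_by_block[rule + n_rules]  # second block with this rule
        # Plasticity: accuracy early in the first presentation of a rule.
        plasticity.append(mean(first[:n_bins]))
        # Stability (assumed sign): early second presentation minus late
        # first presentation; negative values mean the rule was forgotten.
        stability.append(mean(second[:n_bins]) - mean(first[-n_bins:]))
    # Averaging across rules is an assumption of this sketch.
    return mean(plasticity), mean(stability)
```

A model that retains a rule across intervening blocks would score near zero on stability under this convention, whereas one that overwrites earlier learning would score clearly below zero.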


Fig 2.

Detailed overview of the model.

A: The model. A detailed version of the model in Fig 1A is shown. The model consists of 3 units. A Processing unit is localized in posterior processing areas and contains a classic neural network. This network contains 3 layers (of nodes) for the BP model and 2 layers for the RW model. Layer 1 contains nodes that are activated by external input. At layer 2, modularity is implemented. This layer is divided into 3 task modules, one for each task the model has to execute. In the BP model, the nodes in these task modules represent hidden nodes; for the RW model these nodes represent response options. Layer 3 only occurs in the BP model and contains three response options. The Control unit consists of two parts, the LFC and the pMFC. The LFC contains 4 task neurons; 3 neurons each point to a specific task module in the Processing unit that should be synchronized or desynchronized. A fourth neuron points to layers 1 and 3, to indicate that task modules should be (de)synchronized with these layers. The pMFC of the Control unit contains a single node that sends bursts in order to (de)synchronize modules in the Processing unit in line with the pointers sent by the LFC. The RL unit contains four neurons. One neuron (V) learns to assign a value to the task modules. Two other neurons (δ⁻, δ⁺) compare this value to external reward, in order to compute prediction errors. Negative prediction errors are accumulated in the Switch neuron in order to make a stay/switch decision, which it signals to the LFC. Additionally, the negative prediction error neuron signals to the pMFC (by giving bursts) that it should increase control. B: Neuronal triplet. Every square node in A consists of a triplet of neurons. Each such node consists of a phase-code pair (E, I) which, because of its excitatory (E)-inhibitory (I) coupling, oscillates at a certain frequency. These oscillations modulate the excitability of their rate code neuron (x) in line with the binding-by-synchrony (BBS) hypothesis.
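
The stay/switch logic of the RL unit described in panel A can be illustrated with a minimal sketch. It assumes a single value estimate, a delta-rule update, plain accumulation of negative prediction errors, and a reset after each switch; the function name, the learning rate, and the threshold are hypothetical and are not taken from the caption.

```python
def run_switch_neuron(rewards, lr=0.2, threshold=1.0):
    """Toy stay/switch decision from accumulated negative prediction errors.

    rewards:   sequence of external rewards (e.g. 1 = rewarded, 0 = not).
    lr:        learning rate of the value neuron V (hypothetical value).
    threshold: Switch-neuron threshold (hypothetical value).
    """
    value = 0.0       # V: learned value of the current task module
    switch_act = 0.0  # activity of the Switch neuron
    decisions = []
    for r in rewards:
        delta = r - value            # prediction error
        neg_pe = max(0.0, -delta)    # negative prediction error neuron
        value += lr * delta          # delta-rule update of V (assumption)
        switch_act += neg_pe         # accumulate negative prediction errors
        if switch_act > threshold:
            decisions.append("switch")
            switch_act = 0.0         # reset after a switch (assumption)
        else:
            decisions.append("stay")
    return decisions
```

With a run of rewarded trials followed by unrewarded ones, the sketch stays as long as rewards match expectations and switches once enough negative prediction error has accumulated.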


Fig 3.

Neuronal triplets.

A: The pMFC. In the pMFC, the phase code neurons oscillate at a 5 Hz frequency. The rate code neuron of the pMFC gives bursts to the Processing unit. Every time the E-neuron reaches a high amplitude, the probability of a burst becomes high. B: E-neurons of the Processing unit. In the Processing unit, the phase code neurons oscillate at a faster gamma frequency. The panel illustrates how a burst (de)synchronizes oscillations that were initially not (de)synchronized. C: Rate code neurons in the Processing unit. Consequences of synchronization between the phase code neurons can be observed in the rate code neurons. At first, only the neuron of layer 1 is activated because it receives a constant external input signal. Importantly, this activation is modulated by G(Ei) in Eq (4). As a consequence, as long as the E-neurons are not synchronized, communication between the corresponding rate code neurons is very inefficient; but when the E-neurons are synchronized, communication between the corresponding rate code neurons is efficient.
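
The synchronizing effect of a shared burst in panel B can be illustrated with a toy sketch. It is not the model's actual equations: each phase-code pair is treated as a point (E, I) rotating on the unit circle, the burst is added to both E-neurons, and a hard renormalization to unit radius stands in for the model's amplitude damping. The 40 Hz gamma frequency, the burst strength, and all function names are hypothetical.

```python
import math

def rotate(e, i, freq_hz, dt):
    """Advance one (E, I) pair by one time step of a pure rotation."""
    a = 2 * math.pi * freq_hz * dt
    return (e * math.cos(a) - i * math.sin(a),
            e * math.sin(a) + i * math.cos(a))

def burst(e, i, strength):
    """Add a burst to the E-neuron, then rescale to unit radius
    (a stand-in for damping; an assumption of this sketch)."""
    e += strength
    r = math.hypot(e, i)
    return e / r, i / r

def phase_gap(osc_a, osc_b):
    """Absolute phase difference between two oscillators, in [0, pi]."""
    d = math.atan2(osc_a[1], osc_a[0]) - math.atan2(osc_b[1], osc_b[0])
    return abs(math.atan2(math.sin(d), math.cos(d)))

def demo(theta_a=0.5, theta_b=2.5, strength=10.0,
         gamma_hz=40.0, dt=0.001, steps=50):
    """Return the phase gap before the burst, right after it, and later."""
    a = (math.cos(theta_a), math.sin(theta_a))
    b = (math.cos(theta_b), math.sin(theta_b))
    for _ in range(steps):                 # free-running, unsynchronized
        a, b = rotate(*a, gamma_hz, dt), rotate(*b, gamma_hz, dt)
    gap_before = phase_gap(a, b)
    a, b = burst(*a, strength), burst(*b, strength)  # shared burst
    gap_after = phase_gap(a, b)
    for _ in range(steps):                 # both keep the same frequency
        a, b = rotate(*a, gamma_hz, dt), rotate(*b, gamma_hz, dt)
    return gap_before, gap_after, phase_gap(a, b)
```

Because both oscillators receive the same strong input, they are pushed toward the same phase, and since they share a frequency the small remaining gap persists afterwards. Desynchronization is not covered by this sketch.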


Fig 4.

Model data.

Model dynamics are shown for simulations with a learning rate of .2. In column 1 (panels A and D), binned accuracy is shown for the full (in blue) and no-synchrony (in orange) model. The horizontal dashed black line indicates accuracy at chance level. In column 2 (panels B and E), brown lines represent synchronization values for the initially (randomly) chosen task module, magenta lines for the module that was chosen second, and green lines for the third module. In column 3 (panels C and F), activity of the Switch neuron (see Fig 2A) is shown for one selected simulation of the model (in black). Blue horizontal dashed lines indicate the threshold of the Switch neuron and the yellow arrows mark data points above the threshold. In all panels, red vertical transparent lines indicate task switches and shades indicate 95% confidence intervals.


Fig 5.

Parameter exploration.

The first row (A-B) shows results for the combination of the Controller frequency and the Processing frequency. The second row (C-D) shows results for the combination of the Damp and r_min parameters. In the first column, we show results where the other two parameters were kept constant at the original values that we used for other simulations (i.e., we slice parameter space in these two parameters). In the second column, results are shown where we average over all values used for the remaining parameters. Colors indicate mean accuracy over the whole task. The white dashed lines indicate the original parameter values.


Fig 6.

Performance of models on reversal learning task.

Overall accuracy (A, D, G), plasticity (B, E, H) and stability (C, F, I) are shown across all learning rates for three tasks of increasing complexity (see Methods for details). Blue lines show means for the full model and orange lines represent the mean values for the no-synchrony models. The shades indicate the corresponding 95% confidence intervals. The horizontal black dashed line in A and D indicates chance level accuracy.


Fig 7.

Connecting to empirical data.

A, C: Contrast of error–correct trials is shown for post-feedback pMFC power in the time-frequency spectrum. B, D: Phase-amplitude coupling between pMFC theta-phase and gamma-amplitude in the Processing unit is shown. White vertical dashed lines indicate the moment of reward feedback. Red vertical transparent lines indicate task switches. Shades illustrate 95% confidence intervals.


Fig 8.

Suggestion of neural origins of three model units.

The Processing unit (in blue) is situated at posterior cortical sites. In the case of a task in which stimuli are visually presented, and responses are hand movements, the Processing unit would consist of visual cortex and pre-motor (and intermediate) areas. The RL unit (in red) could be localized in aMFC, in combination with brainstem and frontopolar cortex (not depicted). The Control unit (in grey) consists of LFC and pMFC.
