Traveling waves are commonly observed across the brain. While previous studies have suggested the role of traveling waves in learning, the mechanism remains unclear. We adopted a computational approach to investigate the effect of traveling waves on synaptic plasticity. Our results indicate that traveling waves facilitate the learning of poly-synaptic network paths when combined with a reward-dependent local synaptic plasticity rule. We also demonstrate that traveling waves expedite finding the shortest paths and learning nonlinear input/output mapping, such as exclusive or (XOR) function.
There are approximately 10^11 neurons with 10^14 connections in the human brain. Information transmission among neurons in this large network is considered crucial for our behavior. To achieve this, multiple synaptic connections along a poly-synaptic network path must be adjusted coherently during learning. Because the previously proposed reward-dependent synaptic plasticity rule requires coactivation of presynaptic and postsynaptic neurons, learning can fail if a subset of neurons along a distant network path is inactive at the beginning of learning. We suggest that traveling waves that are initiated at an information source can mitigate this problem. We performed computer simulations of spiking neural networks with reward-dependent local synaptic plasticity rules and traveling waves. Our results show that this combination facilitates the learning and refinement of synaptic network paths. We argue that these features are a general biological strategy for maintaining and optimizing our brain function. Our research provides new insights into how complex neural networks in the brain form during learning and memory consolidation.
Citation: Ito Y, Toyoizumi T (2021) Learning poly-synaptic paths with traveling waves. PLoS Comput Biol 17(2): e1008700. https://doi.org/10.1371/journal.pcbi.1008700
Editor: Brent Doiron, University of Pittsburgh, UNITED STATES
Received: January 31, 2020; Accepted: January 11, 2021; Published: February 9, 2021
Copyright: © 2021 Ito, Toyoizumi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Brain/MINDS from Japan Agency for Medical Research and Development [AMED] under Grant Number JP21dm020700 (T.T.) https://www.amed.go.jp/en/index.html Japan Society for the Promotion of Science [JSPS] KAKENHI Grant Number JP18H05432 (T.T.). https://www.jsps.go.jp/english/index.html The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Waves of neural activity in the brain play an essential role in recognition and learning. Among them, traveling waves are observed at different spatial scales in many brain regions by different recording methods, such as electroencephalography (EEG) [2–4], voltage-sensitive dyes (VSDs) [5,6], and local field potentials (LFP) [7,8]. Traveling waves are typically observed under mild anesthesia [7,9], during sleep, or in idle states.
Cortical traveling waves consist of the upstate and downstate of neurons and propagate these phases coherently [12–15]. The upstate is defined by relatively large membrane potential fluctuations with a high firing rate, while the downstate is a phase of small fluctuations with few spikes. The propagation of this up/down state is estimated to be slower than axonal signal transmission, and the activity spreads both as subthreshold and suprathreshold responses. Lubenov et al. suggested that these traveling waves spread along anatomical structures rather than according to spatial distance.
The role of traveling waves has been unclear. One hypothesis is that traveling waves mediate lateral propagation of signals within the cortex [7,19]. Rubino et al.  suggested that the waves mediate information transfer to distant neurons during movement preparation and execution. Another hypothesis is that slow oscillations during sleep contribute to memory consolidation [21,22]. Notably, while these works suggest the significance of traveling waves for learning, specific mechanisms of how traveling waves improve learning are yet to be uncovered. We conducted computer simulations of neural network models to study this.
To explore this mechanism, we modeled synaptic plasticity. Synaptic weight between a pair of neurons changes according to presynaptic and postsynaptic neural activity and a reward signal [23–25]. Reward-modulated spike-timing-dependent plasticity (STDP) strengthens synapses that contribute to eliciting a spike in the presence of a reward signal [26,27]. While this learning rule tends to increase the probability of reproducing a spike sequence that leads to a reward, it cannot efficiently associate spiking activity among indirectly connected neurons. Signal transmission between indirectly connected neurons is crucial for task performance  because most neurons in the brain are connected indirectly .
We hypothesized that a critical role of traveling waves is to propagate neural activity between distant and indirectly connected neurons. Consistently, Lubenov et al.  reported that theta waves in the hippocampus assist signal transmission across areas, such as the amygdala, hypothalamus, and medial prefrontal cortex, and this is also suggested in humans [30,31]. Together with the standard reward-independent STDP , traveling waves could gradually create a repertoire of paths spreading from a wave-initiating site. Once such a repertoire is prepared, neurons are coherently activated along the paths so that reward-modulated STDP could select a subset of these paths to perform a task. We simulate computational models of reward-modulated STDP to study if traveling waves enhance learning.
To test our hypothesis, we used relatively small excitatory spiking neural networks (N ~ 100) with a global inhibitory signal and a global dopaminergic signal. Fig 1A explains the scheme of our setting. For the spiking neuron model, we adopted the leaky integrate-and-fire neuron. The dynamics of the membrane potential vi of neuron i are described by Eq (1), where v0 = -70 mV is the resting potential, hi is the synaptic input from surrounding excitatory neurons to neuron i, hinh is an inhibitory feedback signal that controls the overall firing rate of the network, computed as the running average of spikes from all neurons (see Material and Methods), and τ = 10 ms is the membrane time constant. Eq (1) also includes an external input to neuron i. hi is updated according to dhi/dt = −hi/τh + h0 ∑j Sij fj(t−td), with synaptic time constant τh = 5 ms, scaling constant h0 = 60 mV, excitatory synaptic weight Sij from neuron j to neuron i, spike train fj of neuron j as a sum of delta functions peaking at neuron j’s spike timing, and synaptic transmission delay td = 2 ms. The neuron emits a spike when vi reaches the spiking threshold of -54 mV, after which vi is reset to -60 mV. In addition, each neuron receives uncorrelated white Gaussian noise ξi. The noise level is controlled by a time-dependent standard deviation σi(t), modulated by traveling waves as described below. A subset of neurons (stimulated neurons) receives external input, whereas the other neurons receive none (0 mV). The stimulated neurons receive input pulses at 200 Hz that force them to spike during the first 250 ms of each learning trial (see below for each task setup).
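The neuron model described above can be illustrated with a minimal single-neuron sketch. It is not the authors' Brian2 code: the exact form of Eq (1) is not reproduced in this text, so the update below assumes the standard leaky integrate-and-fire form, lumps the synaptic, external, and inhibitory inputs into one hypothetical array `h_total`, and assumes one common scaling for the noise term. Only the numerical constants are taken from the text.

```python
import numpy as np

# Constants quoted in the text (mV and ms).
V_REST = -70.0    # resting potential v0
V_THRESH = -54.0  # spiking threshold
V_RESET = -60.0   # value of v after a spike
TAU = 10.0        # membrane time constant


def simulate_lif(h_total, sigma=0.0, dt=0.1, seed=0):
    """Euler-Maruyama integration of an ASSUMED form of Eq (1):

        tau * dv/dt = -(v - v0) + h_total(t) + noise

    h_total: net input in mV at each time step (synaptic + external
             - inhibitory), a hypothetical simplification.
    sigma:   noise standard deviation in mV; its scaling is an assumption.
    Returns the list of spike times in ms.
    """
    rng = np.random.default_rng(seed)
    v = V_REST
    spikes = []
    for step, h in enumerate(h_total):
        noise = sigma * np.sqrt(dt / TAU) * rng.standard_normal()
        v += dt / TAU * (-(v - V_REST) + h) + noise
        if v >= V_THRESH:
            spikes.append(step * dt)
            v = V_RESET
    return spikes
```

With no input and no noise the neuron rests at v0 and stays silent, whereas a constant net input above 16 mV (the gap between v0 and the threshold) makes it fire repeatedly.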
(A) Overview of the whole network. (B) The STDP learning window. (C) The mechanism of synaptic plasticity. The synaptic weight changes as a product of the eligibility trace c(t) and the dopaminergic signal D(t). (D) The upstate propagation from a presynaptic neuron to a postsynaptic neuron.
As a synaptic plasticity rule (Fig 1B and 1C), we used a modified version of reward-modulated STDP. In the conventional model, synaptic plasticity does not occur in the absence of reward or punishment. However, recent research suggests that the dopaminergic signal has two different timescales: tonic and phasic. Therefore, we introduced a tonic variable Dt, which represents the baseline dopamine level, and a phasic variable Dp, which represents the dopaminergic signal driven by a reward or punishment. We assume that Dt signaling induces reward-independent STDP and that Dp signaling induces reward-dependent STDP. The amount of reward or punishment declines exponentially after the stimulation offset with a decay time constant of 200 ms. Both dopaminergic components are assumed to be modulated by the novelty of the task. Toward the end of the simulations, both Dt and Dp slowly decline to terminate learning and fix the network (see Material and Methods). Note that the dopaminergic signals Dt and Dp are global variables common to all synapses. The synaptic weight Sij (0 ≤ Sij ≤ Smax) from neuron j to i is adjusted when cij > 0 or Dp > 0 according to Eq (2), where Smax = 0.24 is the maximum synaptic weight, τs = 1 ms is a time unit, and cij (−Smax/2 ≤ cij ≤ Smax/2) is the so-called STDP eligibility trace that accumulates the effects of plasticity events with time constant τc = 1000 ms according to Eq (3), where fi is the spike train of neuron i and its running average is taken with time constant τSTDP. The increment of cij follows a typical asymmetric STDP window with amplitude γ = 0.0009 and time constant τSTDP = 30 ms (Fig 1B). cij increases instantaneously at a pre-before-post event, decreases instantaneously at a post-before-pre event, and otherwise decays exponentially with time constant τc. The upper and lower bounds of cij limit the speed of synaptic change.
We assumed no changes in the synaptic weight when cij < 0 and Dp < 0.
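The plasticity rule described above can be sketched for a single synapse as follows. Eqs (2) and (3) are not reproduced in this text, so the sketch makes two labeled assumptions: the weight changes at rate cij (Dt + Dp)/τs (suggested by the Fig 1 caption, but not confirmed), and the eligibility trace is driven by exponentially decaying running averages of the pre- and postsynaptic spike trains. The function name `run_synapse`, the constant dopamine levels, and the 1 ms time step are hypothetical simplifications.

```python
import math

S_MAX = 0.24      # maximum synaptic weight
TAU_S = 1.0       # time unit tau_s (ms)
TAU_C = 1000.0    # eligibility-trace time constant (ms)
TAU_STDP = 30.0   # STDP window time constant (ms)
GAMMA = 0.0009    # STDP amplitude


def run_synapse(pre_spikes, post_spikes, d_tonic, d_phasic, s0=0.1, t_end=200):
    """Simulate one synapse in 1 ms steps; return (c, S) at the end.

    pre_spikes / post_spikes: sets of integer spike times in ms.
    d_tonic / d_phasic: dopamine levels, held constant here (assumption).
    """
    x_pre = x_post = 0.0  # running averages of the two spike trains
    c, s = 0.0, s0
    for t in range(t_end):
        x_pre *= math.exp(-1.0 / TAU_STDP)
        x_post *= math.exp(-1.0 / TAU_STDP)
        c *= math.exp(-1.0 / TAU_C)
        pre, post = t in pre_spikes, t in post_spikes
        if post:               # pre-before-post event raises c
            c += GAMMA * x_pre
        if pre:                # post-before-pre event lowers c
            c -= GAMMA * x_post
        x_pre += 1.0 if pre else 0.0
        x_post += 1.0 if post else 0.0
        c = max(-S_MAX / 2, min(S_MAX / 2, c))
        # The weight is adjusted only when c > 0 or Dp > 0, as stated above.
        if c > 0 or d_phasic > 0:
            s += c * (d_tonic + d_phasic) / TAU_S  # ASSUMED form of Eq (2)
            s = max(0.0, min(S_MAX, s))
    return c, s
```

In this sketch, a pre-before-post pairing under tonic dopamine alone slowly potentiates the synapse, while a post-before-pre pairing combined with punishment (c < 0 and Dp < 0) leaves the weight unchanged.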
For the wave, we used a simple custom-made propagation rule. The upstate is defined as a high noise level state (σi(t) ~ 6 mV), while the downstate is a low noise level state (σi(t) ~ 3 mV). These noise levels roughly reproduce the experimentally observed firing rate of 5 Hz in the upstate and 0 Hz in the downstate. The initial upstate spreads from the externally stimulated neurons in each trial and then propagates from these neurons to the peripheral neurons. The noise level is determined by σi(t) = αi∙ψi + 3 mV with influx coefficient αi (see Material and Methods) and local field ψi, representing the average activity of a non-modeled neuron mass around the modeled neuron i. To control the noise level, we constrained the range of σi(t) between 3 and 6 mV and the range of ψi between -1 mV and 100 mV. ψi is updated (Fig 1D) by Eq (4), where τw = 200 ms is the time constant of waves, δt = 20 ms is a propagation delay, θ = 0.001 is a threshold for wave propagation, and the expressions j→i and i→j represent the sets of indices j that have connections incoming to and outgoing from neuron i, respectively. [x]+ is the rectified linear function that takes x for positive x and 0 otherwise. gi(t) describes the time-dependent drive for the local field ψi by the external input. For a stimulated neuron i, gi(t) integrates the external input from the stimulation-onset time ton while time t is in the stimulation interval; its expression uses the modulo function mod. Thus, gi(t) increases discontinuously by η every 5 ms but is constant within each 5 ms interval. We assume that gi(t) = -5 mV after the stimulation interval. For non-stimulated neurons, gi(t) = 0 mV always holds. The gain factor η takes a task-dependent value as described in Material and Methods. Altogether, the local field around stimulated neurons rapidly increases at the beginning of each learning trial and then diffuses as a wave to the local field of connected neurons.
By the end of the learning trial of duration 3.0 s, ψi for all neurons decays close to zero. Neurons are placed on a two-dimensional square sheet. A rigid boundary condition is used so that the waves collapse at the edges of the sheet. To highlight the role of traveling waves, we also simulate models without waves. A constant noise level, σi, is used in these models. The value of σi is chosen so that the overall firing rate is the same as that of the corresponding model with waves. We define the conventional model as the model without the tonic dopamine signal and traveling waves.
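Two pieces of the wave rule are fully specified in the prose and can be sketched directly: the mapping from local field to noise level, and the staircase drive gi(t). The sketch below does not implement Eq (4), which is not reproduced in this text. The values of αi and η used in the usage note are hypothetical, and two details of the staircase are assumptions: the count of 5 ms steps starts at zero at stimulation onset, and the drive is zero before onset.

```python
import math

SIGMA_DOWN = 3.0                 # mV, downstate noise level
SIGMA_UP = 6.0                   # mV, upstate noise level
PSI_MIN, PSI_MAX = -1.0, 100.0   # mV, allowed range of the local field


def noise_level(alpha, psi):
    """sigma_i(t) = alpha_i * psi_i + 3 mV, constrained to [3, 6] mV."""
    psi = max(PSI_MIN, min(PSI_MAX, psi))
    return max(SIGMA_DOWN, min(SIGMA_UP, alpha * psi + SIGMA_DOWN))


def external_drive(t, eta, stimulated, t_on=0.0, t_stim=250.0, step=5.0):
    """Staircase drive g_i(t) in mV for the local field.

    Rises by eta every `step` ms during stimulation, equals -5 mV afterwards,
    and is always 0 for non-stimulated neurons.
    """
    if not stimulated or t < t_on:   # zero before onset is an assumption
        return 0.0
    if t >= t_on + t_stim:
        return -5.0
    return float(eta * math.floor((t - t_on) / step))
```

For example, `noise_level(0.05, 100.0)` saturates at the upstate level of 6 mV, and `external_drive(12.0, eta=2.0, stimulated=True)` returns 4.0 under the step-counting assumption above.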
Below, we conducted three tasks to illustrate our points. In Task 1, we demonstrate that the combination of reward-dependent STDP and traveling waves can selectively reinforce reward-related paths. In Task 2, we show that traveling waves can empower reward-dependent STDP to reinforce initially weak shortcut paths. In Task 3, we show that the reward-dependent STDP and waves can be exploited to learn the XOR function.
Task 1: Selectively reinforcing poly-synaptic paths
First, we demonstrate that the combination of traveling waves and the STDP rule can strengthen a specific path from a stimulated neuron to a target neuron. This task is especially important in large-scale networks such as the brain because most neurons are indirectly connected. A local STDP rule alone does not efficiently solve this task because coherent activation of distant neurons is rare before learning. Wave signals compensate for this deficiency and facilitate the learning of poly-synaptic paths. This effect turns out to be evident, especially in the presence of the tonic dopaminergic signal Dt, which is not included in the conventional reward-modulated STDP rule. The Dt signal induces a reward-independent STDP that works synergistically with traveling waves to prepare a repertoire of paths starting from the stimulated neuron (see below).
Fig 2 shows the setting and results of this task. Fig 2A Left shows the initial network setting. The central neuron S with coordinates (600 μm, 600 μm) is stimulated by external input. This task aims to strengthen the path from S to the target neuron T positioned at the bottom (600 μm, 100 μm). We also prepared three false-target neurons F at the left (100 μm, 600 μm), right (1100 μm, 600 μm), and top (600 μm, 1100 μm). Synaptic connections from S are all outward, while the synaptic connections to T and F are all inward. Other neurons are randomly and unidirectionally connected to adjacent neurons within 200√2 μm with a probability of 0.5. If a neuron is isolated by chance, we repeat the procedure until it gets connected. We used this recurrently connected neural network to model a two-dimensional cortical sheet. The task we consider is information routing in a cortical sheet required for some animal tasks, such as learning an appropriate action in response to a stimulus by preparing a path from visual neurons to motor neurons. The central neuron is stimulated during the first 250 ms of each trial. This causes a traveling wave to build up there and gradually spread to the surrounding neurons. A reward or punishment signal is provided (see Material and Methods for details) if the summed spike-count from the target or non-target neurons reaches a threshold level of 5 in each trial. If the target neuron spikes more than the other three false-target neurons during and after the stimulation, the reward signal Dp (> 0) is provided to the whole network. Meanwhile, if any of the false-target neurons spikes more than the target neuron, the punishment signal Dp (< 0) is provided. We repeated this 3.0 s trial 80 times.
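The reward logic of Task 1 can be written as a small decision function. This is only a sketch of the rule as worded above: it assumes that the threshold of 5 applies to the spike count summed over the target and the three false-target neurons, and that a tie between the target and the best false target yields no signal. The function name `task1_outcome` and the +1/-1/0 return convention are hypothetical; the actual amplitude of Dp is set in Material and Methods.

```python
def task1_outcome(target_count, false_counts, threshold=5):
    """Return +1 (reward), -1 (punishment), or 0 (no signal) for one trial.

    target_count: spike count of the target neuron T.
    false_counts: spike counts of the three false-target neurons F.
    """
    total = target_count + sum(false_counts)
    if total < threshold:          # too few spikes: no dopamine signal
        return 0
    best_false = max(false_counts)
    if target_count > best_false:  # T outspikes every F
        return 1
    if best_false > target_count:  # some F outspikes T
        return -1
    return 0                       # tie: treated as no signal (assumption)
```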
(A) An example of a successful trial. The initial synaptic weights are represented in color (Left). All neurons are aligned in a grid with 100 μm spacing, and the adjacent neurons within 200√2 μm are randomly connected with a probability of 0.5. Synaptic connections from stimulated neuron S are all outward, while the synaptic connections to target neuron T and false-target neurons F are all inward. The path from S to T is selectively strengthened at the end of the learning (Middle). The difference between the initial synaptic weight and the final synaptic weight (Right). (B) A successful example of this task. The firing rate of the target neuron selectively increases. (C) The averaged synaptic weight difference from the initial condition to the 10th, 25th, 40th trials is plotted. The averaged synaptic weights are calculated, including the direction of synaptic weights (so that the opposite direction has a minus sign). (D) The averaged synaptic weight difference between the initial trial and the last trial (the 80th trial) is plotted. (E) The success rate of each condition (50 simulations averaged). The shaded area indicates the standard error of the mean. A combination of wave and tonic dopamine signal Dt (red line) shows the best task performance, while the conventional model (black line) fails to complete this task.
In a successful case, the paths from the stimulated neuron at the center to the target neuron at the bottom are selectively strengthened (Fig 2A). Fig 2B shows the firing rates of the target (red) and the false-target neurons (black) in a successful example. The firing rate of the target neuron was selectively increased. Fig 2C indicates that the correct paths are gradually strengthened. In the last trial, the combination of waves and the Dt signal successfully establishes a path from the stimulated neuron to the target neuron (Fig 2D). The success rate of each condition is indicated in Fig 2E. Our full model shows the best task performance, while the conventional model (without waves and the Dt signal) fails in this task. For this task to be completed, the Dt signal is critical because the input signal from the stimulated neuron does not reach the target neuron in the initial setting (S1 Fig). Hence, a reward or punishment signal is too unreliable to train the network at the beginning. In contrast, reward-independent STDP, induced by the Dt signal, gradually establishes radially symmetric outbound paths spreading from the stimulated neuron (S2 Fig). Traveling waves speed up this process by enhancing radially spreading neural activity, but they are not effective in the absence of the Dt signal because they drive noisy neural activity (S1 Fig). Once radially symmetric candidate paths are formed (S2 Fig), reward-modulated STDP can select paths toward the target neuron based on reward and punishment signals (Fig 2).
Task 2: Finding a shortcut
The combination of the wave signal and STDP rule can also help find the shortest paths from the stimulated neuron to a target. Generally, finding short paths is vital for fast and reliable computation—transmission through detour paths is slow and fragile because successful transmission depends on multiple neurons’ states, which are unreliable in nature. Finding an initially weak shortcut path might be difficult without traveling waves because the neurons along the shortcut path would seldom be activated coherently. Wave propagation can significantly increase this probability and accelerate the learning process.
Fig 3 shows the setting and results of this task. Similar to Task 1, we placed a stimulated neuron and a target neuron. The stimulated neuron S is located upper-left at (100 μm, 500 μm), and the target neuron T is located bottom-left at (100 μm, 100 μm) (Fig 3A). Neurons within 100√5 μm are randomly and unidirectionally connected with a probability of 0.5. If a neuron is isolated by chance, we repeat the procedure until it gets connected. The synaptic weights of a detour path are initially set three times as strong as the other synapses. The stimulated neuron receives external input at the beginning of each trial for 250 ms. Initially, the signal is only transferred through the detour path, which takes more than 160 ms to reach the target neuron. Meanwhile, it takes less than 100 ms when the signal is transferred through the shortcut paths after learning. This setting could reflect inter-regional signal transmission, for example, where the shorter paths represent direct signal transmission, and the detour paths represent the signal transmission via several relay stations.
(A) An example of a successful trial. Each panel represents the initial synaptic weight (Left), the last synaptic weight (Middle) and the difference between them (Right). Initially, a strong detour path from the stimulated neuron S on the upper-left at (100 μm, 500 μm) to the target neuron T on the bottom-left at (100 μm, 100 μm) is prepared. Neurons within 100√5 μm are randomly connected with a probability of 0.5. Synaptic connections from S are all outward, while synaptic connections to T are all inward. At the end of the trial, the shortcut paths are strengthened while the detour paths are preserved. (B) The averaged synaptic weight difference from the initial trial to the 10th, 25th, and 40th trials is plotted. The averaged synaptic weights are calculated for each neuron, including the direction of synaptic connection. (C) The averaged synaptic weight difference between the initial trial and the last trial (the 60th trial) is plotted. (D) The amount of reward signal is plotted. The wave conditions can successfully escape a local solution state and reach a better solution for this task. The error bar indicates the standard error of the mean. (E) The latency index takes the latency of the first spike in the target neuron after the stimulus onset if it is below 300 ms and takes 300 ms if the latency is above 300 ms. The condition with waves and tonic dopaminergic signal Dt (red) shows the best performance, while the conventional model (black) fails. The error bar indicates the standard error of the mean.
In a successful case, shorter paths are strengthened while the detour paths are moderately strengthened (Fig 3A). This network change occurs with a continuous reinforcement of shortcut paths (Fig 3B). The wave condition successfully establishes shortcut paths, while the no-wave condition cannot strengthen them (Fig 3C). The Dt signal enhances the role of waves by further strengthening the shortcut paths by reward-independent STDP but is not effective on its own because synapses along the shortcut paths are initially too weak to induce spiking activity in the absence of waves. Fig 3D shows the overall performance of this task. Note that the amount of reward declines with the latency of activating the target neuron (see Material and Methods). The wave condition with the Dt signal outperforms the conventional model. Fig 3E represents the averaged latency index for obtaining a reward after trial onset. The latency index is equal to the latency of the first spike in the target neuron after the stimulus onset but saturates for latency above 300 ms to be insensitive to outliers. The latency index decreases faster in wave conditions than in no-wave conditions. While the effect of Dt on task performance is evident in these networks of recurrently connected neurons, the effect is less prominent in feedforward networks (S1 Text). This result shows that Dt-induced reward-independent STDP is especially important in selectively strengthening outbound paths from the stimulated neuron.
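The latency index used in Fig 3E is simple enough to state as code. The sketch below follows the definition in the text (first-spike latency of the target neuron after stimulus onset, saturating at 300 ms); treating a trial with no target spike as the saturated value of 300 ms is an assumption, and the function name `latency_index` is hypothetical.

```python
def latency_index(target_spike_times, stim_onset=0.0, cap=300.0):
    """First-spike latency of the target neuron in ms, saturating at `cap`.

    target_spike_times: spike times of the target neuron in ms.
    A trial without any spike after onset returns `cap` (assumption).
    """
    latencies = [t - stim_onset for t in target_spike_times if t >= stim_onset]
    if not latencies:
        return cap
    return min(min(latencies), cap)
```

The saturation keeps rare, very late first spikes from dominating the trial average, which is the stated purpose of making the index insensitive to outliers.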
Task 3: Learning a nonlinear function
In this task, we demonstrate that our model is useful for a more practical setting. Here, we show that the XOR function can be learned in our model as well. Nonlinear functions such as the XOR function are essential for complex calculation, but how to realize them efficiently with the reward-modulated STDP rule remains to be seen. We propose that our model has an advantage in this task because some nonlinear functions can be created by finding appropriate poly-synaptic paths. Among the various kinds of nonlinear functions, we chose the XOR function because of its simplicity and universality of logic gates . It is widely known that implementing an XOR function requires a hidden layer in a feedforward neural network. Therefore, this task is difficult for the STDP rule because indirect paths should be learned. Our model can alleviate this difficulty and facilitate the learning process.
Fig 4 shows the setting and results of this experiment. In this task, we used four stimulated neurons located at the bottom, namely 0a (15 μm, 0 μm), 1a (45 μm, 0 μm), 0b (75 μm, 0 μm), and 1b (105 μm, 0 μm) (Fig 4A). In the middle line at Y = 100 μm, 120 neurons were aligned. In the initial setting, these middle layer neurons receive a strong projection (Sij = 0.2) from the nearest stimulated neuron and a weak projection (Sij = 0.1) from another randomly selected stimulated neuron. Two target neurons are positioned at the top, namely, F (30 μm, 200 μm) and T (90 μm, 200 μm). Each middle layer neuron has a strong projection (Sij = 0.2) to one of them. During this task, four different stimuli are provided, where one of the pairs of stimulated neurons 0a0b, 0a1b, 1a0b, or 1a1b receives external input. At the beginning of each trial, the corresponding neurons were stimulated for 250 ms. The target neuron for each of the four stimuli was F, T, T, and F, respectively. If the corresponding target neuron fires more than the other neuron, the reward signal Dp (> 0) is provided. Otherwise, the punishment signal Dp (< 0) is provided. The reason for initially having weak inputs from the stimulated neurons to the middle layer neurons is to expedite learning. If these connections are strong enough, the task can be solvable simply by learning the output-layer synapses. We set these synapses weak enough so that the task performance remains near the chance level by learning only the output-layer synapses. We use a feedforward network in this task, which may be implemented, for example, in three information-processing layers (e.g., layer 4 to layer 2–3 to layer 5) in a cortical column [11,39–41].
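The stimulus-to-target assignment and the reward rule of the XOR task can be summarized in a few lines. The sketch below encodes the four stimulus pairs and their targets exactly as listed above, and follows the wording that anything other than the correct output neuron firing more (including a tie) is punished. The names `XOR_TARGETS` and `xor_outcome` and the +1/-1 return convention are hypothetical.

```python
# Stimulus pair -> output neuron that should fire more (from the task description).
XOR_TARGETS = {
    ("0a", "0b"): "F",
    ("0a", "1b"): "T",
    ("1a", "0b"): "T",
    ("1a", "1b"): "F",
}


def xor_outcome(stimulus, spikes_T, spikes_F):
    """Return +1 (reward) if the correct output neuron fires more, else -1."""
    target = XOR_TARGETS[stimulus]
    correct, other = (spikes_T, spikes_F) if target == "T" else (spikes_F, spikes_T)
    return 1 if correct > other else -1
```

Reading the input labels as bits (0a/1a for the first input, 0b/1b for the second), T is the target exactly when the two bits differ, which is the XOR truth table.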
(A) A successful example of synaptic weight change. The initial condition (Left), the last condition (Middle), and the difference (Right). Each middle layer neuron receives a strong synaptic weight and a weak synaptic weight from stimulated neurons and sends an output to T or F. The success rate is initially at the chance level. (B) The averaged synaptic weight difference between each middle layer neuron projecting to T and F is plotted at the 5th, 10th, and 15th trials. Each path strength is calculated as a product of the averaged synaptic weight from stimulated neurons to middle layer neurons and middle layer neurons to a target neuron. Percentage changes in averaged synaptic weight are shown in color. (C) The same plots as (B) at the last trial (the 25th trial). The wave condition (top column) successfully learns the correct paths, while the no-wave condition (lower column) fails. (D) The success rate of the XOR task. Wave conditions (red and yellow) show better results than no-wave conditions (green and black).
Fig 4A shows the synaptic weight change in a successful case. The relevant connections are selectively strengthened or weakened. Each synaptic path strength is calculated in Fig 4B. Correct paths are gradually strengthened through the trial. In the last trial (the 25th trial), the wave condition successfully established the correct paths, while the no-wave condition failed (Fig 4C). Fig 4D shows the task performance for each condition. The wave conditions (red and yellow) perform better than the no-wave conditions (green and black) because, similar to Task 2, the weak connections can only be strengthened with the support of traveling waves. However, the contribution of Dt is small in this task because the signal transmission from the stimulated neurons to the target neurons is easily achieved from the beginning in the presence of waves due to the disynaptic feedforward structure.
We have demonstrated that the combination of traveling waves and tonic dopaminergic signals enhances selective reinforcement of poly-synaptic paths. Further, we showed that this combination is also helpful for learning a shortcut and a nonlinear function. The advantage of traveling waves in sending signals across distant neurons is effectively utilized in the tasks we explored. Thus, we argue that a possible role of traveling waves in the brain is to aid local learning rules, such as the reward-modulated STDP, to efficiently learn poly-synaptic paths by inducing coherent activity in the neurons along these paths.
The advantage of the proposed model over the conventional model is twofold. First, the combination of traveling waves and the tonic dopaminergic signal helps to prepare paths starting from stimulated neurons. In our model, a tonic dopaminergic signal permits reward-independent STDP. In its presence, traveling waves efficiently create a repertoire of poly-synaptic paths spreading from the wave-initiation sites. Second, once a repertoire of paths from the stimulated neurons is prepared, a reward-dependent phasic dopaminergic signal can reinforce its subset. These features are consistent with the biological evidence of recent studies. Beeler et al.  showed that tonic and phasic dopamine have different roles; tonic dopamine modulates the degree of learning and its expression, while phasic dopamine is the main source of reinforcement learning. In addition, Schultz  suggests that the continuous emission of tonic dopaminergic signals controls the motivation for exploration, while the discrete phasic dopaminergic signal induces event-related synaptic plasticity. Our model is also testable by examining the relationship between traveling waves and learning in a specific environment, such as by selective blockade or enhancement of either the tonic or phasic component of the dopaminergic signal.
Our model suggests a mechanism of memory consolidation during slow-wave sleep. Some experiments have observed traveling waves across the entire brain during slow-wave sleep [9,10] and showed their importance in memory consolidation [21,22]. Importantly, dopaminergic neurons emit tonic signals during slow-wave sleep. These studies indicate that the combination of traveling waves and tonic dopaminergic signals may consolidate memory. Our results agree with this view, supporting the idea that the coherent activation of neurons caused by traveling waves can prepare poly-synaptic paths for more rapid and reliable signal transmission (cf. Fig 3). Further studies on the role of traveling waves and dopaminergic signals in the efficacy of poly-synaptic paths during slow-wave sleep will likely elucidate the mechanism of memory consolidation.
One limitation of our model is the separation of dynamics between neural activity and wave propagation. In our model, wave propagation is modeled by the local field without specific relation to the membrane potential of neurons. While this approach is reasonable in our study that involves only a small number of neurons, the local field must be defined by the average activity of many neurons in reality . Thus, future large-scale simulations could model the relationship between traveling waves and the membrane potential of neurons in an explicit manner. Further, the current model only involves global inhibition, but different classes of inhibitory neurons contribute to up- and down-states in distinct ways . More subtle features of traveling waves might arise from such detailed modeling. Despite these limitations, our simple model revealed a synergy of traveling waves and dopaminergic signals to efficiently learn the directionality of information flow and distant neural network paths in a reinforcement task. This mechanism would be progressively more important for animals with a larger brain because distant and indirect paths are more dominant. Our study underscores the importance of coherent neural activity in the form of waves for coherent learning beyond pairs of neurons.
Material and methods
We conducted all simulations using the Brian2 simulator (https://brian2.readthedocs.io/en/stable/), an open-source Python library for simulating spiking neural networks. The post-analysis of the simulations was performed with custom-made Python code. The source code is provided in S1 File.
The network of excitatory neurons is defined task by task (see Figs 2A, 3A and 4A). As described in the Results section, all excitatory neurons receive an inhibitory feedback signal and a dopaminergic signal for simplicity (Fig 1A). The overall structure of our model is summarized in Fig 5.
Excitatory neurons are locally connected via synapses. The inhibitory feedback signal controls the firing rate of excitatory neurons, the global dopaminergic signal modulates the synaptic weights, and the wave field created by the activities of other neurons controls the activity level of each excitatory neuron. External input and reward functions are externally provided. The += operator means that the right-hand side is added to the left-hand side when an event happens (with delay for Dp). We use the same parameters (except three parameters summarized in Table 1) to learn three qualitatively different tasks in different network architectures, which underscores the robustness of the learning rule and the role of traveling waves.
The three parameters are set as in Table 1, depending on each task, because of the difference in the number of neurons and the network structure.
Inhibitory feedback strength β roughly correlates with the number of neurons N. Dopamine signal initial amplitude dp is chosen for the best result in each task. Wave amplitude constant η is chosen depending on the network structure. Recurrent networks need relatively larger values than feedforward networks (see Supporting Information for solving Task 2 in feedforward networks). Note that, among the several parameters in the model, β, dp, and η are chosen as representative parameters that control the basic ingredients of the model: global inhibitory signal, dopamine signal, and traveling waves, respectively.
The tonic dopaminergic signal Dt and the phasic dopaminergic signal Dp are essential components of our simulations. The Dt signal is expressed as (6) with tonic dopamine constant dt = 0.003, and the novelty function Novelty(t) (explained below). This setting is fixed in every task we conducted here. We set Dt = 0 for the conventional model.
The Dp signal (-0.3 ≤ Dp ≤ 0.3) is adjusted depending on the performance in each task. Dp depends on three variables: the reward R, the amplitude function ΓR for the reward, and the novelty variable Novelty (Fig 5). Dp decays exponentially according to (7) except when a target/false-target neuron spikes. When a target/false-target neuron spikes at time t, Dp instantaneously jumps at time t+tp according to (8) where the decay constant is τp = 200 ms and the transmission delay is tp = 100 ms. Note that the += operator indicates that the right-hand side is added to the left-hand-side variable upon a spiking event (with delay tp). We recorded the spike counts of the target and false-target neurons in vectors ntrue and nfalse, respectively, in each trial (these vectors are reset to zero at the end of each trial). The raw reward R is a function of ntrue and nfalse. For Task 1, we set R = 0 when these neurons are not very active, namely, when the total spike count of the one target and three false-target neurons is less than 5. This adds robustness to the simulation results. Once the total spike count reaches 5, R = 1.0 when the target spike count is the greatest and R = -0.5 when the target spike count is not the greatest among the four neurons. Therefore, (9) where I[∙] is the indicator function that takes 1 if the argument is true and 0 otherwise. We mean by max(nfalse) and sum(nfalse) the maximum and the sum of the spike counts of the three false-target neurons, respectively.
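The Task 1 reward rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from S1 File: the function name `task1_reward` and its argument names are hypothetical, and reading "the greatest" as strictly greater than every false-target count (so that ties are not rewarded) is an assumption.

```python
def task1_reward(n_true, n_false, min_total=5):
    """Raw reward R for Task 1 from the spike counts of one trial.

    n_true  : spike count of the single target neuron
    n_false : spike counts of the three false-target neurons
    """
    total = n_true + sum(n_false)
    if total < min_total:
        # The four neurons are not active enough: no reward signal.
        return 0.0
    # Assumption: a tie with a false-target neuron counts as "not the greatest".
    if n_true > max(n_false):
        return 1.0
    return -0.5


print(task1_reward(4, [1, 0, 0]))  # target most active
print(task1_reward(2, [3, 0, 0]))  # a false target is most active
print(task1_reward(1, [1, 1, 1]))  # fewer than 5 spikes in total
```

The threshold on the total spike count is exposed as `min_total` only so that the sketch can be reused with a different activity criterion.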
For Task 3, we again considered one target neuron and one false-target neuron. R = 1 when the target neuron fires at least 5 more spikes than the false-target neuron; R = -1 when the false-target neuron fires at least 5 more spikes than the target neuron; and R = 0 otherwise. Namely, (11) We set this margin of 5 spikes to induce a clear difference in the number of spikes between the target and false-target neurons.
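As a rough sketch of the Task 3 rule, the margin criterion can be written as follows. The function name `task3_reward` is hypothetical, and treating the margin as inclusive (a difference of exactly 5 spikes already counts) is an assumption about the wording above.

```python
def task3_reward(n_true, n_false, margin=5):
    """Raw reward R for Task 3 from the spike counts of one trial.

    n_true  : spike count of the target neuron
    n_false : spike count of the false-target neuron
    """
    diff = n_true - n_false
    if diff >= margin:       # target clearly more active
        return 1.0
    if -diff >= margin:      # false target clearly more active
        return -1.0
    return 0.0               # no clear difference


print(task3_reward(10, 5))
print(task3_reward(5, 10))
print(task3_reward(7, 5))
```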
Next, we introduce the reward-amplitude function ΓR. The amount of reward begins to take a non-zero value after the stimulus onset time ton, stays fixed until the stimulus offset time toff, and then decays exponentially. Namely, (12) with a dopamine decay constant τd = 200 ms and the initial amplitude dp, which is set depending on the task (see Table 1).
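The time course of ΓR described above (zero before stimulus onset, constant during the stimulus, exponential decay afterwards) might be sketched as below. The function name `reward_amplitude`, the choice of milliseconds as the time unit, and the numerical values of `t_on`, `t_off`, and `d_p` in the usage lines are illustrative assumptions; the actual values of dp are those of Table 1.

```python
import math


def reward_amplitude(t, t_on, t_off, d_p, tau_d=200.0):
    """Reward-amplitude function Gamma_R at time t (all times in ms)."""
    if t < t_on:
        return 0.0                      # before stimulus onset
    if t <= t_off:
        return d_p                      # fixed during the stimulus
    # Exponential decay with dopamine decay constant tau_d after offset.
    return d_p * math.exp(-(t - t_off) / tau_d)


# Hypothetical trial: stimulus from 100 ms to 300 ms, d_p = 0.1.
for t in (0.0, 200.0, 500.0):
    print(t, reward_amplitude(t, t_on=100.0, t_off=300.0, d_p=0.1))
```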
Finally, we assume that dopamine release increases with novelty and that novelty becomes high when the prediction error is high. We simply assume that Novelty (0 ≤ Novelty ≤ 1) decreases by 0.2 at the end of a correct trial and increases by 0.2 at the end of a wrong trial. Here, we introduce task-dependent criteria for correct and incorrect trials. In Tasks 1 and 3, we used R > 0 and R ≤ 0 at the end of each trial to define a correct and an incorrect trial, respectively. In Task 2, we used the latency of signal transmission from the stimulated neuron to the target neuron as the criterion. A latency of less than 100 ms is defined as a success.
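The trial-by-trial novelty update can be illustrated with the following sketch. The function names are hypothetical, and two details are assumptions rather than statements from the text: the bound 0 ≤ Novelty ≤ 1 is enforced by clipping, and a Task 2 trial in which the target neuron never fires (no latency available) is treated as incorrect.

```python
def is_correct(task, reward=None, latency_ms=None):
    """Task-dependent success criterion evaluated at the end of a trial."""
    if task in (1, 3):
        return reward > 0
    if task == 2:
        # Assumption: a missing latency (target never fired) is a failure.
        return latency_ms is not None and latency_ms < 100.0
    raise ValueError("task must be 1, 2, or 3")


def update_novelty(novelty, correct, step=0.2):
    """Decrease Novelty after a correct trial, increase it after a wrong one."""
    novelty += -step if correct else step
    # Assumption: the range [0, 1] is enforced by clipping.
    return min(1.0, max(0.0, novelty))


novelty = 1.0
for r in (-0.5, 1.0, 1.0):              # hypothetical Task 1 rewards
    novelty = update_novelty(novelty, is_correct(1, reward=r))
    print(novelty)
```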
Local field and influx coefficients
For the wave, we used a simple custom-made propagation rule. The upstate is defined as a high-noise state (σi(t) ~6 mV), while the downstate is a low-noise state (σi(t) ~3 mV). The noise level is determined by σi(t) = αi∙ψi+3 mV with an influx coefficient αi and local field ψi. The local field is updated as explained in the Results by (13) As an initial condition, we choose ψi = 0 for all i, which corresponds to the downstate. We assume that the upstate is induced by external stimuli. The influx coefficient αi quantifies the sensitivity of neuron i’s noise level to ψi and is defined by (14) where ton is again the trial onset. The coefficient αi counts the number of neighboring local fields that influenced ψi in each trial up to time t. The hyperbolic tangent function is introduced to implement a saturation effect. For the conventional setting, σi(t) is set to a constant value adjusted to give the same firing rate as the wave condition.
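The relation between the local field and the noise level can be made concrete with a very small sketch. It only covers σi(t) = αi∙ψi + 3 mV; the update rules for ψi and αi are not reproduced here. The function name `noise_level_mv` and the example values of αi and ψi are hypothetical, chosen so that the product αi∙ψi is expressed in mV.

```python
BASELINE_NOISE_MV = 3.0  # downstate noise level given in the text


def noise_level_mv(alpha_i, psi_i):
    """Noise level sigma_i (mV) from influx coefficient and local field."""
    return alpha_i * psi_i + BASELINE_NOISE_MV


# Downstate: psi_i = 0 gives the baseline noise of 3 mV.
print(noise_level_mv(alpha_i=0.5, psi_i=0.0))
# Hypothetical upstate values for which alpha_i * psi_i = 3 mV.
print(noise_level_mv(alpha_i=1.0, psi_i=3.0))
```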
S1 Fig. The difference of the signal-driven spikes and traveling-wave-driven spikes of the target neuron in Task 1.
An example of the membrane potential before learning (Top) and the membrane potential of the same neuron after learning (Middle). The red lines indicate spike timing. Before learning, the signal from the stimulated neuron does not reach the target neuron and the target neuron does not fire. In contrast, after learning, the external input reaches the target neuron, and the firing rate increases during the stimulus period. Meanwhile, the firing rate of spontaneous spikes driven by traveling waves does not change between before and after learning. The noise level (black line) is changed by a traveling wave of upstate (Bottom). During the stimulus period at the onset of a trial, external input (green bar) is provided to the stimulated neuron.
S2 Fig. Task performance without the Dp signal in Tasks 1 and 2.
(A) The contribution of reward-independent STDP is shown for Task 1 by setting Dp = 0. The average synaptic weights are computed over 40 simulations, and their differences (from the initial trial to the 15th trial) are plotted with the Dt signaling and traveling waves (Left) and with the Dt signaling alone (Right). Outbound synaptic weights near the stimulated neuron are strengthened by the Dt signaling alone but more strongly with waves. In this task, initial synaptic weights are set rather strong. Hence, the stimulated neuron can propagate its activity to neighboring neurons from the beginning, and poly-synaptic paths toward the target and false-target neurons are gradually extended by reward-independent STDP. This happens even without waves but more efficiently with waves, which contribute to the outbound spreading of neural activity. (B) The contribution of reward-independent STDP is shown for Task 2 by setting Dp = 0. The differences of averaged weights (from the initial trial to the 40th trial) are plotted with the Dt signaling and traveling waves (Left) and with the Dt signaling alone (Right). The detour path is efficiently strengthened in both cases because it is strong enough to propagate neural activity from the beginning. However, the shortcut path is strengthened only with waves because it is too weak to propagate neural activity at the beginning. Hence, the shortcut path requires waves to propagate neural activity, which is in turn required for reward-independent STDP to gradually strengthen the path.
- 1. Klimesch W. Memory processes, brain oscillations and EEG synchronization. Int J Psychophysiol. 1996;24(1–2): 61–100. pmid:8978436.
- 2. Burkitt GR, Silberstein RB, Cadusch PJ, Wood AW. The steady-state visually evoked potential and travelling waves. Clin Neurophysiol. 2000;111(2): 246–258. pmid:10680559.
- 3. Nunez PL, Srinivasan R. Electric fields of the brain: The neurophysics of EEG. Oxford University Press; 2006.
- 4. Srinivasan R, Bibi FA, Nunez PL. Steady-state visual evoked potentials: distributed local sources and wave-like dynamics are sensitive to flicker frequency. Brain Topogr. 2006;18(3): 167–187. pmid:16544207.
- 5. Grinvald A, Lieke EE, Frostig RD, Hildesheim R. Cortical point-spread function and long-range lateral interactions revealed by real-time optical imaging of macaque monkey primary visual cortex. J Neurosci. 1994;14(5 Pt 1): 2545–2568. pmid:8182427.
- 6. Slovin H, Arieli A, Hildesheim R, Grinvald A. Long-term voltage-sensitive dye imaging reveals cortical dynamics in behaving monkeys. J Neurophysiol. 2002;88(6): 3421–3438. pmid:12466458.
- 7. Nauhaus I, Busse L, Carandini M, Ringach DL. Stimulus contrast modulates functional connectivity in visual cortex. Nat Neurosci. 2009;12: 70–76. pmid:19029885.
- 8. Nauhaus I, Busse L, Ringach DL, Carandini M. Robustness of traveling waves in ongoing activity of visual cortex. J Neurosci. 2012;32(9): 3088–3094. pmid:22378881.
- 9. Mohajerani MH, McVea DA, Fingas M, Murphy TH. Mirrored bilateral slow-wave cortical activity within local circuits revealed by fast bihemispheric voltage-sensitive dye imaging in anesthetized and awake mice. J Neurosci. 2010;30(10): 3745–3751. pmid:20220008.
- 10. Massimini M, Huber R, Ferrarelli F, Hill S, Tononi G. The sleep slow oscillation as a traveling wave. J Neurosci. 2004;24(31): 6862–6870. pmid:15295020.
- 11. Sakata S, Harris KD. Laminar structure of spontaneous and sensory-evoked population activity in auditory cortex. Neuron. 2009;64(3): 404–418. pmid:19914188.
- 12. Harris KD, Thiele A. Cortical state and attention. Nat Rev Neurosci. 2011;12: 509–523. pmid:21829219.
- 13. Petersen CC, Grinvald A, Sakmann B. Spatiotemporal dynamics of sensory responses in layer 2/3 of rat barrel cortex measured in vivo by voltage-sensitive dye imaging combined with whole-cell voltage recordings and neuron reconstructions. J Neurosci. 2003;23(4): 1298–1309. pmid:12598618.
- 14. Steriade M, McCormick DA, Sejnowski TJ. Thalamocortical oscillations in the sleeping and aroused brain. Science. 1993;262(5134): 679–685. pmid:8235588.
- 15. Krull EM, Sakata S, Toyoizumi T. Theta oscillations alternate with high amplitude neocortical population within synchronized states. Front Neurosci. 2019;13(316): 1–16. pmid:31037053.
- 16. Lee BR, Mu P, Saal DB, Ulibarri C, Dong Y. Homeostatic recovery of downstate–upstate cycling in nucleus accumbens neurons. Neurosci Lett. 2008;434(3): 282–288. pmid:18329805.
- 17. Sato TK, Nauhaus I, Carandini M. Traveling waves in visual cortex. Neuron. 2012;75(2): 218–229. pmid:22841308.
- 18. Lubenov EV, Siapas AG. Hippocampal theta oscillations are travelling waves. Nature. 2009;459: 534–539. pmid:19489117.
- 19. Bringuier V, Chavane F, Glaeser L, Frégnac Y. Horizontal propagation of visual activity in the synaptic integration field of area 17 neurons. Science. 1999;283(5402): 695–699. pmid:9924031.
- 20. Rubino D, Robbins KA, Hatsopoulos NG. Propagating waves mediate information transfer in the motor cortex. Nat Neurosci. 2006;9: 1549–1557. pmid:17115042.
- 21. Rasch B, Büchel C, Gais S, Born J. Odor cues during slow-wave sleep prompt declarative memory consolidation. Science. 2007;315(5817): 1426–1429. pmid:17347444.
- 22. Miyamoto D, Hirai D, Murayama M. The roles of cortical slow waves in synaptic plasticity and memory consolidation. Front Neural Circuits. 2017;11: 92. pmid:29213231.
- 23. Calabresi P, Picconi B, Tozzi A, Filippo MD. Dopamine-mediated regulation of corticostriatal synaptic plasticity. Trends Neurosci. 2007;30(5): 211–219. pmid:17367873.
- 24. Frémaux N, Gerstner W. Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front Neural Circuits. 2016;9(85): 1–19. pmid:26834568.
- 25. Kuśmierz Ł, Isomura T, Toyoizumi T. Learning with three factors: modulating Hebbian plasticity with errors. Curr Opin Neurobiol. 2017;46: 170–177. pmid:28918313.
- 26. Izhikevich EM. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex. 2007;17(10): 2443–2452. pmid:17220510.
- 27. Klampfl S, Maass W. Emergence of dynamic memory traces in cortical microcircuit models through STDP. J Neurosci. 2013;33(28): 11515–11529. pmid:23843522.
- 28. Orsborn AL, Pesaran B. Parsing learning in networks using brain–machine interfaces. Curr Opin Neurobiol. 2017;46: 76–83. pmid:28843838.
- 29. Bassett DS, Bullmore ET. Small-world brain networks revisited. Neuroscientist. 2017;23(5): 499–516. pmid:27655008.
- 30. Zhang H, Watrous AJ, Patel A, Jacobs J. Theta and alpha oscillations are traveling waves in the human neocortex. Neuron. 2018;98(6): 1269–1281.e4. pmid:29887341.
- 31. Zhang H, Jacobs J. Traveling theta waves in the human hippocampus. J Neurosci. 2015;35(36): 12477–12487. pmid:26354915.
- 32. Dan Y, Poo MM. Spike timing-dependent plasticity: from synapse to perception. Physiol Rev. 2006;86: 1033–1048. pmid:16816145.
- 33. Floresco SB, West AR, Ash B, Moore H, Grace AA. Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat Neurosci. 2003;6: 968–973. pmid:12897785.
- 34. Li S, Cullen WK, Anwyl R, Rowan MJ. Dopamine-dependent facilitation of LTP induction in hippocampal CA1 by exposure to spatial novelty. Nat Neurosci. 2003;6: 526–531. pmid:12704392.
- 35. Bi GQ, Poo MM. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci. 1998;18(24): 10464–10472. pmid:9852584.
- 36. Harris KD, Bartho P, Chadderton P, Curto C, Rocha J, Hollender L, et al. How do neurons work together? Lessons from auditory cortex. Hear Res. 2010;271(1–2): 37–53. pmid:20603208.
- 37. Matsumoto K, Suzuki W, Tanaka K. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science. 2003;301(5630): 229–232. pmid:12855813.
- 38. Yang J, Yang W, Wu W. A novel spiking perceptron that can solve XOR problem. Neural Network World. 2011;1(11): 45–50.
- 39. Alexandre F, Guyot F, Haton JP, Burnod Y. The cortical column: A new processing unit for multilayered networks. Neural Netw. 1991;4(1): 15–25.
- 40. Lefort S, Tomm C, Floyd JC, Petersen CC. The excitatory neuronal network of the C2 barrel column in mouse primary somatosensory cortex. Neuron. 2009;61(2): 301–316. pmid:19186171.
- 41. Beeler JA, Daw N, Frazier CRM, Zhuang X. Tonic dopamine modulates exploitation of reward learning. Front Behav Neurosci. 2010;4: 170. pmid:21120145.
- 42. Westerberg JA, Cox MA, Dougherty K, Maier A. V1 microcircuit dynamics: altered signal propagation suggests intracortical origins for adaptation in response to visual repetition. J Neurophysiol. 2019;121: 1938–1952. pmid:30917065.
- 43. Schultz W. Behavioral dopamine signals. Trends Neurosci. 2007;30(5): 203–210. pmid:17400301.
- 44. Monti JM, Monti D. The involvement of dopamine in the modulation of sleep and waking. Sleep Med Rev. 2007;11(2): 113–133. pmid:17275369.
- 45. Muller L, Chavane F, Reynolds J, Sejnowski TJ. Cortical travelling waves: mechanisms and computational principles. Nat Rev Neurosci. 2018;19: 255–268. pmid:29563572.
- 46. Tahvildari B, Wolfel M, Duque A, McCormick DA. Selective functional interactions between excitatory and inhibitory cortical neurons and differential contribution to persistent activity of the slow oscillation. J Neurosci. 2012;32(35): 12165–12179. pmid:22933799.
- 47. Stimberg M, Brette R, Goodman DFM. Brian 2, an intuitive and efficient neural simulator. eLife. 2019;8. pmid:31429824.
- 48. Feenstra MGP, Botterblom MHA, Uum JFMV. Novelty-induced increase in dopamine release in the rat prefrontal cortex in vivo: inhibition by diazepam. Neurosci Lett. 1995;189(2): 81–84. pmid:7609924.
- 49. Tan AYY, Chen Y, Scholl B, Seidemann E, Priebe NJ. Sensory stimulation shifts visual cortex from synchronous to asynchronous states. Nature. 2014;509: 226–229. pmid:24695217.