Fast adaptation to rule switching using neuronal surprise

In humans and animals, surprise is a physiological reaction to an unexpected event, but how surprise can be linked to plausible models of neuronal activity is an open problem. We propose a self-supervised spiking neural network model where a surprise signal is extracted from an increase in neural activity after an imbalance of excitation and inhibition. The surprise signal modulates synaptic plasticity via a three-factor learning rule which increases plasticity at moments of surprise. The surprise signal remains small when transitions between sensory events follow a previously learned rule but increases immediately after rule switching. In a spiking network with several modules, previously learned rules are protected against overwriting, as long as the number of modules is larger than the total number of rules—making a step towards solving the stability-plasticity dilemma in neuroscience. Our model relates the subjective notion of surprise to specific predictions on the circuit level.

between the number of context modules used in simulation run i, N_used^i, and the number N_observed^i of observed rules.
3. Complete success rate: Fraction of simulation runs (out of 100) that have detected all switch points in less than 50 steps and that fulfill N_used^i = N_observed^i; given as percentage (Tables A and B) or probability p (Graph X1).
4. Detection time: Mean number of presentation steps between a switch to a different rule in the input sequence and a switch to a different memory module in the model network.
Note that a simulation where a network uses more memory modules than the objective number of rules does not necessarily imply a complete failure. Indeed, one can conceive that an observer splits a single rule into two different memories due to the unsupervised nature of the task. This is why, in the results section, we consider the detection success rate as the most valuable measure: it is not so important how many memories are used as long as the switch points can be detected.
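The two quantitative measures above can be made concrete in code. The following is a minimal illustrative sketch (function names and the list-based interface are our own, not part of the model implementation), assuming that true rule-switch times and detected module-switch times are given as sorted lists of presentation steps:

```python
def detection_times(true_switches, detected_switches, max_delay=50):
    """For each true switch point, return the delay (in presentation steps)
    until the first detected module switch, or None if no detection occurs
    within max_delay steps."""
    delays = []
    for t in true_switches:
        candidates = [d - t for d in detected_switches if 0 <= d - t < max_delay]
        delays.append(min(candidates) if candidates else None)
    return delays


def complete_success(true_switches, detected_switches, n_used, n_observed,
                     max_delay=50):
    """A run counts as a complete success if every switch point is detected
    within max_delay steps and the number of modules used equals the number
    of observed rules (N_used = N_observed)."""
    delays = detection_times(true_switches, detected_switches, max_delay)
    return all(d is not None for d in delays) and n_used == n_observed
```

The complete success rate is then the fraction of the 100 simulation runs for which `complete_success` returns true, and the detection time is the mean over the non-`None` delays.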
For sequences that switch rules on average every 2000 time steps (1/H = 2000), the switch detection success rate is 100 percent for transition rules with both K = 2 and K = 4 transitions, but it is slightly lower for simulation paradigms with more frequent context switches (Tables A and B). We expected that for K = 8 possible next stimuli the performance breaks down, since from each stimulus 8 out of the 15 possible next stimuli are then allowed, so that a large fraction of transitions is compatible with several transition rules. This was indeed the case (results not shown). However, for K = 2 the first or second transition after a switch between rules is, with high probability, a good indicator of the switch.
Indeed, for K = 2 the SpikeSuM-C model detects switches within less than three presentation steps if switches occur on average every 1000 or 2000 time steps (Table A).
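The input statistics described above can be sketched as a simple generative process. The following is an illustrative sketch under our reading of the task (function names and the choice of 16 stimuli are assumptions for illustration): each rule assigns to every stimulus K allowed next stimuli, each occurring with probability 1/K, and the hidden rule switches with probability H per step, i.e. on average every 1/H steps:

```python
import random

def make_rule(n_stimuli=16, K=2, rng=random):
    """A transition rule maps each stimulus to K allowed next stimuli
    (chosen among the other n_stimuli - 1 stimuli), each used with
    probability 1/K."""
    return {s: rng.sample([t for t in range(n_stimuli) if t != s], K)
            for s in range(n_stimuli)}

def generate_sequence(rules, n_steps, switch_prob, rng=random):
    """Stimulus sequence with hidden rule switches; a switch to the next
    rule occurs with probability switch_prob = H per presentation step,
    giving a mean inter-switch interval of 1/H steps."""
    rule_idx, s = 0, 0
    seq, switches = [], []
    for t in range(n_steps):
        if rng.random() < switch_prob:
            rule_idx = (rule_idx + 1) % len(rules)
            switches.append(t)
        s = rng.choice(rules[rule_idx][s])
        seq.append(s)
    return seq, switches
```

For K = 8 one sees directly why detection becomes hard: two independently drawn rules then share, on average, more than half of their allowed transitions per stimulus.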

SpikeSuM-C in the presence of biased transition rules
In all simulations with K ≥ 2 potential next stimuli, we assumed that the transition rule is unbiased, so that all potential next stimuli occur with the same probability p = 1/K. It is also conceivable to have biased transitions. For example, with K = 2 potential next stimuli, one of the two could be preferred (transition probability 1 − ϵ) and the other unlikely (probability ϵ).
We tested the performance for K = 2 using the previously introduced notion of 'complete success rate'. We find that, in an appropriate range of the hyperparameter a_2, performance is good even if the transition bias is strong (Graph X1). A bias parameter ϵ = 0.1 indicates a strongly biased transition, whereas ϵ = 0.5 corresponds to the unbiased transitions of the main text. Whatever the choice of ϵ, a hyperparameter in the range 0.1 < a_2 < 0.4 gives a complete success rate above 80 percent. Note that with ϵ = 0.01 the transition is nearly deterministic and the 'unusual' transition occurs on average only once in 100 transitions.
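The biased case only changes how the next stimulus is drawn from the two allowed successors. A minimal sketch (the function name is ours, for illustration only):

```python
import random

def sample_next_biased(allowed, eps, rng=random):
    """For K = 2 allowed next stimuli, return the preferred one with
    probability 1 - eps and the unlikely one with probability eps.
    eps = 0.5 recovers the unbiased rule of the main text."""
    preferred, rare = allowed
    return preferred if rng.random() >= eps else rare
```

With ϵ = 0.01 the rare successor appears roughly once per 100 transitions, which is the regime where a confident network must avoid mistaking a single rare transition for a rule switch.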

SpikeSuM-C with more rules than modules
For SpikeSuM-C we assumed in the main text that the number of modules is larger than the number of rules. If there are more rules than modules, one could imagine that the specific module most similar to the new additional rule is the one that is overwritten; or, alternatively, that for each new transition the network picks the module with transitions most similar to the current one, so that eventually all previous modules are equally overwritten. Simulations show that the network dynamics tends to implement the second option. Indeed, adding a third rule (in a network with two modules) or a fourth rule (in a network with three modules) leads to many rapid context switches (Graph X2). The result is that all modules are overwritten and the network with two or three modules is no better than the network with a single module. Hence, for continual learning, it is important that the number of modules is at least as large as the maximum number of rules that are expected in an environment.
Graph X1: Biased vs. unbiased transition rules. Complete success rate (probability p) as a function of the hyperparameter a_2. SpikeSuM-C with 5 modules, simulated for 1000 steps with change points after 500 steps and K = 2. For the 'biased' rules one transition is more likely, P(r → R_1) = 1 − ϵ, than the other, P(r → R_2) = ϵ, with ϵ ranging from 0.01 (strong bias) to 0.5 (unbiased). For ϵ = 0.01, the strong bias challenges the model, since the network is highly confident about its predictions based on the common transition, and nevertheless rare transitions should not be interpreted as a switch of rules. The results show that biased transitions are on average slightly harder to learn, but in the parameter range a_2 ≈ 0.2 the network is still capable of detecting change points accurately, whatever the choice of ϵ. Probability p estimated over N = 100 simulation runs. Observed fluctuations stay within two standard deviations, estimated as sqrt(p(1 − p)/N).

Graph X2: More rules than modules. SpikeSuM-C networks with 1 to 4 modules exposed to four rules. Rules are added at different moments in time (vertical black arrows labeled 1 to 4). With 1 module (red), every new rule leads to overwriting, but with 2 (green) or 3 modules (orange) the network switches between active modules several times when more rules are presented than modules are available. When the fourth rule appears at around 2500 presentation steps (arrow marked 4), only the network with four modules copes with the task, whereas in the smaller networks all modules are equally overwritten and the performance is no better than with a single module (red). The main text describes the behavior with 1 module and 5 modules. The case of 5 modules for four rules is equivalent to that of 4 modules for four rules.

Table A: Results for transition rules with K = 2 possible next stimuli (columns: detection success rate, module usage success rate, complete success rate, detection time).

Table B: Results for transition rules with K = 4 possible next stimuli.

Tables A and B: Capacity of SpikeSuM-C to detect change points and re-use known models. Each task (10 000 steps) consists of 4 different rules that return several times (the volatility parameter H characterizes the rule switching probability). The intrinsic stochasticity of each rule is characterized by the stochasticity parameter K = 2 (Table A) or K = 4 (Table B), where K is the number of possible next states that can be reached from the current state.