^{*}

^{¤}

The authors have declared that no competing interests exist.

Conceived and designed the experiments: MCWvR ABB. Performed the experiments: MS MCWvR. Wrote the paper: ABB MCWvR MS.

Current address: School of Informatics, University of Sussex, Brighton, United Kingdom.

Accurate models of synaptic plasticity are essential to understand the adaptive properties of the nervous system and for realistic models of learning and memory. Experiments have shown that synaptic plasticity depends not only on pre- and post-synaptic activity patterns, but also on the strength of the connection itself. Namely, weaker synapses are more easily strengthened than already strong ones. This so called soft-bound plasticity automatically constrains the synaptic strengths. It is known that this has important consequences for the dynamics of plasticity and the synaptic weight distribution, but its impact on information storage is unknown. In this modeling study we introduce an information theoretic framework to analyse memory storage in an online learning setting. We show that soft-bound plasticity increases a variety of performance criteria by about 18% over hard-bound plasticity, and likely maximizes the storage capacity of synapses.

It is generally believed that our memories are stored in the synaptic connections between neurons. Numerous experimental studies have therefore examined when and how the synaptic connections change. In parallel, many computational studies have examined the properties of memory and synaptic plasticity, aiming to better understand human memory and allow for neural network models of the brain. However, the plasticity rules used in most studies are highly simplified and do not take into account the rich behaviour found in experiments. For instance, it has been observed in experiments that it is hard to make strong synapses even stronger. Here we show that this saturation of plasticity enhances the number of memories that can be stored and introduce a general framework to calculate information storage in online learning paradigms.

Long term synaptic plasticity has been established as one of the most important components for learning and memory. In parallel with experimental investigations, numerous computational models of synaptic plasticity have been developed to simulate network function and to establish the fundamental characteristics and limitations of plasticity. Despite the complexity of the underlying neurobiology, theoretical studies have in the interest of tractability mostly focused on highly simplified plasticity rules

One such experimental finding is that strong synapses are harder to potentiate than weak ones, that is, the percentage increase in strength is significantly smaller for strong synapses than for weak synapses

Soft-bound plasticity automatically constrains the synaptic weights, and thereby resolves simply, but effectively, the danger of unconstrained plasticity, namely that on repeated activation, synaptic strength would grow indefinitely. In many modeling studies weight dependence is ignored, instead hard-bounds are typically introduced that cap the minimal and maximal synaptic weights (also known as weight clipping), which are often supplemented with constraints on the total weight

Despite the experimental evidence for soft-bound plasticity rules, the effect of weight dependence on information storage is not well understood

To understand how different plasticity rules determine information capacity, we consider a simple, single neuron learning paradigm. The setup is shown in

A) A neuron receives binary pattern inputs. At each time-step a new pattern is presented and the weights are updated according to the input value. The neuron's output equals the weighted sum of the inputs. B) The neuron has to remember the presented patterns. When tested, learned patterns lead to a larger output (solid curve) than lures (dashed curve). As the memory of the pattern ages and is overwritten by new patterns, the output of the neuron in response to the pattern becomes less distinct and the signal-to-noise ratio decays. The performance is measured by the signal-to-noise ratio, a measure of the distance between the two output distributions. C) The decay of the signal-to-noise ratio for soft-bound and hard-bound plasticity rules as a function of the age of the pattern. The synaptic updates were set so that both rules led to an initial SNR of 100 right after the pattern was presented (

The task of the neuron is to recognize the patterns that it has encountered previously. To measure the performance we periodically interrupt the learning and test the neuron with both previously presented patterns (labeled

The model is agnostic about the precise timescale and brain area involved - cortex and hippocampus come to mind. It is also possible to adopt a variant in which only a fraction of all presented patterns is learned. For instance, the synaptic plasticity might only occur with a certain probability, or plasticity might occur only if some additional signal (for instance signalling reward or relevance), lifts the postsynaptic activity above a certain plasticity threshold. Either mechanism would slow down the learning and forgetting equally, but would not otherwise change our analysis

Because the neuron sums many inputs, the output distribution is well approximated with a Gaussian distribution. This holds independently of the

In many storage capacity studies weights are initialized to zero, the number of items to be learned is fixed and learning stops after all the items have been presented, or once the task as been learned

The central question we address is which plasticity rules lead to the best performance. We focus on the effect of the weight dependence and restrict ourselves to plasticity rules that are local (only dependent on the pre- and post-synaptic activity at that synapse) and incremental (have no access to the patterns presented earlier). We implemented first two plasticity rules, a hard-bound and a soft-bound one.

For the hard-bound plasticity rule, potentiation occurs when the input

Secondly, we implement a soft-bound plasticity rule. Here the absolute amount of potentiation is weight independent, while the depression is proportional to the weight,

The decay of the SNR for both hard- and soft-bound plasticity rules is shown in

Ideally one has a slow forgetting and a strong initial memory, however this is impossible to achieve. High plasticity rates (large

The first way to resolve the trade-off uses mutual information. The mutual information expresses how many bits of information are gained about the novelty of a pattern by inspecting the output of the neuron when it is tested by lures and learned patterns. We pass the output of the neuron through a threshold with value

Using the Gaussian distribution of

Top: The SNR decay curves versus pattern age for soft-bound plasticity with a large synaptic update (thin curve), and soft-bound plasticity with a small update (thick curve). Although the rules trade off between slow decay and the high initial SNR differently, the area under the curve is identical. Middle: The relation between SNR and Information,

The saturation has an important consequence when one wants to maximize information, illustrated in

This setup in principle requires a threshold precisely between the average output to pattern and lure, i.e.

Both soft-bound information capacity

A) The information capacity per synapse in the recognition task for a variety of plasticity rules. Up to numerical error, the soft-bound, log-normal and Gutig rule perform identically. B) Simulation of the lifetime of a memory for various plasticity rules. The lifetime was defined as the number of memories stored with a SNR above 30. Again soft-bound plasticity outperforms hard-bounds. C) The trade-off between memory decay time and initial memory strength for soft- and hard-bound plasticity. The amount of synaptic update was varied and the resulting fitted decay time-constant and the initial SNR was plotted. Ideally initial strength is high and memory decay time is long (top-right corner), but increasing one decreases the other. Soft-bound plasticity always leads to a superior trade-off.

We simulated two additional soft-bound plasticity rules. The first comes from the empirical observation that the synaptic weight distribution can be fitted to a log-normal distribution

The second rule is a polynomial plasticity rule

Both the log-normal and polynomial rule perform as well as the original soft-bound rule, but neither did better,

Although the use of small synaptic updates maximizes information capacity, in practice there are issues with using small updates. First, the low SNR of each memory renders the pattern/lure discrimination sensitive to noise, such as noise from synaptic variability or other inputs to the neuron. Secondly, in recurrent networks, such as the Hopfield network, errors made by a single neuron can be amplified by the network. Moreover, experimental evidence suggests that synaptic plasticity protocols can induce substantial changes in a synapse. Finally, soft-bound plasticity with small updates leads to narrow weight distributions (i.e. with a small variance), while the observed synaptic weight distributions are relatively broad, consistent with larger updates.

We therefore define a second storage measure to compare hard and soft-bounds. We use the same single neuron paradigm used above but define the memory lifetime as the number of recent patterns that the neuron stores with SNR above a predefined threshold

Because the SNR decays exponentially with soft-bound plasticity (Models),

The lifetime in the hard-bound case can be approximated by taking only the lowest order term in the expression for the SNR decay (see Models), yielding

The theory is confirmed by the simulations in which we numerically maximized the lifetime by changing the synaptic update. Too small updates would lead to none or few patterns above the threshold, while too large updates speed up the decay. The lifetime is plotted normalized by the number of synapses. The soft-bound plasticity, as well as the other soft-bound variants, outperform hard-bound,

As indicated, the SNR decay leads to a trade-off between initial SNR and memory decay time. The current performance measure sets a somewhat arbitrary threshold to resolve this. However, the result is independent of the precise threshold setting.

In the simulations and the analysis we made two assumptions. First, the input patterns take positive and negative values, but in biological systems it would seem more natural to assume that the inputs are spikes, that is, 1s and 0s. Secondly, we tacitly used fixed feedforward inhibition to balance the excitatory inputs and ensure that the effective synaptic weight has zero mean. Without these assumptions the SNR, and hence the information capacity, for both hard- and soft-bound plasticity decreases significantly.

The reason for the decreased SNR is easiest seen by analyzing the variance in the output in response to a lure pattern. Because a lure pattern will be completely uncorrelated to the weights, the variance can be written as

In the above results the second term in

A. The synaptic information capacity versus the coding density for soft-bound (solid line) and hard-bound (dashed line) plasticity, when taking 0s and 1s as inputs. When coding density is 1/2, the capacity is approximately half of what it is when using

The third term in

In this idealized setup, feed-forward inhibition is mathematically equivalent to making the mean weight zero by allowing negative weights and adjusting the plasticity rules. For instance, for the hard-bound plasticity, this is achieved by just moving the lower weights bound to −1; for soft-bound plasticity, the depression rule is redefined to

A further potential problem with hard-bound plasticity is that highest capacity is only attained when potentiation and depression are exactly balanced

A. The Information capacity showing the theory (black) as well as simulations for 10, 100, 1000, and 10000 synapses for

The width of the peak around

We have studied plasticity rules that include the experimental observation that plasticity depends on the synaptic weight itself. Namely, the relative amount of potentiation decreases as the synapse gets stronger, while the relative amount of depression shows no such dependence. This means that the synaptic weight is automatically bounded. Using an information theoretic framework we have compared these plasticity rules to hard-bound plasticity in an online learning setting. We found that the information storage is higher for soft-bound plasticity rules. Contrasting the prototypical soft-bound and hard-bound plasticity rules, the improvement can be calculated analytically. In addition, a wide class of soft-bound plasticity rules lead to the same increased capacity and we suggest that soft-bound rules in fact maximize the synaptic capacity. Furthermore, we examined an alternative capacity measure that determines how many patterns can be stored above a certain threshold. This memory lifetime measure also improved for soft-bound plasticity.

The improvement in performance is moderate for both capacity measures, some 18%. However, it should be stressed that

Given that the difference between hard and soft-bound is only quantitative, it is difficult to point to the cause for the difference, besides our mathematical derivation. Our intuition is that the hard-bounds lead to a deformation of the weight distribution. As a result, the decay back to equilibrium is a sum of exponentials. This means that the decay is always somewhat faster than for the soft-bound case for identical initial SNR,

Our study reveals the following criteria to optimize synaptic storage capacity: 1) use soft-bound rules, 2) ensure that the mean input is close to zero, for instance by using sparse codes, 3) ensure that the mean weight is close to zero, for instance by implementing balanced feed-forward inhibition, and finally, if one wants to maximize the mutual information, 4) update the synapse in small amounts.

There have recently been a number of studies on information storage and forgetting in synapses with a discrete number of states. The continuous synapses studied here have an equal or superior capacity, as they can always effectively act as discrete ones. An earlier study of discrete synapses argued that balanced hard-bound rules are superior to soft-bound rules when long memory lifetime is required

The results do not depend on the precise form of the soft-bound plasticity rules. Biophysically, numerous mechanisms with any kind of saturation, could lead to soft-bound plasticity. At the level of a single synapse one could even argue that weight-

Apart from the increased capacity, weight dependence has other important consequences for plasticity. First, the weight dependence leads to central weight distributions, consistent with data, both measured electro-physiologically and microscopically. Second, competition between synapses is weaker for soft-bound rules because depressed synapses never completely disappear. Finally, in soft-bound plasticity the mean weight remains sensitive to correlations

In an earlier study we reported that soft-bound STDP plasticity leads to shorter retention times than hard-bound plasticity in both single neurons and networks

Soft-bound plasticity is certainly not the only way to prevent run-away plasticity. Apart from hard-bounds, BCM theory

In this section we calculate the capacity of soft and hard-bound plasticity analytically. To calculate the capacity we concentrate on one single synapse, as the Signal-to-Noise scales linearly with the number of synapses. We artificially distinguish the pattern that is to be learned from all the other patterns that are presented subsequently and erase the memory of this pattern, although of course, no pattern stands above the others; the same plasticity rules underlie both the storage and the forgetting processes. Throughout we assume that the synaptic updates are small. This prevents saturation in the relation between SNR and Information,

To calculate the information storage we need to study how the synaptic weight decays after learning. As the weight updates are small, a Fokker-Planck equation for the weight distribution describes the decay of the synapse as it is subject to the learning of other patterns. The synaptic weight distribution

We first calculate the information capacity for the soft-bound rule. The solution Fokker-Planck is complicated, but when the jumps are much smaller than the standard deviation of the distribution, that is,

We consider what happens to a given synapse when a pattern is learned. Suppose that at time zero, the synapse gets a high input and is thus potentiated (the calculation in the case of depression is analogous). Right after the potentiation event, the weight distribution is displaced with an amount

After this event, the synaptic weight is subject to random potentiation and depression events caused by the learning of patterns presented subsequently. During this the perturbation of the synaptic weight distribution will decay, until finally the distribution equals the equilibrium again and the potentiation event will have lost its trace. The perturbed distribution can be plugged into the Fokker-Planck equation to study its decay back to equilibrium. Because

The distribution right after the potentiation is shown by the magenta curve; as time progresses (indicated by the arrow) it decays back to the equilibrium distribution (thick black curve). A) With soft-bound plasticity the distribution is displaced but maintains its shape. During the overwriting it shifts back to the equilibrium distribution. B) With hard-bound plasticity, the distribution distorts after the potentiation due to the presence of the bounds. As it decays back to the equilibrium this distortion flattens out.

The output signal is found by probing the synapse with a ‘1’ (high input). Assuming perfectly tuned feed-forward inhibition, the mean output signal is

In the simulations we found the same information for other soft-bound plasticity rules. For small updates, a Taylor expansion of the drift

We repeat the calculation for hard-bound plasticity. We impose hard-bounds at

At time

As above, the synaptic weight is subject to random potentiation and depression events caused by the learning of patterns presented subsequently. Solving the Fokker-Planck equation gives that the perturbation of the synaptic weight distribution decays as

The derivative of the mean weight follows from the Fokker-Planck equation after integration by parts as

The variance in the output equals that of the uniform distribution and is

For the soft-bound plasticity, the ratio between potentiation and depression determines the mean synaptic weight, but the capacity does not depend on it. For hard-bound plasticity, however, the situation is more complicated. Imbalance between potentiation and depression leads to a deformation of the steady state distribution and speeds up the decay after perturbations, changing the capacity.

We assume that the size of the potentiation event is

The computer code is made available on the first author's website. To examine the various plasticity rules, we presented typically one million patterns to the neuron, one pattern at every time-step. Every pattern led to an update of the synaptic weights according to the plasticity rules. After each time-step, the output of the neuron in response to this pattern, to its predecessors, as well as to lures was measured. The mean and the variance of the neuron's output were collected online as a function of the pattern's age, that is how many time-steps ago the pattern was presented. From this the SNR and information was calculated at the end of the simulation using

Importantly, unlike the analysis above, the simulations are not restricted to small synaptic updates. Furthermore, the Gaussian assumption can be dropped in the simulations. In that case all the neuron's responses were stored, and the information was calculated at the end of the simulation using the full response distributions to patterns and lures. We found results were the virtually identical. It required, however, much more computer memory and time than the first method.

Unless indicated otherwise, for the figures we used

The authors would like to thank Matthias Hennig and Cian O'Donnell for discussions.