
Conceived and designed the experiments: CS JT. Performed the experiments: CS. Analyzed the data: CS. Wrote the paper: CS JT. Developed the mathematical model used for implementing intrinsic plasticity in the paper: PJ.

The authors have declared that no competing interests exist.

Although models based on independent component analysis (ICA) have been successful in explaining various properties of sensory coding in the cortex, it remains unclear how networks of spiking neurons using realistic plasticity rules can realize such computation. Here, we propose a biologically plausible mechanism for ICA-like learning with spiking neurons. Our model combines spike-timing-dependent plasticity and synaptic scaling with an intrinsic plasticity rule that regulates neuronal excitability to maximize information transmission. We show that a stochastically spiking neuron learns one independent component for inputs encoded either as rates or using spike-spike correlations. Furthermore, different independent components can be recovered when the activity of different neurons is decorrelated by adaptive lateral inhibition.

How the brain learns to encode and represent sensory information has been a longstanding question in neuroscience. Computational theories predict that sensory neurons should reduce redundancies between their responses to a given stimulus set in order to maximize the amount of information they can encode. Specifically, a powerful set of learning algorithms called Independent Component Analysis (ICA) and related models, such as sparse coding, have emerged as a standard for learning efficient codes for sensory information. These algorithms have been able to successfully explain several aspects of sensory representations in the brain, such as the shape of receptive fields of neurons in primary visual cortex. Unfortunately, it remains unclear how networks of spiking neurons can implement this function and, more challenging still, how they can learn to do so using known forms of neuronal plasticity. This paper solves this problem by presenting a model of a network of spiking neurons that performs ICA-like learning in a biologically plausible fashion, by combining three different forms of neuronal plasticity. We demonstrate the model's effectiveness on several standard sensory learning problems. Our results highlight the importance of studying the interaction of different forms of neuronal plasticity for understanding learning processes in the brain.

Independent component analysis is a well-known signal processing technique for extracting statistically independent components from high-dimensional data. For the brain, ICA-like processing could play an essential role in building efficient representations of sensory data

Classic ICA algorithms often exploit the non-Gaussianity principle, which allows the ICA model to be estimated by maximizing some non-Gaussianity measure, such as kurtosis or negentropy

Interestingly, certain homeostatic mechanisms are thought to regulate the distribution of firing rates of a neuron

From a computational perspective, it is believed that IP may maximize information transmission of a neuron, under certain metabolic constraints

We show that IP and synaptic scaling complement STDP learning, allowing single spiking neurons to learn useful representations of their inputs for several ICA problems. First, we show that output sparsification by IP together with synaptic learning is sufficient for demixing two zero mean supergaussian sources, a classic formulation of ICA. When using biologically plausible inputs and STDP, complex tasks, such as Foldiák's bars problem

The underlying assumption behind our approach, implicit in all standard models of V1 receptive field development, is that both input and output information are encoded in rates. In this light, one may think of our current work as a translation of the model in

A schematic view of the learning rules is shown in

Synapse weights

To illustrate the basic mechanism behind our approach, we first ask whether enforcing a sparse prior by IP and Hebbian learning can yield a valid ICA implementation for the classic problem of demixing two supergaussian independent sources. In the standard form of this problem, zero-mean, unit-variance inputs ensure that the covariance matrix is the identity, so that simple Hebbian learning with a linear unit (equivalent to principal component analysis) cannot exploit the input statistics and merely performs a random walk in the input space. This is, however, a purely mathematical formulation, and does not make much sense in the context of biological neurons. Inputs to real neurons are bounded and, in a rate-based encoding, all positive. Nonetheless, we chose this standard formulation to illustrate the basic principles underlying our model. Below, we will consider different spike-based encodings of the input and learning with STDP.

As a special case of a demixing problem, we use two independent Laplacian distributed inputs, with unit variance:
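As a minimal sketch of this input generation (the function names are ours, not part of the model), the two unit-variance Laplacian sources can be mixed with a rotation matrix; because a rotation is orthogonal, the mixture remains white:

```python
import numpy as np

def laplacian_sources(n_samples, rng):
    # Two independent, zero-mean, unit-variance Laplacian sources.
    # A Laplace(0, b) variable has variance 2*b**2, so b = 1/sqrt(2).
    return rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2), size=(2, n_samples))

def mix(sources, angle):
    # Mix the sources with a rotation matrix; the rotated data keeps
    # an identity covariance matrix, so PCA finds no preferred direction.
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return R @ sources

rng = np.random.default_rng(0)
x = mix(laplacian_sources(100_000, rng), angle=np.pi / 6)
print(np.round(np.cov(x), 2))  # close to the identity matrix
```

Only the higher-order statistics (the heavy tails of the Laplacian) distinguish the source directions in such data, which is exactly what an ICA mechanism must exploit.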

In

(A) Evolution of the weights (

After showing that combining IP and synaptic learning can solve a classical formulation of ICA, we focus on spike-based, biologically plausible inputs. In the following, STDP is used for implementing synaptic learning, while the IP and the synaptic scaling implementations remain the same.

The demixing problem above can also be solved in a spike-based setting, after a few changes. First, the positive and negative inputs have to be separated into on- and off-channels (

The evolution of the weights is shown in

(A) Evolution of the weights for a rotation angle

As a second test case for our model, we consider Foldiák's well-known bars problem

(A) A set of randomly generated samples from the input distribution, (B) Evolution of the neuron's receptive field as the IP rule converges, together with the instantaneous firing rate of the neuron. Each dot corresponds to the instantaneous firing rate (

In our implementation, the input vector is normalized and the value of each pixel

The input normalization makes a bar in an input sample containing a single IC stronger than a bar in a sample containing multiple ICs. This may suggest that a single component emerges by preferentially learning ‘easy’ examples, i.e. examples with a single bar. However, this is not the case and one bar is learned even when single bars never appear in the input, as in
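For concreteness, a minimal generator for this kind of stimulus might look as follows (a sketch under our own naming and with an assumed grid size and bar probability; the paper's exact values are given in Methods):

```python
import numpy as np

def bars_sample(grid=8, p=1/8, rng=None):
    # Foldiák's bars: each of the `grid` horizontal and `grid` vertical
    # bars appears independently with probability p; pixels take the
    # union (logical OR) of all active bars.
    if rng is None:
        rng = np.random.default_rng()
    img = np.zeros((grid, grid))
    for i in range(grid):
        if rng.random() < p:
            img[i, :] = 1.0   # horizontal bar
        if rng.random() < p:
            img[:, i] = 1.0   # vertical bar
    x = img.ravel()
    n = np.linalg.norm(x)
    return x / n if n > 0 else x  # normalize the input vector (see text)
```

The final normalization step implements the property discussed above: a pixel belonging to the only bar in a sample carries a larger value than the same pixel in a sample containing several overlapping bars.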

In the previous experiments, input information was encoded as firing rates. In the following, we show that this stimulus encoding is not critical and that the presence of spike-spike correlations is sufficient to learn an independent component, even in the absence of any presynaptic rate modulation. To demonstrate this, we consider a slight modification of the above bars problem, with input samples always consisting of two bars, each two pixels wide, drawn from N distinct bars. This is a slightly more difficult task, as wide bars emphasize non-linearities due to overlap, but it can be solved by our model with a rate encoding (see

In this case, all inputs have the same mean firing rate and the information about whether a pixel belongs to a bar or not is encoded in correlation patterns between different inputs (see
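One common way to realize such an encoding, sketched below under our own naming and parameter choices, is to let the inputs belonging to a bar draw a fraction of their spikes from a shared Poisson source, so that bar membership is visible only in pairwise correlations while all mean rates stay equal:

```python
import numpy as np

def correlated_spike_trains(bar_ids, rate=20.0, c=0.5, dt=1e-3, T=1.0, rng=None):
    # All inputs fire at (approximately) the same mean rate; inputs with
    # bar_ids == True share spikes from a common Poisson source with
    # correlation strength c, the rest fire independently.
    if rng is None:
        rng = np.random.default_rng()
    n_steps = int(T / dt)
    common = rng.random(n_steps) < c * rate * dt       # shared spike source
    trains = np.zeros((len(bar_ids), n_steps), dtype=bool)
    for i, in_bar in enumerate(bar_ids):
        p_private = rate * dt * (1.0 - c) if in_bar else rate * dt
        private = rng.random(n_steps) < p_private
        trains[i] = (private | common) if in_bar else private
    return trains
```

With this construction, the firing-rate profile across inputs is flat, and only a correlation-sensitive learning rule such as STDP can pick out the bar.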

(A) Example of 20 spike trains with

As for the rate-based encoding, the neuron is able to reliably learn a bar (

Due to its stochastic nature, our neuron model is not particularly sensitive to input correlations, hence

The third classical ICA problem we consider is the development of V1-like receptive fields for natural scenes

We use a set of images from the van Hateren database

As shown in

(A) Evolution of the neuronal activity during learning, (B) Learned weights corresponding to the inputs from the on and off populations, (C) The receptive field learned by the neuron, and its l.m.s. Gabor fit.

So far, learning has been restricted to a single neuron. For learning multiple independent components, we implement a neuron population in which the activities of different neurons are decorrelated by adaptive lateral inhibition. This approach is commonly used for feature extraction methods based on single-unit contrast functions
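A minimal sketch of such adaptive inhibition (our own formulation, not the paper's exact update) is an anti-Hebbian rule: the inhibitory weight between two neurons grows whenever their joint activity exceeds what independent neurons firing at the target rate would produce, pushing the population towards decorrelated responses:

```python
import numpy as np

def update_inhibition(W_inh, rates, target, eta=0.01):
    # Anti-Hebbian adaptation of the all-to-all inhibitory weights:
    # correlated activity above target**2 strengthens mutual inhibition,
    # decorrelating the neurons' responses over time.
    W_new = W_inh + eta * (np.outer(rates, rates) - target**2)
    np.fill_diagonal(W_new, 0.0)        # no self-inhibition
    return np.clip(W_new, 0.0, None)    # inhibitory weights stay non-negative
```

Because each neuron is thereby discouraged from firing to the features its neighbors already encode, the single-neuron learning mechanism is steered towards distinct independent components.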

(A) Overview of the network structure: all neurons receive signals from the input layer and are recurrently connected by all-to-all inhibitory synapses, (B) A set of receptive fields learned for the bars problem, (C) Evolution of the mean correlation coefficient and mutual information in time, both computed by dividing the neuron output in bins of width 1000 s and estimating

We consider a population of 10 neurons. In order to have a full basis set for the bars problem, we use 2-pixel-wide bars. For this case, our learning procedure is able to recover the original basis (

Due to the adaptive nature of IP, the balance between excitation and inhibition does not need to be tightly controlled, allowing for robustness to changes in parameters. However, the inhibition strength influences the time required for convergence (the stronger the inhibition, the longer it takes for the system to reach a stable state). A more important constraint is that the adaptation of inhibitory connections needs to be faster than that of feedforward connections to allow for efficient decorrelation (see

We wondered what the role of IP is in this learning procedure. Does IP simply find an optimal nonlinearity for the neuron's transfer function, given the input, something that could be computed offline (as for InfoMax

Evolution of the receptive field for a neuron with a fixed gain function, given by the final parameters obtained after learning in the previous bars experiment. A bar cannot be learned in this case.

We could assume that the neuron failed to learn a bar for the fixed transfer function just because the postsynaptic firing was too low, slowing down learning. Hence, it may be that a simpler rule, regulating just the mean firing rate of the neuron, would suffice to learn an IC. To test this hypothesis, we construct an alternative IP rule, which adjusts just

(A) Evolution of neuron activation for a neuron with a gain function regulated by a simplified IP rule, which adjusts

A good starting point for elucidating the mechanism by which the interaction between STDP and IP facilitates the discovery of an independent component is our initial problem of a single unit receiving a two dimensional input. We have previously shown in simulations that for a bounded, whitened, two dimensional input the weight vector tends to rotate towards the heavy-tailed direction in the input

We report here only the main results of these experiments, while a detailed description is provided as supplemental information (see

One way to understand these results could be in terms of nonlinear PCA theory. Given that for a random initial weight vector, the total input distribution is close to Gaussian, in order to enforce a sparse output, the IP has to change the transfer function in a way that ‘hides’ most of the input distribution (for example by shifting

Lastly, from an information-theoretic perspective, our approach can be linked to previous work on maximizing information transmission between neuronal input and output by optimizing synaptic learning

Although ICA and related sparse coding models have been very successful in describing sensory coding in the cortex, it has been unclear how such computations can be realized in networks of spiking neurons in a biologically plausible fashion. We have presented a network of stochastically spiking neurons that performs ICA-like learning by combining different forms of plasticity. Although this is not the only attempt at computing ICA with spiking neurons, in previous models synaptic changes were not local, depending on the activity of neighboring neurons within a population

In our model, IP, STDP and synaptic scaling interact to give rise to robust receptive field development. This effect does not depend on a particular implementation of STDP, but it does require an IP mechanism which enforces a sparse output distribution. Although there are very good theoretical arguments why this should be the case

From a computational perspective, our approach is reminiscent of several by-now classic ICA algorithms. As mentioned before, IP enforces the output distribution to be heavy-tailed, like in sparse coding

It is interesting to think of the mechanism presented here in relation to projection pursuit

From a different perspective, we can relate our method to nonlinear PCA. It is known that, for zero mean whitened data, nonlinear Hebbian learning in a rate neuron can successfully capture higher order correlations in the input

Our previous work was restricted to a rate model neuron

We consider a stochastically spiking neuron with refractoriness

The refractory state of the neuron, with values in the interval

The probability
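One common formulation of such a neuron, given here only as an illustrative sketch (the paper's exact transfer function and refractory dynamics differ in detail), scales the instantaneous firing rate by a refractory variable in [0, 1] that collapses after each spike and recovers with a fixed time constant:

```python
import numpy as np

def simulate(u, g, tau_ref=0.01, dt=1e-3, rng=None):
    # Stochastic spiking with refractoriness: the spike probability per
    # time step is g(u) * r * dt, where g is the rate function and r is
    # a refractory variable that drops to 0 after a spike and recovers
    # towards 1 with time constant tau_ref.
    if rng is None:
        rng = np.random.default_rng()
    r = 1.0
    spikes = np.zeros(len(u), dtype=bool)
    for t, ut in enumerate(u):
        if rng.random() < g(ut) * r * dt:
            spikes[t] = True
            r = 0.0                          # absolute refractoriness
        else:
            r = min(1.0, r + dt / tau_ref)   # gradual recovery
    return spikes
```

The essential property for the learning rules is that spiking remains stochastic at every membrane potential, so output statistics vary smoothly with the input.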

Our intrinsic plasticity model attempts to maximize the mutual information between input and output, for a fixed energy budget

Computing the gradient of
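To illustrate the principle in its simplest form, the rate-based IP rule of Triesch (2005) performs stochastic gradient steps on the parameters of a sigmoid transfer function so that the output distribution approaches an exponential with mean mu, the maximum-entropy distribution for a fixed mean firing rate. The spiking-neuron rule used in this paper follows the same objective but acts on its own transfer-function parameters; the sketch below is the rate-based variant only:

```python
import numpy as np

def ip_step(a, b, u, mu=0.2, eta=0.001):
    # One stochastic-gradient IP step for y = sigmoid(a*u + b):
    # drives the output distribution towards an exponential with mean mu.
    y = 1.0 / (1.0 + np.exp(-(a * u + b)))
    db = eta * (1.0 - (2.0 + 1.0 / mu) * y + (y * y) / mu)
    da = eta / a + u * db
    return a + da, b + db, y
```

Run over many input samples, the rule jointly adapts gain and threshold, keeping the neuron responsive while enforcing a sparse, heavy-tailed output.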

Additionally, as a control, we have considered a simplified rule, which adjusts a single transfer function parameter in order to maintain the neuron's mean firing rate at a constant value

The STDP rule implemented here considers only nearest-neighbor interactions between spikes
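A nearest-neighbor pairing scheme can be sketched as follows (a minimal illustration with hypothetical amplitude and time-constant values, not the paper's parameters): each postsynaptic spike is paired only with the closest preceding presynaptic spike, and each presynaptic spike only with the closest preceding postsynaptic spike.

```python
import numpy as np

def stdp_nn(pre_times, post_times, w, A_plus=0.005, A_minus=0.0055, tau=0.02):
    # Nearest-neighbor STDP: pre-before-post pairings potentiate,
    # post-before-pre pairings depress, with exponential time windows.
    pre_times, post_times = np.asarray(pre_times), np.asarray(post_times)
    for t_post in post_times:
        prev_pre = pre_times[pre_times < t_post]
        if len(prev_pre):
            w += A_plus * np.exp(-(t_post - prev_pre[-1]) / tau)
    for t_pre in pre_times:
        prev_post = post_times[post_times < t_pre]
        if len(prev_post):
            w -= A_minus * np.exp(-(t_pre - prev_post[-1]) / tau)
    return max(w, 0.0)
```

Restricting the interactions to nearest neighbors is what makes the rate dependence of the rule BCM-like, as discussed next.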

This STDP implementation is particularly interesting as it can be shown that, under the assumption of uncorrelated or weakly correlated pre- and post-synaptic Poisson spike trains, it induces weight changes similar to a BCM rule, namely

For the above expression, the fixed BCM threshold can be computed as:

In some experiments we also consider the classical case of additive all-to-all STDP

As in approaches which directly maximize kurtosis or similar measures

In a neural population, the same normalization is applied for the lateral inhibitory connections. As before, weights do not change sign and are constrained by the

Currently, the normalization is achieved by dividing each weight by
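In its common divisive form (the exact normalization constant used in the paper may differ; the bound `w_max` below is an assumed parameter), multiplicative synaptic scaling restores the summed weight to a target value while preserving relative weight differences:

```python
import numpy as np

def scale(w, w_total=1.0, w_max=0.2):
    # Multiplicative synaptic scaling: divide every weight by the current
    # total so the summed weight equals w_total, then clip to the
    # permitted range; relative differences between weights are preserved.
    w = w * (w_total / w.sum())
    return np.clip(w, 0.0, w_max)
```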

In all experiments, excitatory weights were initialized at random from the uniform distribution and normalized as described before. The transfer function was initialized to the parameters in

For the experiments involving the rate-based model and a two-dimensional input, each sample was presented for one time step and the learning rates for IP and Hebbian learning were

For the demixing problem, the input was generated as described for the rate-based scenario above. After rectification, the input firing rates on the on- and off-channels were scaled by a factor of 20 to speed up learning. After convergence, the total weight of each channel was estimated as the sum of the individual weights corresponding to that input. The resulting four-dimensional weight vector was projected back to the original two-dimensional input space using:
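One natural reading of this projection, given the on/off construction described earlier (the on-channel carries the rectified positive part of an input and the off-channel its rectified negative part), is the channel difference; the sketch below is our reconstruction, not the paper's stated formula:

```python
import numpy as np

def project_back(w_on, w_off):
    # Subtracting the off-channel weight from the on-channel weight
    # recovers an effective signed weight per original input dimension.
    return np.asarray(w_on) - np.asarray(w_off)
```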

For all variants of the bars problem, the input vector was normalized to

When using the correlation-based encoding, all inputs had the same mean firing rate (

When learning Gabor receptive fields, images from the van Hateren database

For a neuronal population, input-related parameters were as for the single component, but with

Receptive field estimation for the spike-based demixing problem

(0.04 MB PDF)

Parameter dependence for learning one IC

(0.07 MB PDF)

Learning with a two-dimensional input

(0.53 MB PDF)

A link to BCM

(0.15 MB PDF)

We would like to thank Jörg Lücke and Claudia Clopath for fruitful discussions. Additionally, we thank Jörg Lücke for providing the code for the Gabor l.m.s. fit and Felipe Gerhard for his help with the natural images preprocessing.