Sensory coding and contrast invariance emerge from the control of plastic inhibition over emergent selectivity

Visual stimuli are represented by a highly efficient code in the primary visual cortex, but the development of this code is still unclear. Two distinct factors control coding efficiency: representational efficiency, which is determined by neuronal tuning diversity, and metabolic efficiency, which is influenced by neuronal gain. How these determinants of coding efficiency are shaped during development, supported by excitatory and inhibitory plasticity, is only partially understood. We investigate a fully plastic spiking network of the primary visual cortex, building on phenomenological plasticity rules. Our results suggest that inhibitory plasticity is key to the emergence of tuning diversity and accurate input encoding. We show that inhibitory feedback (random and specific) increases the metabolic efficiency by implementing a gain control mechanism. Interestingly, this leads to the spontaneous emergence of contrast-invariant tuning curves. Our findings highlight that (1) interneuron plasticity is key to the development of tuning diversity and (2) that efficient sensory representations are an emergent property of the resulting network.

[...] contrast at all orientations (Fig. 8b). This results in increased OBW values for higher input contrast (see also Fig. S9 for normalized spike counts). Interestingly, for the non-preferred orientation, the average membrane potential of the EI2/1 model is less hyperpolarized for lower contrast than for higher contrast. For higher contrast, the average membrane potential increases at the preferred orientation and is substantially stronger than for lower contrast. Both curves intersect around -50 mV, close to the resting-state spiking threshold (-50.4 mV) (Fig. 8c). This can be explained by the average input current: at higher contrast levels and non-preferred orientations, the feedback inhibitory current increases more strongly than the excitatory current and nearly compensates it (Fig. 8e and S3a), hyperpolarizing the membrane potential. This compensation of excitation decreases around the preferred stimulus, where the membrane potential exceeds the spiking threshold. In comparison, the membrane potential of the EI3/1 model increases proportionally with the total input current caused by higher input contrast (Fig. 8d, Fig. 8f and S3b). This suggests that the contrast-invariant tuning of the EI2/1 model depends on an appropriate balance between excitation and inhibition.

Based on the observation of contrast-invariant tuning curves, we conclude that feedback inhibition modulates the neuronal gain controlled by input orientation and contrast. Fig. 7b shows the average response gain for the excitatory population, averaged across the whole population and sorted by the input current (see Neuronal gain curves for more details). We show the response gain curves for low- and high-contrast stimuli. For the model with blocked inhibition (blockInh), the gain curve is unaffected by contrast and follows the activation function defined by the neuron model. The firing-rate-to-input ratios of neurons in the EI2/1 model are strongly reduced relative to the blockInh model, but this gain modulation is contrast-dependent, as the highest reduction of firing rates is observed for high contrast. This shows that the effect of inhibition on the neuronal gain function depends not only on the amount of excitatory input, but also on the stimulus orientation and contrast strength.

As inhibitory processing led to an increase in the selectivity to artificial stimuli, we asked whether inhibition contributed to a sparser population code for natural images. We first compared the overall spiking behavior based on raster plots of network responses to five example image patches, for the EI2/1 (Fig. 9a) and the blockInh model (Fig. 9c). The model with active inhibition showed sparser firing and less synchronous spiking behavior than the model with blocked inhibition. Second, to quantify this effect, we measured the population sparseness for all model configurations, based on the responses to 10,000 natural image patches (Fig. 9b). The highest sparseness value (0.62) was observed in the EI2/1 model, 0.54 for the blockInh model, and the lowest sparseness value (0.43) in the noInh model.
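The text does not spell out the sparseness measure used here; a common choice whose values fall in the reported [0, 1] range is the Treves-Rolls/Vinje-Gallant population sparseness. The sketch below is illustrative (the function name and all details are our assumptions, not the model code):

```python
import numpy as np

def population_sparseness(rates):
    """Treves-Rolls/Vinje-Gallant sparseness of one population response.

    rates: 1-D array of firing rates for a single stimulus.
    Returns a value in [0, 1]; higher values mean fewer neurons carry
    most of the activity, i.e. a sparser population code.
    """
    r = np.asarray(rates, dtype=float)
    n = r.size
    # ratio of (mean rate)^2 to mean squared rate
    a = (r.sum() / n) ** 2 / np.maximum((r ** 2).sum() / n, 1e-12)
    return (1.0 - a) / (1.0 - 1.0 / n)

# A one-hot response is maximally sparse ...
print(round(population_sparseness([10, 0, 0, 0]), 2))   # 1.0
# ... while a uniform response is minimally sparse.
print(round(population_sparseness([5, 5, 5, 5]), 2))    # 0.0
```

With this metric, the reported values (0.62 for EI2/1 vs. 0.43 for noInh) would correspond to a substantially more concentrated activity distribution in the model with plastic inhibition.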
Interestingly, the development of a higher diversity of receptive fields had a stronger influence on the population sparseness than inhibitory processing: sparseness values differed more strongly between the model configurations without inhibition, the noInh and blockInh models, than between the EI2/1 model and its blocked counterpart, which share the same feed-forward receptive fields.

Input reconstruction benefits from plastic inhibition We assume that a diversity of receptive fields, which encode the relevant input features, is crucial to provide an input representation without loss. To measure the quality of the input representation, and to compare our model with existing sparse coding models in terms of stimulus encoding, we calculated the image reconstruction error (IRE), which measures the mean-square error between the input image and its reconstruction obtained by linear decoding (see Image reconstruction error). We plot the IRE as a function of the receptive field diversity, measured by the orientation diversity index (ODI) as described previously (see Orientation diversity). The EI2/1 model with active and plastic inhibition during learning showed the lowest reconstruction error (0.74), with a high ODI value (0.75) (Fig. 10). Blocking inhibition after learning influenced neither the emergence of receptive field diversity nor the input encoding quality.
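The IRE computation described above (linear decoding from feed-forward weights, then mean-square error) can be sketched for a single patch as follows; fitting a single global gain before comparison is our addition for scale invariance, not something stated in the text:

```python
import numpy as np

def image_reconstruction_error(image, rates, ff_weights):
    """Mean-square error between a patch and its linear reconstruction.

    image:      flattened input patch, shape (n_pixels,)
    rates:      neuronal responses to the patch, shape (n_neurons,)
    ff_weights: feed-forward weights, shape (n_neurons, n_pixels)
    """
    recon = rates @ ff_weights                       # linear decoding
    # fit a single gain factor so the comparison is scale-free
    gain = (image @ recon) / max(recon @ recon, 1e-12)
    return float(np.mean((image - gain * recon) ** 2))
```

In the paper the full image is reconstructed from overlapping 12 x 12 patches before the error is computed; this sketch covers the per-patch case, which the methods describe as equivalent.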

If the feed-forward input to the inhibitory population is random and fixed during learning (fix ff inh model), the receptive fields of the excitatory population are less diverse, and the reconstruction error increases (0.91). A fixed inhibitory connection to the excitatory population (fix fb inh model) leads to a slightly higher reconstruction error (0.97) and less diverse receptive field orientations (ODI of 0.33).

This demonstrates that the plasticity of both the inhibitory feedback connections and the excitatory feed-forward connections to the inhibitory population leads to a better input representation, as a consequence of a higher receptive field diversity. Using fixed inhibitory feed-forward and feedback connections leads to a result similar to having only fixed feedback inhibitory connections (see Fig. S1).

To verify that plastic inhibition is the cause of the higher receptive field diversity, we implemented a model variant with the triplet STDP rule and a homeostatic mechanism (as suggested in Pfister & Gerstner (2006)) to realize receptive field learning, and tested fixed feed-forward, fixed feedback, and non-plastic inhibition in the same way as for our original model. We observed the same reduction of orientation diversity, with an increase in the IRE, for fixed feed-forward and/or feedback inhibition (see supp. Fig. S14). Together, these results indicate that the diversity of receptive fields contributes to the average reconstruction accuracy. Further, after learning, the effect of active inhibition on the encoding quality is negligible. This is important, as inhibition is essential for receptive field diversity, but it may contribute to a loss of information if the neural code becomes too sparse through the suppression of too many feature-coding neurons (Wiltschut & Hamker, 2009). We measured the resulting image reconstruction error with white noise added to a natural scene and observed a higher robustness against noise in models with plastic inhibition (see Fig. S15).

Figure: Tuning diversity is improved by plastic feed-forward and feedback inhibition. The orientation diversity index (ODI) is calculated via the Kullback-Leibler divergence between an idealized orientation distribution and the measured distribution in the network. The exponential of the divergence value is taken, so that higher values indicate a more uniform orientation distribution.
Green arrows indicate plastic synapses, black arrows denote fixed synapses. Orange arrows indicate weaker synaptic connections. Dotted lines indicate that the model weights were initialized with shuffled values from the weights of a previous run of the EI2/1 model. The highest diversity of RFs is observed for all models with fully plastic inhibition during learning (EI2/1, EI3/1, and blockInh models). Abolishing plasticity at feed-forward inputs to inhibitory neurons led to a moderate decrease of orientation diversity (fix ff inh model). Blocking plasticity at inhibitory feedback synapses onto excitatory neurons led to a stronger decrease in orientation diversity (fix fb inh model). The lowest diversity was observed in the noInh model, where inhibition was fully absent.

Figure 4: Synaptic connections reflect tuning similarity. Weight matrices from the excitatory to the inhibitory population (and vice versa) (a), sorted by the receptive field orientation, and for the lateral inhibitory weights (b). a, top: weights from the excitatory to the inhibitory population. a, bottom: weights from the inhibitory to the excitatory population. For display, all weight matrices were normalized by the maximum value. All weights are from the EI2/1 model. (c) Normalized synaptic strength as a function of the template match between the pre- and postsynaptic neuron's receptive fields for the EI2/1 (first row) and the EI3/1 (second row) model. Shaded areas denote the mean ± standard deviation. As expected, we observed strong weights between neurons with highly similar receptive fields, and near-zero weights between neurons with highly dissimilar receptive fields. For neurons with a moderate degree of RF similarity, we observed a steep transition from weak to strong weights at the E-I projection. At the I-E and I-I projections, this transition was more gradual. Response correlation is higher for neurons with more similar receptive fields.
Blocking inhibition (yellow line) after learning reveals that inhibition leads to an overall decrease of the response correlation (green line). Smaller OBW values correspond to narrower tuning curves. As expected, the EI2/1 model (green) shows the narrowest tuning curves. The slightly reduced inhibitory strength in the EI3/1 model (red) leads to moderately broader tuning curves. Fully blocking inhibition post-learning leads to both wider tuning curves and increased baseline activity in the blockInh model (yellow). The broadest tuning curves and highest baseline activity were observed in the noInh model (blue), which produced relatively large receptive fields.

[...] model, (f) the EI3/1 model. In the EI3/1 model, high-contrast stimuli with non-preferred orientations are associated with very different dynamics than in the EI2/1 model: in the EI2/1 model, the sum of excitatory and inhibitory currents is near zero for non-preferred orientations at high contrast (e). In the EI3/1 model, the total synaptic current (f) remains large enough to elicit considerable membrane depolarization for non-preferred orientations at high contrast (d), reflected in elevated baseline activity and broader tuning (b).

[...], and the noInh model (dark blue dots). IRE is calculated as the mean-square error between the input image and the reconstruction. A better reconstruction is represented by smaller IRE values and is associated with a higher orientation diversity (represented by higher ODI values). Data shown from 20 independent simulations per model configuration.

Our model suggests that a single underlying mechanism - the interaction of excitatory and inhibitory plasticity - can explain the stable emergence of reliable and efficient input encoding. We have shown that [...]

The authors declare no competing interests.

The first part of this section (4.1-4.5) describes the network architecture, including the neuron model. As network input, we used whitened patches from natural scenes (Olshausen & Field, 1996, 1997). Each [...] while preserving the precise timing of spikes drawn from a Poisson process.

The spike trace x_i is updated whenever the presynaptic neuron i spikes, and decays exponentially:

tau_x dx_i/dt = -x_i(t) + X_i(t),

where X_i(t) = 1 if a spike is present at time t, and X_i(t) = 0 otherwise.
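A minimal Euler-integration sketch of such a trace update; the time-constant value is a placeholder, not the paper's parameter:

```python
import numpy as np

def update_trace(x, spiked, tau_x=15.0, dt=1.0):
    """One Euler step of a presynaptic spike trace.

    x:      current trace value(s)
    spiked: spike indicator X_i(t) (1 if neuron i spiked at time t)
    tau_x:  decay time constant in ms (illustrative placeholder)
    dt:     integration step in ms
    """
    # exponential decay plus an increment on each presynaptic spike
    return x + dt * (-x / tau_x) + np.asarray(spiked, dtype=float)
```

Called once per time step, this leaves the trace as a low-pass-filtered record of recent presynaptic spiking, which the plasticity rule can read out.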
For the neurons in the V1 excitatory and inhibitory layers, we used a variant of the adaptive exponential integrate-and-fire neuron model; the parameter values are given in Table 1.

The full equation for the membrane potential u is

C du/dt = -g_L (u - E_L) + g_L Delta_T exp((u - V_T)/Delta_T) + z - w_ad + I(t),

where g_L is the leak conductance, E_L the resting potential, Delta_T the slope factor, V_T the adaptive spiking threshold, z the depolarizing afterpotential, w_ad the adaptation current, and I(t) the synaptic input current. As the triplet voltage STDP rule is sensitive to the precise time course of the membrane voltage, including [...]

The reset value for the spike threshold is V_Tmax, with exponential decay towards the resting value V_Trest with a time constant tau_VT (Eq. 4). The afterpotential z has a reset value of I_sp and decays to zero (Eq. 5). Further, the adaptation variable w_ad is incremented by the value b and decays exponentially (Eq. 6).
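The neuron dynamics of this subsection (exponential spike initiation, adaptive threshold V_T, afterpotential z, adaptation current w_ad, with the resets described for Eqs. 4-6) can be sketched as a single Euler step. All parameter values below are illustrative placeholders, not the values from Table 1:

```python
import math

# Illustrative parameters (placeholders, not the paper's Table 1 values)
C, g_L, E_L = 281.0, 30.0, -70.6            # capacitance, leak, rest
delta_T = 2.0                               # slope factor (mV)
V_Trest, V_Tmax, tau_VT = -50.4, -30.0, 50.0
tau_w, tau_z, a, b, I_sp = 144.0, 40.0, 4.0, 80.0, 400.0

def adex_step(V, V_T, w_ad, z, I, dt=0.1):
    """One Euler step of an adaptive exponential neuron with an
    adaptive threshold V_T and a depolarizing afterpotential z."""
    dV = (-g_L * (V - E_L)
          + g_L * delta_T * math.exp((V - V_T) / delta_T)
          + z - w_ad + I) / C
    V = V + dt * dV
    V_T = V_T + dt * (V_Trest - V_T) / tau_VT          # Eq. 4: threshold decay
    z = z + dt * (-z / tau_z)                          # Eq. 5: afterpotential decay
    w_ad = w_ad + dt * (a * (V - E_L) - w_ad) / tau_w  # Eq. 6: adaptation
    spiked = V >= V_T
    if spiked:  # reset: threshold jumps to V_Tmax, z to I_sp, w_ad += b
        V, V_T, z, w_ad = E_L, V_Tmax, I_sp, w_ad + b
    return V, V_T, w_ad, z, spiked
```

Because the STDP rule reads the membrane voltage directly, the post-spike shape produced by V_T, z and w_ad matters for learning, which is why these terms are modeled explicitly.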
where t_i denotes the spike time of presynaptic neuron i, and delta is the indicator function with delta(0) = 1. The learning rule further depends on two running averages of the membrane potential, u+ and u-, defined as follows:

tau+ du+/dt = -u+(t) + u(t),    tau- du-/dt = -u-(t) + u(t).

Here, x+ = max(x, 0) denotes top-half rectification.

The full learning rule is given as the sum of the LTP term and the LTD term:

dw_i/dt = A_LTP x_i(t) [u(t) - theta+]+ [u+(t) - theta-]+ - A_LTD X_i(t) [u-(t) - theta-]+,

where A_LTP and A_LTD are the learning rates for LTP and LTD, and theta+ and theta- are threshold parameters.

Table 1 (excerpt): Parameters for the neuron model and excitatory synapses.
A_LTP (noInh): 7.2 x 10^-5 / n/a / n/a
A_LTD (noInh): 5.6 x 10^-5 / n/a / n/a
u_ref: 60.0 mV^2 / 55.0 mV^2 / 55.0 mV^2
Note that for the noInh model, learning rates were reduced to compensate for the increased firing rates in the absence of inhibition.
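A sketch of one weight increment under a voltage-based triplet STDP rule of this kind (in the style of Clopath et al., 2010); the learning rates and thresholds below are placeholders, and the exact form of the paper's rule may differ:

```python
import numpy as np

def clopath_dw(x_pre, X_pre, u, u_plus, u_minus,
               A_LTP=7.2e-5, A_LTD=5.6e-5,
               theta_plus=-45.3, theta_minus=-70.6):
    """Voltage-based STDP increment in the style of Clopath et al. (2010).

    x_pre:   low-pass filtered presynaptic spike trace x_i
    X_pre:   presynaptic spike indicator X_i (0 or 1)
    u:       postsynaptic membrane potential (mV)
    u_plus, u_minus: running averages u+ and u- of the potential
    Learning rates and thresholds are placeholders, not Table 1 values.
    """
    def rect(v):  # top-half rectification, x+ = max(x, 0)
        return np.maximum(v, 0.0)
    # LTP: presynaptic trace gated by a depolarized membrane potential
    ltp = A_LTP * x_pre * rect(u - theta_plus) * rect(u_plus - theta_minus)
    # LTD: triggered by a presynaptic spike while u- exceeds theta-
    ltd = A_LTD * X_pre * rect(u_minus - theta_minus)
    return ltp - ltd
```

In this form, LTP requires a recently active presynaptic neuron plus a strongly depolarized postsynaptic membrane, while LTD is driven by presynaptic spikes arriving when the slow voltage average is above rest.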

ItoE and ItoI
Here, eta is the learning rate, and rho is a constant which controls the amount of LTD relative to LTP. Further, [...] Table 1. To do this, we subtract the [...] the OFF-synapses can be activated by the input, so that the weights will also follow this distribution.
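The text does not give the full inhibitory rule; a widely used rule matching the description of eta and rho is the symmetric inhibitory STDP of Vogels et al. (2011), sketched here with illustrative parameter values:

```python
def istdp_dw(pre_spike, post_spike, x_pre, x_post, eta=1e-4, rho=0.6):
    """Symmetric inhibitory STDP in the style of Vogels et al. (2011).

    On a presynaptic spike the weight changes by eta * (x_post - rho);
    on a postsynaptic spike it changes by eta * x_pre. The constant
    rho sets how much LTD occurs relative to LTP, pushing the
    postsynaptic rate toward a target. Parameter values are illustrative.
    """
    dw = 0.0
    if pre_spike:
        dw += eta * (x_post - rho)  # LTD when the postsynaptic trace is low
    if post_spike:
        dw += eta * x_pre           # LTP for correlated pre/post activity
    return dw
```

A rule of this shape strengthens inhibition onto strongly co-active excitatory neurons, which is one way the feedback gain control described in the Results can arise.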
In addition to the visualization based on weight matrices, the receptive fields can also be revealed by probing the neurons with random stimuli. This approach has been successfully used in physiological research in the form of the spike-triggered average (STA):

STA = (1/N) sum_{n=1}^{N} s(t_n),

Here, s(t_n) is the input stimulus at time point t_n, when the nth spike has occurred, and N is the total number of postsynaptic spikes. Accordingly, stimuli evoking more spikes are weighted more strongly than stimuli evoking few or no spikes.
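The spike-count-weighted average described above can be sketched as:

```python
import numpy as np

def spike_triggered_average(stimuli, spike_counts):
    """Spike-triggered average estimate of a receptive field.

    stimuli:      stimulus frames, shape (n_frames, n_pixels)
    spike_counts: spikes evoked by each frame, shape (n_frames,)
    Frames that evoke more spikes contribute proportionally more.
    """
    s = np.asarray(stimuli, dtype=float)
    c = np.asarray(spike_counts, dtype=float)
    # weighted sum of stimuli, normalized by the total spike count N
    return (c @ s) / max(c.sum(), 1.0)
```

Comparing this estimate with the feed-forward weight vector is what justifies using the weights directly as receptive fields in the later analyses.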

As we observed a high similarity between each neuron's STA and its ON-OFF receptive field, we concluded that the overall receptive field shape was not significantly influenced by inhibition. Thus, for simplicity, the feed-forward weight vectors can be used for further evaluations. As mentioned above, the feed-forward weight vector approximates the receptive field of a neuron. To measure the similarity between two receptive fields, we calculate the cosine between their feed-forward weight vectors (Eq. 13).

[...] we repeated every presentation 50 times and calculated the mean across all 50 repetitions (or 6.25 s presentation time). In contrast to the natural scene input used for training, the maximum input firing rate was set to 85.7 Hz. This was suitable to obtain sufficiently high activity levels.
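The receptive-field similarity measure (the cosine between feed-forward weight vectors, Eq. 13) can be sketched as:

```python
import numpy as np

def rf_similarity(w_i, w_j):
    """Template match between two receptive fields: the cosine of the
    angle between the neurons' feed-forward weight vectors."""
    w_i = np.asarray(w_i, dtype=float).ravel()
    w_j = np.asarray(w_j, dtype=float).ravel()
    denom = np.linalg.norm(w_i) * np.linalg.norm(w_j)
    # identical orientation of the weight vectors gives 1, orthogonal gives 0
    return float(w_i @ w_j / denom) if denom > 0 else 0.0
```

This is the template-match value plotted against synaptic strength in Figure 4c.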

To estimate tuning curve sharpness, we calculated the orientation bandwidth (OBW) for every neuron.

The OBW is defined as the half-width of the tuning curve, at an activity level of 1[...]

To quantify the diversity of receptive field orientations, we calculated a histogram over the measured preferred orientations to measure the distribution and the incidence of a specific orientation (P(o), where o is the index of a specific orientation). Then, we calculated the Kullback-Leibler divergence (Eq. 14) between this distribution and an idealized uniform distribution of orientations (Q(o)).

To determine the ratio between excitatory and inhibitory input currents, we measured both incoming currents to the excitatory population for 1,000 randomly chosen natural scenes. Every scene was presented for 125 ms and was shown repeatedly 100 times. We averaged the incoming currents over the input stimuli.

[...] where N denotes the number of image pixels. To obtain the reconstructed image I_r, we subdivided the full image into patches of size 12 x 12, in an overlapping fashion (in increments of 3 pixels). We showed each patch 50 times for 125 ms each, and recorded neuronal activities. We weighted the activity of each neuron by its feed-forward weights to obtain a linear reconstruction of each image patch, which we combined to reconstruct the full image. This approach is equivalent to calculating the IRE for individual patches.

[...] where alpha_s1 and alpha_s2 denote the projected responses. Next, we calculated the means and variances of the projected responses alpha_s1 and alpha_s2, denoted by (mu_s1, sigma^2_s1) and (mu_s2, sigma^2_s2). Finally, we calculate the discriminability d_s1,s2 as the ratio between the separation of the means and the variances of the projected data. Note that we used the same sequence of patches for all model configurations to calculate the discriminability; therefore, we can measure differences in the response variance.
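The ODI computation described above can be sketched as follows; the bin count and the exp(-KL) mapping (so that 1 corresponds to a perfectly uniform orientation distribution) are our assumptions, not values stated in the methods:

```python
import numpy as np

def orientation_diversity_index(pref_orientations, n_bins=8):
    """Orientation diversity index (ODI).

    A histogram P(o) of preferred orientations (in degrees, [0, 180))
    is compared with an idealized uniform distribution Q(o) via the
    Kullback-Leibler divergence; mapping through exp(-KL) (our
    assumption) yields 1 for a perfectly uniform distribution and
    values near 0 when few orientations dominate.
    """
    hist, _ = np.histogram(pref_orientations, bins=n_bins, range=(0.0, 180.0))
    p = hist / max(hist.sum(), 1)
    q = np.full(n_bins, 1.0 / n_bins)
    mask = p > 0  # 0 * log(0) contributes nothing to the divergence
    kl = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.exp(-kl))
```

With this convention, the reported ODI values (e.g. 0.75 for EI2/1 vs. 0.33 for fix fb inh) would reflect how close each model's orientation histogram is to uniform.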