
The authors have declared that no competing interests exist.

Conceived and designed the experiments: JL JB. Performed the experiments: JB MH JL. Analyzed the data: JB. Wrote the paper: JL JB. Algorithms' implementation and parallelization: JB MH.

Simple cells in primary visual cortex were famously found to respond to low-level image components such as edges. Sparse coding and independent component analysis (ICA) emerged as the standard computational models for simple cell coding because they linked their receptive fields to the statistics of visual stimuli. However, a salient feature of image statistics, occlusions of image components, is not considered by these models. Here we ask if occlusions have an effect on the predicted shapes of simple cell receptive fields. We use a comparative approach to answer this question and investigate two models for simple cells: a standard linear model and an occlusive model. For both models we simultaneously estimate optimal receptive fields, sparsity and stimulus noise. The two models are identical except for their component superposition assumption. We find the image encoding and receptive fields predicted by the models to differ significantly. While both models predict many Gabor-like fields, the occlusive model predicts a much sparser encoding and high percentages of ‘globular’ receptive fields. This relatively new center-surround type of simple cell response has been observed since reverse correlation came into use in experimental studies. While high percentages of ‘globular’ fields can be obtained using specific choices of sparsity and overcompleteness in linear sparse coding, no or only low proportions are reported in the vast majority of studies on linear models (including all ICA models). Likewise, for the linear model investigated here and optimal sparsity, only low proportions of ‘globular’ fields are observed. In comparison, the occlusive model robustly infers high proportions and can match the experimentally observed high proportions of ‘globular’ fields well. Our computational study, therefore, suggests that ‘globular’ fields may be evidence for an optimal encoding of visual occlusions in primary visual cortex.

The statistics of our visual world are dominated by occlusions. Almost every image processed by our brain consists of mutually occluding objects, animals and plants. Our visual cortex is optimized through evolution and throughout our lifespan for such stimuli. Yet, the standard computational models of primary visual processing do not consider occlusions. In this study, we ask what effects visual occlusions may have on predicted response properties of simple cells, which are the first cortical processing units for images. Our results suggest that recently observed differences between experiments and predictions of the standard simple cell models can be attributed to occlusions. The most significant consequence of occlusions is the prediction of many cells sensitive to center-surround stimuli. Experimentally, large quantities of such cells have been observed since new techniques (reverse correlation) came into use. Without occlusions, they are only obtained for specific settings, and none of the seminal studies (sparse coding, ICA) predicted such fields. In contrast, the new type of response naturally emerges as soon as occlusions are considered. In comparison with recent

Evolution and synaptic plasticity optimize the visual cortex for the processing of visual stimuli. The quantification of the degree of optimization has long been the subject of theoretical and physiological studies. Among the most influential contributions are models such as independent component analysis

But does neglecting or including occlusions have an impact on receptive fields predicted by sparse coding? If so, what is the main difference if occlusions are considered and how do model predictions compare with experimental measurements? A critical inspection of standard sparse coding as a model for simple cell responses has recently been motivated by increasingly detailed experimental studies of simple cell responses. Using reverse correlation, a broad variety of receptive field shapes has been recorded, e.g., for macaque monkeys

After the discrepancy of diverse receptive field shapes and standard encoding models was pointed out

In this study we provide, for the first time, a systematic investigation of the impact of occlusion-like non-linearities on predicted simple cell responses. To quantify the differences that arise when occlusions are neglected, we study two sparse coding models: one assuming standard linear superposition

Suppose all hidden units are zero except for units

Although the only difference between the two sparse coding models investigated is the rule for component combination, non-linear sparse coding versions have been investigated much less than linear versions because parameter optimization becomes more challenging. To model image patches for instance, large-scale applications of non-linear models with large numbers of observed and hidden variables have not yet been reported. By applying novel training methods

We compare two generative sparse coding models for the encoding of image patches by simple cells. Both models have the same set of parameters and both assume, like standard sparse coding, independent visual components and Gaussian noise in the data. The distinguishing feature of the non-linear model is the use of a point-wise maximum to describe the combination of visual components. The maximum combination is illustrated and contrasted with the standard linear combination in
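The two combination rules can be made concrete with a small sampling sketch. It is a minimal illustration, not the implementation used in this study: the names `W`, `pi` and `sigma`, the array layout, and the convention that inactive units contribute zero to a plain signed maximum are all assumptions made for this example.

```python
import numpy as np

def sample_patches(W, pi, sigma, n_samples, combine="max", rng=None):
    """Draw patches from a sparse coding model with binary hidden units.

    W       : (H, D) array, one basis function per hidden unit (assumed layout)
    pi      : prior probability that a hidden unit is active (Bernoulli prior)
    sigma   : standard deviation of the Gaussian observation noise
    combine : "sum" for linear superposition, "max" for point-wise maximum
    """
    rng = np.random.default_rng() if rng is None else rng
    H, D = W.shape
    s = rng.random((n_samples, H)) < pi           # binary hidden states, (N, H)
    if combine == "sum":
        mean = s.astype(float) @ W                # linear superposition
    elif combine == "max":
        # masked[n, h, d] is W[h, d] if unit h is active in sample n, else 0
        masked = s[:, :, None] * W[None, :, :]
        mean = masked.max(axis=1)                 # point-wise maximum over units
    else:
        raise ValueError("combine must be 'sum' or 'max'")
    y = mean + sigma * rng.standard_normal((n_samples, D))
    return s, y
```

With a single active unit the two rules give the same noiseless patch; they differ only where active basis functions overlap, which is where occlusion-like effects would matter.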

For each model above we now seek the parameters that optimally model the statistics of image patches. As a result, each model predicts a set of basis functions which can be compared to each other and to

For the generative models above, we optimized the model parameters for a set of natural image patches. First, natural image patches were preprocessed using an array of linear center-surround filters to model preprocessing by the lateral geniculate nucleus (LGN). Details are given in the
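Center-surround preprocessing of this kind can be sketched with a difference-of-Gaussians (DoG) kernel. The kernel size, the two standard deviations, the zero-mean correction and the function names below are assumptions chosen for illustration; they are not the settings of this study.

```python
import numpy as np

def dog_kernel(size, sigma_center, sigma_surround):
    """Difference-of-Gaussians kernel: narrow positive center minus wide surround."""
    r = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(r, r)
    d2 = xx**2 + yy**2
    center = np.exp(-d2 / (2 * sigma_center**2)) / (2 * np.pi * sigma_center**2)
    surround = np.exp(-d2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    k = center - surround
    return k - k.mean()        # zero mean, so uniform regions map to zero

def filter_valid(image, kernel):
    """'Valid' 2-D filtering; correlation equals convolution for a symmetric kernel."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Hypothetical usage on a random stand-in for a natural image
image = np.random.default_rng(0).random((64, 64))
filtered = filter_valid(image, dog_kernel(9, 1.0, 3.0))
```

Patches for training would then be cut from `filtered` rather than from the raw image.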

Sparse coding with binary latents as in BSC results in a consistently higher percentage of globular fields ranging from

Of all remaining non-globular fields predicted by the models, almost all have a Gabor-like shape (with few fields having unspecific shapes; see

Other than investigating different models for image patch encoding, we explored different preprocessing methods prior to the application of the encoding models. We used a neurally plausible preprocessing by modeling LGN input to the cortex using center-surround (difference-of-Gaussians) filtered patches. Another (and related) method of preprocessing popular for functional modeling is zero-phase PCA whitening

Unlike standard sparse coding

In analogy to

For each example the figure shows: the original patch (left), its DoG preprocessed version (second from left), and the decomposition of the preprocessed patch by the three models. For better comparison with the original patches, basis functions are shown in grey-scale. The displayed functions correspond to the active units of the most likely hidden state given the patch. In the case of standard sparse coding, the basis functions are displayed in the order of their contributions. Standard sparse coding (SC) uses many basis functions for reconstruction but many of them contribute very little. BSC uses a much smaller subset of the basis functions for reconstruction. MCA typically uses the smallest subset. The basis functions of MCA usually correspond directly to edges or to two-dimensional structures of the image, while basis functions of BSC and (to a greater degree) of SC are more loosely associated with the true components of the respective patch. The bottommost example illustrates that the globular fields are usually associated with structures such as end-stopping or corners. For the displayed examples, the normalized root-mean-square reconstruction errors (nrmse) quantify the reconstruction quality. For standard sparse coding the errors are (from top to bottom) 0.09, 0.08, 0.10 and 0.12, respectively. For the two models with Bernoulli prior they are larger, with 0.51, 0.63, 0.53 and 0.42 for MCA, and 0.37, 0.47, 0.44 and 0.39 for BSC. We give reconstruction errors for completeness but note that, for all models, they are based on the most likely hidden states (MAP estimates). For MCA and BSC the MAP estimate was chosen for illustrative purposes, while for most tasks these models can make use of their more elaborate posterior approximations.

In this work we have investigated the impact of occlusion non-linearities in visual stimuli on simple cell coding. Specifically, we compared optimal coding of a linear sparse coding model to a sparse coding model taking strong occlusion-like non-linearities into account. The comparison of the two (otherwise identical) sparse coding models showed significant differences in the predicted receptive fields as well as in predicted levels of sparsity.

The non-linear model consistently predicted a high percentage of globular receptive fields (

For comparison, the experimentally measured percentages of globular fields (

Furthermore, the reported results suggest direct experiments to verify or falsify the models studied here: If different simple cells with receptive fields at the same location in the visual field were identified, the linear and non-linear models could be used to predict the responses when complex stimuli are presented at the same location. For a crossing of two edges, the linear model would for instance predict responses less aligned with responses to the individual edges than the non-linear model (compare

In contrast to differences in sparsity and in the percentage of globular receptive fields, we found the differences of Gabor-shape distributions (

Since the diversity of receptive field shapes was suggested as a means for comparison of models to experimental data

In addition to functional and probabilistic approaches to model simple cell coding, other computational investigations are based on models of neural circuits. While many studies directly relate to linear sparse coding

In general there may, therefore, be relevant aspects other than the theoretical optimality of the generative model itself. To obtain results that are as close to optimal as possible, an encoding model has to fulfill two requirements: (A) it has to reflect the data generation process well and (B) it has to provide an efficient procedure to learn optimal parameters. A simpler model may in practice have the advantage of a more efficient learning procedure, while learning based on a non-linear model may be harder. There may, for instance, be higher computational costs associated with a non-linear model, or convergence to local optima may represent a problem. It has, therefore, been argued in the literature

Note that the maximum non-linearity and standard linear superposition as studied here are only two possible models for the combination of components. In the literature, other non-linearities such as noisy-OR combinations

Although sparse coding and its variants represent the standard model for simple cell coding, other computational models have been suggested. More recently, for instance, the suitability of mixture model approaches has been discussed

While different recent models report that globular receptive fields do emerge in applications to image patches

Both Gabor-like and globular fields are useful for image encoding. While Gabors are closely associated with edges, we observed globular fields to be more closely associated with two-dimensional structures (see

Our study answers whether occlusions can have an impact on theoretical predictions of simple cell models. Based on a direct comparison of superposition assumptions we have observed very significant differences between the receptive fields and sparsity levels predicted by the linear and the occlusive model. Both models represent approximations of the exact model for local visual component combinations. However, we have observed that a non-linear superposition results in both a closer match to the true combination rule of visual components and a closer match of predicted receptive fields to

In this study we compared the predictions of two sparse coding models, MCA and BSC, when trained on natural image patches. Given the generative models (

The M-step equation for the generative fields

To derive update equations for

The derivation of the M-step update for

To summarize, the M-step equations for the MCA model are given by:

One important property of the max-function of the MCA model is that only the largest value of its arguments determines the function's value. In the case of a finite dataset for optimization, this has the effect that those elements of the matrix

The computational complexity of the MCA learning algorithm is dominated by the number of states that have to be evaluated for each E-step. The scaling of this number can be estimated to be (compare

For the BSC model, the derivation of the M-step for

For all numerical experiments with MCA and BSC the model parameters needed to be initialized. We used the same initialization procedure for both models and set the basis functions

For standard sparse coding we applied a MAP based approximation to optimize the parameters

To verify that the learning algorithms for MCA and BSC correctly recover data components at least approximately, we first applied them to artificial stimuli where ground-truth is available. For each model, a dataset of

To optimize the model parameters on natural image stimuli, we extracted a set of

For each experiment, the same set of stimuli was used to train the three models under consideration. Those experiments in which we screened through different degrees of overcompleteness (

To control for changes of receptive field shapes depending on different types of preprocessing, we applied MCA and BSC to zero-phase PCA (ZCA) whitened patches

ZCA: Zero-phase PCA (ZCA) preprocessing is common in more technical applications of sparse coding or ICA. We replaced the DoG convolution by ZCA and normalized the patches as for DoG preprocessing. When MCA and BSC are applied to ZCA whitened data, the globular field percentages change; one consequence is a lower percentage of globular fields for MCA. For ZCA whitened data, too, globular field percentages for MCA remain consistently and significantly higher than for BSC (with at least
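For reference, ZCA whitening can be sketched as follows. This is the textbook construction, given as an assumption about the general technique rather than as the exact procedure used here; the regularizer `eps` and the function name are hypothetical.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten patches stored as the rows of X (shape (N, D)).

    Returns the whitened data and the symmetric whitening matrix.
    """
    Xc = X - X.mean(axis=0)                       # remove the mean of each pixel
    cov = Xc.T @ Xc / Xc.shape[0]                 # (D, D) sample covariance
    eigval, eigvec = np.linalg.eigh(cov)          # cov = E diag(eigval) E^T
    # Rotate into the PCA basis, rescale to unit variance, rotate back
    W_zca = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return Xc @ W_zca, W_zca
```

Because the data are rotated back after rescaling, the whitened patches stay in pixel space, which is what distinguishes ZCA from plain PCA whitening.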

Independent ON-/OFF-channels: In mammals, visual information is transferred to the cortex via two types of neurons in the lateral geniculate nucleus (LGN): center-ON and center-OFF cells. ON- and OFF-cells project to the primary visual cortex (mainly layer 4). Pairs of center-ON and center-OFF cells can be combined to provide a net center-surround input to cortical cells. Such ‘push-pull’ inputs are suggested by strongly overlapping receptive fields of LGN cells connecting to the same cortical column (see, e.g., a recent study

In general, the type of preprocessing has an impact on the shapes of predicted receptive fields, affecting both percentages of globular fields and Gabor shape statistics. However, the difference in the percentages of globular fields, with a consistently much higher percentage for the non-linear model, is a very stable observation for all preprocessing models used. The sparsity of the non-linear model has also always been observed to be much higher. Differences between the non-linear and linear model were much less pronounced if the shape distributions of Gabor-like fields were considered. While we found differences between the models for different preprocessing types, they were small compared to differences in sparsity and globular field percentages. At the same time, all distributions using

After parameter optimization we computed an estimate of the predicted receptive fields by convolving the learned basis functions

The convolution with the DoG filter is an estimate of the receptive field assuming a linear mapping: If
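The linear-mapping argument can be checked numerically. The sketch below is one-dimensional for brevity and rests on assumptions: a toy symmetric center-surround kernel, zero-padded 'same' filtering, and hypothetical names throughout. For a unit that responds linearly to the preprocessed stimulus, the response equals the inner product of the raw stimulus with the basis function filtered by the same symmetric kernel.

```python
import numpy as np

def estimated_rf(basis_function, kernel):
    """Receptive field estimate: the basis function filtered with a symmetric kernel."""
    return np.convolve(basis_function, kernel, mode="same")

rng = np.random.default_rng(0)
kernel = np.array([-0.1, -0.2, 0.6, -0.2, -0.1])  # toy symmetric center-surround profile
w = rng.standard_normal(16)                        # hypothetical learned basis function
x = rng.standard_normal(16)                        # hypothetical raw stimulus

# Response computed on the preprocessed stimulus ...
response_via_preprocessing = w @ np.convolve(x, kernel, mode="same")
# ... equals the response of the estimated receptive field to the raw stimulus
response_via_rf = estimated_rf(w, kernel) @ x
```

The two values agree because filtering with a symmetric kernel is a symmetric linear operator, so it can be moved from the stimulus onto the basis function.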

To analyse the shape statistics of the estimated receptive fields resulting from our numerical experiments and from experimental recordings

Similarly, again for each receptive field

To analyse the shape distribution of receptive fields, the shape relevant parameters can be visualized as an

We thank Dario Ringach for providing original receptive field recordings of macaque monkeys.