
Conceived and designed the experiments: JDF RJR LCS TOS. Performed the experiments: LCS. Analyzed the data: JDF RJR LCS TOS. Wrote the paper: JDF RJR LCS TOS.

The authors have declared that no competing interests exist.

Conventional methods used to characterize multidimensional neural feature selectivity, such as spike-triggered covariance (STC) or maximally informative dimensions (MID), are limited to Gaussian stimuli or are only able to identify a small number of features due to the curse of dimensionality. To overcome these issues, we propose two new dimensionality reduction methods that use minimum and maximum information models. These methods are information theoretic extensions of STC that can be used with non-Gaussian stimulus distributions to find relevant linear subspaces of arbitrary dimensionality. We compare these new methods to the conventional methods in two ways: with biologically-inspired simulated neurons responding to natural images and with recordings from macaque retinal and thalamic cells responding to naturalistic time-varying stimuli. With non-Gaussian stimuli, the minimum and maximum information methods significantly outperform STC in all cases, whereas MID performs best in the regime of low dimensional feature spaces.

Neurons are capable of simultaneously encoding information about multiple features of sensory stimuli in their spikes. The dimensionality reduction methods that currently exist to extract those relevant features are either biased when applied to non-Gaussian stimuli or fall victim to the curse of dimensionality. In this paper we introduce two information theoretic extensions of the spike-triggered covariance method. These new methods use the concepts of minimum and maximum mutual information to identify the stimulus features encoded in the spikes of a neuron. Using simulated and experimental neural data, these methods are shown to perform well both in situations where conventional approaches are appropriate and where they fail. These new techniques should improve the characterization of neural feature selectivity in areas of the brain where the application of currently available approaches is restricted.

In recent years it has become apparent that many types of sensory neurons simultaneously encode information about more than one stimulus feature in their spiking activity. Examples can be found across a wide variety of modalities, including the visual

Neural coding of multiple stimulus features is typically modeled as a linear-nonlinear Poisson (LNP) process

Given a set of stimuli

The use of Gaussian stimuli makes it possible to find many relevant dimensions using STC, but fully sampling the dynamic range of responses often requires a

The MID method is an information theoretic dimensionality reduction technique that identifies relevant features based on how much information a linear subspace contains about the observed spikes (see

Here we put forth two new dimensionality reduction techniques applicable to arbitrary stimulus distributions. These methods, much like STC, make use of pairwise correlations between stimulus dimensions and are not hindered by the curse of dimensionality in the same manner as MID. To demonstrate the usefulness of the proposed methods, we apply them to simulated neural data for two biologically inspired model cells, and to physiological recordings of the responses of macaque retinal and thalamic cells to time-varying stimuli.

If the spiking activity of a neuron is encoding certain aspects of the stimulus, then the corresponding stimulus features must be correlated in some way with the neural response. From an experiment one can estimate specific stimulus/response correlations, such as the spike-triggered average (STA), the spike-triggered covariance (STC), or the mutual information
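To make these spike-triggered statistics concrete, the following sketch estimates the STA and the spike-triggered covariance difference from binned data. The function name and interface are illustrative, not from the paper; it assumes stimuli arranged as one vector per time bin and spike counts per bin.

```python
import numpy as np

def sta_stc(stimuli, spikes):
    """Estimate the spike-triggered average (STA) and the change in
    covariance within the spike-triggered ensemble (basis of STC analysis).

    stimuli : (T, D) array, one stimulus vector per time bin
    spikes  : (T,) array of spike counts per bin
    """
    n_spikes = spikes.sum()
    # Spike-weighted mean of the stimulus ensemble
    sta = spikes @ stimuli / n_spikes
    # Spike-weighted covariance about the STA
    centered = stimuli - sta
    stc = (centered * spikes[:, None]).T @ centered / n_spikes
    # Difference relative to the raw (prior) stimulus covariance;
    # its significant eigenvectors are the STC features
    prior_cov = np.cov(stimuli, rowvar=False)
    return sta, stc - prior_cov
```

Diagonalizing the returned covariance difference and keeping eigenvectors with significantly nonzero eigenvalues recovers the STC subspace, which is unbiased only for Gaussian stimulus ensembles.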

The minimal model of

The contours of constant probability of the minimal second order models are quadric surfaces, defined by the quadratic polynomial

The

The linear term in Eq. (3) may also contain a significant feature. Subtracting off the relevant dimensions found from diagonalizing

The minimal models of binary response systems take the form of logistic functions. This restriction can be eliminated if we look for a maximally informative second order model. To accomplish this, we extend the MID algorithm to second order in the stimulus by assuming the firing rate is a function of a quadratic polynomial,
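The logistic form of the second order minimal model can be sketched as follows. The parameter names (bias a, linear term h, symmetric matrix J) and the sign convention inside the exponential are assumptions for illustration; the paper's notation may differ.

```python
import numpy as np

def minimal_model_prob(s, a, h, J):
    """Spike probability of a second order minimal (maximum noise
    entropy) model, whose form is forced to be logistic:

        P(spike | s) = 1 / (1 + exp(a + h.s + s.J.s))

    s : (D,) or (T, D) stimulus vector(s)
    a : scalar bias; h : (D,) linear term; J : (D, D) symmetric matrix
    (sign convention assumed for illustration).
    """
    quad = np.einsum('...i,ij,...j->...', s, J, s)
    return 1.0 / (1.0 + np.exp(a + s @ h + quad))
```

Because the nonlinearity is fixed, fitting reduces to choosing (a, h, J) to match the measured first and second order stimulus/response correlations, with no model bias from higher order stimulus moments.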

Once the maximally informative parameters are found, the matrix

To test and compare the two proposed methods, both to each other and to the established methods such as STC and MID, we created two model cells designed to mimic properties of neurons in primary visual cortex (V1). The first model cell was designed to have two relevant dimensions, which places it in the regime where the linear MID method should work. The second model was designed to have six relevant dimensions and serves as an example of a case that would be difficult to characterize with linear MID. Using the van Hateren

To quantify the performance of a given dimensionality reduction method, we calculate the subspace projection
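One common way to compute such a subspace overlap is through the principal angles between the true and recovered subspaces; the sketch below uses that construction, though the paper's exact normalization may differ.

```python
import numpy as np

def subspace_projection(U, V):
    """Overlap between the subspaces spanned by the columns of U and V.

    Returns a value in [0, 1]: 1 when the subspaces coincide, 0 when
    they are orthogonal. (Principal-angle construction; assumed here
    as one standard choice of subspace projection metric.)
    """
    Qu, _ = np.linalg.qr(U)   # orthonormal basis for the true features
    Qv, _ = np.linalg.qr(V)   # orthonormal basis for the recovered features
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)  # principal cosines
    return float(np.mean(s ** 2))
```

Note the metric depends only on the spanned subspaces, not on the particular basis each method returns, which is what is needed when comparing STC, MID, and the minimal/maximal information models.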

The first model cell was constructed to respond to the two Gabor features shown in

As expected, the STC method performed poorly due to the strong non-Gaussian properties of natural stimuli

A second model cell was also created to resemble a V1 complex cell, but with a divisive normalization based on inhibitory features with orthogonal orientation in the center and parallel orientation in the surround

The performance of the various dimensionality reduction methods is shown in

To demonstrate the usefulness of the new approaches proposed here for the analysis of real neural data, we analyzed the responses of 9 macaque retinal ganglion cells (RGCs) and 9 cells from the lateral geniculate nucleus (LGN) under naturalistic stimulus conditions


While we cannot know the true features of these neurons as we can for the model cells, this data was previously analyzed using MID

We show the result of fitting the minimal model to one of the RGCs. The parameters are shown in

A second order minimal model was fit to the spike train of an RGC.

Although the two most informative dimensions captured a very large percentage of the information in the neural response

Both of the methods proposed here find relevant subspaces using second order stimulus statistics and can therefore be seen as extensions of the STC method. The minimal model is forced to have a logistic function nonlinearity, which has the benefit of removing unwanted model bias regarding higher than second order stimulus moments. In contrast, nonlinear MID uses an arbitrary nonlinear gain function and is therefore able to make use of higher order statistics to maximize information. Although both methods yield models consistent with first and second order stimulus/response correlations, neither method is guaranteed to work if the underlying neural computation does not match the structure of the model or the assumptions that underlie the estimation of relevant features.

In principle, the flexibility in the nonlinear MID gain function means it should perform at least as well as the minimal model. However, we observed that the nonlinear MID subspace projection for these two model cells is slightly smaller than that of the minimal model. This may be due to differences in the nature of the optimization problems solved by the two methods. Maximizing noise entropy under constraints is a convex optimization problem

Neurons with selectivity for only a few features that are probed with non-Gaussian stimuli, such as the model cell shown in

Experimental data were collected as part of a previous study using procedures approved by the UCSF Institutional Animal Care and Use Committee, and in accordance with National Institutes of Health guidelines.

When applied to stimuli with correlations, a whitening procedure can be used to correct for them

Whitening has the consequence of amplifying noise along poorly sampled dimensions. To combat this effect, we regularize using a technique called ridge regression
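A minimal sketch of this regularized whitening, assuming a ridge penalty added to the stimulus covariance before inversion (the function name and default penalty are illustrative):

```python
import numpy as np

def whiten_ridge(X, alpha=1e-3):
    """Whiten correlated stimuli with a ridge-regularized covariance.

    Adding alpha * I to the covariance before taking the inverse square
    root damps the noise amplification along poorly sampled stimulus
    directions (small-eigenvalue directions).
    """
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)
    # Regularized inverse square root via eigendecomposition
    w, V = np.linalg.eigh(C)
    W = V @ np.diag(1.0 / np.sqrt(w + alpha)) @ V.T
    return Xc @ W
```

As alpha grows, poorly sampled directions are progressively suppressed rather than amplified, at the cost of slightly incomplete whitening along well-sampled directions.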

Maximally informative dimensions

To extend the MID algorithm to nonlinear MID (nMID), the stimulus is simply transformed by a nonlinear operation. For the second order nonlinear transformation considered in this paper,
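For a quadratic transformation, the expansion can be sketched as appending the unique pairwise products of stimulus components, so that a linear projection of the expanded stimulus is a quadratic function of the original one (the helper name is illustrative):

```python
import numpy as np

def quadratic_expand(X):
    """Map each stimulus s to (s, {s_i * s_j for i <= j}).

    Running the standard linear MID search on the expanded stimulus is
    then equivalent to searching over quadratic functions of the
    original stimulus -- the transformation behind nonlinear MID (nMID).
    """
    T, D = X.shape
    iu = np.triu_indices(D)
    # unique pairwise products s_i * s_j with i <= j
    pairs = (X[:, :, None] * X[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([X, pairs])
```

The expanded dimensionality is D + D(D+1)/2, which is why the optimization stays tractable only through the single-projection structure of the MID search rather than an explicit search over many linear features.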

To prevent overfitting of the parameters, an early stopping mechanism was used whereby the data was broken into two sets: one set was used for training and the other used for testing. The training set was used to search the parameter space, while the test set was used to evaluate the parameters on independent data. The best linear combination for both data sets was returned by the algorithm. This procedure was done four times, using four different quarters of the complete data set as the test set. The resulting parameters found from these four fittings were averaged before diagonalizing and finding the relevant features. Unlike the regularization of STC models, this procedure can be used when analyzing experimental data.
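The four-fold early-stopping scheme above can be sketched as follows; `fit_fn` is a hypothetical stand-in for the early-stopped optimizer, taking a train/test split and returning the parameter vector at the test-set optimum.

```python
import numpy as np

def early_stopped_fits(X, y, fit_fn, n_folds=4):
    """Average parameters over four early-stopped fits, each holding
    out a different quarter of the data as the test set.

    fit_fn(X_train, y_train, X_test, y_test) -> parameter vector
    (hypothetical interface for the early-stopped optimizer).
    """
    T = len(y)
    folds = np.array_split(np.arange(T), n_folds)
    params = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(T), test_idx)
        params.append(fit_fn(X[train_idx], y[train_idx],
                             X[test_idx], y[test_idx]))
    # Average the parameter vectors before diagonalizing and
    # extracting the relevant features
    return np.mean(params, axis=0)
```

Averaging before diagonalization, rather than averaging the features themselves, avoids the sign and ordering ambiguities of eigenvectors across the four fits.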

The model of the neural response that matches experimental observations in terms of the mean response probability, as well as the correlations of the neural response with the linear and quadratic moments of the stimuli, can be obtained by enforcing

To prevent overfitting of the parameters, an early stopping procedure was implemented similar to that used in the MID algorithm. Each step of the algorithm increased the likelihood of the training set, but at some point began decreasing the likelihood of the test set, indicating the fitting of noise within the training set. The algorithm then returned the parameters found at the maximum likelihood of the test set. As described above, this was done four times with different quarters of the data serving as the test set and the resulting parameter vectors were averaged before diagonalizing the matrix

Significance testing of the eigenvalues was done by creating 500 Gaussian distributed random matrices with the same variance as that of the set of elements of the fitted matrix. Eigenvalues were considered significant if they fell below the 2.5^{th} percentile or above the 97.5^{th} percentile of the resulting null distribution.
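A sketch of this randomization test, assuming symmetric Gaussian null matrices matched to the element variance of the fitted matrix (the symmetrization convention below is one common choice, not necessarily the paper's):

```python
import numpy as np

def eigenvalue_bounds(J, n_shuffles=500, seed=0):
    """Null percentile bounds for the eigenvalues of a symmetric matrix.

    Builds random symmetric Gaussian matrices whose elements have the
    same variance as those of J, pools their eigenvalues, and returns
    the 2.5th and 97.5th percentiles. Eigenvalues of J outside these
    bounds are deemed significant.
    """
    rng = np.random.default_rng(seed)
    D = J.shape[0]
    sigma = J[np.triu_indices(D)].std()
    evals = []
    for _ in range(n_shuffles):
        R = rng.normal(0.0, sigma, (D, D))
        # symmetrize while preserving the off-diagonal element variance
        evals.append(np.linalg.eigvalsh((R + R.T) / np.sqrt(2)))
    evals = np.concatenate(evals)
    lo, hi = np.percentile(evals, [2.5, 97.5])
    return lo, hi
```

Eigenvectors whose eigenvalues survive this test are then taken as the relevant stimulus features of the second order model.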

The data analyzed in this paper were collected in a previous study

We thank Jonathan C. Horton for sharing the data collected in his laboratory and the Sharpee group for helpful conversations.