
The authors have declared that no competing interests exist.

Analyzed the data: JMM YC DAB. Wrote the paper: JMM DAB. Developed the model: JMM YC DAB.

The computation represented by a sensory neuron's response to stimuli is constructed from an array of physiological processes both belonging to that neuron and inherited from its inputs. Although many of these physiological processes are known to be nonlinear, linear approximations are commonly used to describe the stimulus selectivity of sensory neurons (i.e., linear receptive fields). Here we present an approach for modeling sensory processing, termed the Nonlinear Input Model (NIM), which is based on the hypothesis that the dominant nonlinearities imposed by physiological mechanisms arise from rectification of a neuron's inputs. Incorporating such ‘upstream nonlinearities’ within the standard linear-nonlinear (LN) cascade modeling structure implicitly allows for the identification of multiple stimulus features driving a neuron's response, which become directly interpretable as either excitatory or inhibitory. Because its form is analogous to an integrate-and-fire neuron receiving excitatory and inhibitory inputs, model fitting can be guided by prior knowledge about the inputs to a given neuron, and elements of the resulting model can often yield specific physiological predictions. Furthermore, by providing an explicit probabilistic model with a relatively simple nonlinear structure, its parameters can be efficiently optimized and appropriately regularized. Parameter estimation is robust and efficient even with large numbers of model components and in the context of high-dimensional stimuli with complex statistical structure (e.g., natural stimuli). We describe detailed methods for estimating the model parameters, and illustrate the advantages of the NIM using a range of example sensory neurons in the visual and auditory systems. We thus present a modeling framework that can capture a broad range of nonlinear response functions while providing physiologically interpretable descriptions of neural computation.

Sensory neurons are capable of representing a wide array of computations on sensory stimuli. Such complex computations are thought to arise in large part from the accumulation of relatively simple nonlinear operations across the sensory processing hierarchies. However, models of sensory processing typically rely on mathematical approximations of the overall relationship between stimulus and response, such as linear or quadratic expansions, which can overlook critical elements of sensory computation and miss opportunities to reveal how the underlying inputs contribute to a neuron's response. Here we present a physiologically inspired nonlinear modeling framework, the ‘Nonlinear Input Model’ (NIM), which instead assumes that neuronal computation can be approximated as a sum of excitatory and suppressive ‘neuronal inputs’. We show that this structure is successful at explaining neuronal responses in a variety of sensory areas. Furthermore, model fitting can be guided by prior knowledge about the inputs to a given neuron, and its results can often suggest specific physiological predictions. We illustrate the advantages of the proposed model and demonstrate specific parameter estimation procedures using a range of example sensory neurons in both the visual and auditory systems.

Sensory perception in the visual and auditory systems involves the detection of elemental features such as luminance and sound intensity, and their subsequent processing into more abstract representations such as “objects” that comprise our perception. The neuronal computations performed during such sensory processing must be nonlinear in order to generate more complex stimulus selectivity, such as needed to encode the conjunction of multiple sensory features.

Nevertheless, characterizations of sensory neurons still typically rely on the assumption of linear stimulus processing, which is often implicit in standard approaches such as spike-triggered averaging and – more recently – generalized linear models (GLMs).

Unfortunately, the space of possible nonlinear models is not bounded. While one might be inclined to incorporate details of the system and circuitry in question, more complicated models require more data for parameter estimation, and often involve poorly behaved or intractable optimization problems. As a result, practical nonlinear modeling approaches must make assumptions that limit the space of functions considered by restricting to a defined set of nonlinear interactions.

Several different approaches have been developed in this regard. The most common is to identify a low dimensional “feature space” to which the neuron is sensitive, with the assumption that its firing rate depends on a nonlinear function applied only to these stimulus features. Prominent examples of this approach include spike-triggered covariance (STC) analysis.

A second general approach is to assume the form of nonlinearities present, most commonly based on a second-order approximation of the nonlinear stimulus-response relationship, as with the Wiener-Volterra expansion.

A final commonly used approach assumes that relevant nonlinearities can be captured by directly augmenting the linear model to account for specific response properties, such as the addition of refractoriness to account for neural precision.

Here, we present a probabilistic modeling framework inspired by all of these approaches, the ‘Nonlinear Input Model’ (NIM), which limits the space of nonlinear functions by assuming that nonlinearities in sensory processing are dominated by spike generation, resulting in both rectification of the inputs to the neuron, as well as rectification of the neuron's output. By assuming a neuron's inputs are rectified, the NIM implicitly describes neuronal processing as a sum over excitatory and inhibitory inputs, which is increasingly being seen as an important factor in sensory processing.

As we show here, this results in a parsimonious nonlinear description of a range of neurons in both the visual and auditory systems, and has several advantages over previous approaches. Because of its relatively simple model structure, parameter estimation is well-behaved and makes efficient use of the data, even when the number of relevant inputs is large and/or the stimulus is high-dimensional. Importantly, because its form is based on an integrate-and-fire neuron, model selection and parameter estimation can be guided by specific knowledge about the inputs to a given neuron, and the elements of the resulting model can often be related to specific physiological predictions. The NIM thus provides a powerful and general approach for nonlinear modeling that complements other methods that rely on more abstract formulations of nonlinear computation.

Perhaps the greatest success of the linear model is in the retina, where it has been used primarily to describe the spike responses of retinal ganglion cells (RGCs).

To explore this situation, we construct a basic model of an ON-OFF RGC, which receives separate ON and OFF inputs, so that its firing rate is given by r(t) = f(k_ON · s(t)) + f(k_OFF · s(t)), where s(t) is the stimulus, k_ON and k_OFF are the corresponding filters, and f(.) is a rectifying function. Here the stimulus selectivity recovered by a linear characterization is proportional to the summed filter k_ON + k_OFF = k_SUM, which obscures the separate ON and OFF contributions.
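This response function can be sketched in a few lines of Python. The exponential-decay filter shape, stimulus dimensionality, and rate scale below are illustrative assumptions, not the values used in the paper:

```python
import numpy as np

def threshold_linear(x):
    """Rectifying input nonlinearity f(x) = max(x, 0)."""
    return np.maximum(x, 0.0)

def on_off_rgc_rate(stim, k_on, k_off):
    """Firing rate of the model ON-OFF RGC: r(t) = f(k_on.s(t)) + f(k_off.s(t)).

    stim is a (T, D) array of stimulus history vectors; k_on and k_off
    are length-D filters."""
    return threshold_linear(stim @ k_on) + threshold_linear(stim @ k_off)

rng = np.random.default_rng(0)
D = 12
k_on = np.exp(-np.arange(D) / 3.0)   # assumed ON temporal filter
k_off = -k_on                        # OFF input: sign-inverted ON filter
stim = rng.normal(size=(2000, D))    # Gaussian white-noise stimulus
rate = on_off_rgc_rate(stim, k_on, k_off)

# With k_off = -k_on, the summed filter k_on + k_off vanishes, so the
# spike-triggered average recovers essentially nothing even though the
# cell responds strongly:
sta = stim.T @ rate / rate.sum()
```

In this symmetric case the rate reduces to |k_on · s(t)|, a full-wave rectifier, which is exactly why a purely linear characterization of such a cell fails.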

Thus, this is a clear example where nonlinear characterization is necessary to capture the RGC's stimulus selectivity. One such approach that has been applied to ON-OFF cells is spike-triggered covariance (STC) analysis.
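STC analysis can be sketched as follows: directions of excess (or reduced) spike-triggered variance appear as eigenvectors of the difference between the spike-triggered and raw stimulus covariances. The simulated squaring response below is an illustrative stand-in for a symmetric ON-OFF cell:

```python
import numpy as np

def sta_stc(stim, resp):
    """Spike-triggered average, plus eigen-decomposition of the difference
    between the response-weighted and raw stimulus covariances."""
    w = resp / resp.sum()                 # response-weighted time points
    sta = stim.T @ w
    centered = stim - sta                 # remove the STA before covariance
    c_spk = (centered * w[:, None]).T @ centered
    c_raw = np.cov(stim, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(c_spk - c_raw)
    return sta, eigvals, eigvecs

rng = np.random.default_rng(2)
stim = rng.normal(size=(20000, 5))
k_true = np.eye(5)[0]                 # assumed relevant stimulus direction
resp = (stim @ k_true) ** 2           # symmetric (ON-OFF-like) response
sta, eigvals, eigvecs = sta_stc(stim, resp)
# The STA is near zero, but the largest-eigenvalue STC direction
# (last column, since eigh sorts eigenvalues in ascending order)
# recovers k_true up to sign.
```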

Given the dimensionality reduction achieved in determining the STC subspace (or with other subspace identification methods), it is possible in principle to completely characterize the neural response function, i.e., to estimate the firing rate as a nonlinear function of the stimulus projections onto the identified filters, k_1 · s(t) and k_2 · s(t).

However, even if accurate estimation of this nonlinear mapping were possible, such functions are difficult to interpret, even when arising from the conjunction of simpler components. For example, in our simulated ON-OFF RGC, neither the STA/STC filters themselves nor the measured nonlinear mapping make it clear that the response is generated from separate inputs with relatively straightforward nonlinearities.

This example thus motivates the modeling framework that we present here, the Nonlinear Input Model (NIM), which describes a neuron's stimulus processing as a sum of nonlinear inputs, following the structure of the generative model shown in

The computational challenges associated with parameter estimation are a significant barrier to the successful development and application of nonlinear models of sensory processing. In the standard linear-nonlinear (LN) model, the neuron's response is modeled by an initial stage of linear stimulus filtering, followed by a static nonlinear function (“spiking nonlinearity”) that maps the output to a firing rate.

A) Schematic diagram of an LN model, with multiple filters (k_1, k_2, …) that define the linear stimulus subspace. The outputs of these linear filters (g_1, g_2, …) are then transformed into a firing rate prediction r(t) = F[g_1, g_2, …], depicted at right for a two-dimensional subspace. Note that while the general LN model thus allows for a nonlinear dependence on multiple stimulus dimensions, estimation of the function F[.] quickly becomes infeasible as the number of dimensions grows. B) Schematic diagram of the NIM, in which each input (subunit) is composed of a stimulus filter k_i and “upstream nonlinearity” f_i(.). The subunit outputs are weighted by w_i, summed, and fed into the spiking nonlinearity F[.] to generate the predicted firing rate.

A principal motivation for the NIM structure is that if the neuronal output at one level is well described by an LN model, downstream neurons will receive inputs that are already rectified (or otherwise nonlinearly transformed). Thus, we use LN models to represent the inputs to the neuron in question, and the neuron's response is given by a summation over these LN inputs followed by the neuron's own spiking nonlinearity.

The processing of the NIM is comprised of three stages: each subunit i first filters the stimulus with a linear filter k_i, producing a generating signal g_i(t) = k_i · s(t); this signal is then transformed by an upstream nonlinearity f_i(.); and the weighted sum of the subunit outputs is mapped to a predicted firing rate by the spiking nonlinearity F[.], giving r(t) = F[ Σ_i w_i f_i(k_i · s(t)) ]. When the f_i(.) are rectifying functions, the weights w_i = ±1 determine whether each subunit contributes excitation or suppression.
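In code, the three stages amount to a filter, an upstream nonlinearity, and a spiking nonlinearity. The threshold-linear upstream functions and softplus spiking nonlinearity below are one common choice, used here as an illustrative assumption:

```python
import numpy as np

def relu(x):
    """Threshold-linear upstream nonlinearity f_i(g) = max(g, 0)."""
    return np.maximum(x, 0.0)

def softplus(x):
    """Smooth rectifying spiking nonlinearity F[G] = log(1 + exp(G))."""
    return np.log1p(np.exp(x))

def nim_rate(stim, filters, weights, f_up=relu, F_spk=softplus):
    """NIM prediction r(t) = F[ sum_i w_i f_i(k_i . s(t)) ].

    stim: (T, D) stimulus matrix; filters: list of length-D arrays k_i;
    weights: +1 for excitatory subunits, -1 for suppressive ones."""
    G = sum(w * f_up(stim @ k) for k, w in zip(filters, weights))
    return F_spk(G)
```

For example, `nim_rate(stim, [k_exc, k_sup], [+1, -1])` gives the rate of a model with one excitatory and one suppressive subunit.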

Parameter estimation for the NIM is based on maximum likelihood (or maximum a posteriori) methods similar to those used with the GLM. Given the observed spike counts R_obs(t), the log-likelihood of the model is given (up to a constant that does not depend on the parameters) by the standard Poisson form LL = Σ_t [ R_obs(t) log r(t) − r(t) ].
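The Poisson log-likelihood can be computed directly from binned spike counts; this sketch drops the parameter-independent log-factorial constant:

```python
import numpy as np

def poisson_ll(r_pred, r_obs):
    """LL = sum_t [ R_obs(t) * log r(t) - r(t) ], up to a constant
    (the log-factorial of the observed counts) that does not depend
    on the model parameters."""
    r_pred = np.maximum(np.asarray(r_pred, dtype=float), 1e-12)  # guard log(0)
    return float(np.sum(np.asarray(r_obs, dtype=float) * np.log(r_pred) - r_pred))
```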

To find the set of parameters that maximize the likelihood (

Because it is straightforward to estimate the spiking nonlinearity F[.] separately, the stimulus filters k_i and the upstream nonlinearities f_i(.) are the key components that must be fit in the NIM. While it is typically not feasible to optimize the likelihood with respect to both sets of parameters simultaneously, an efficient strategy is to use block coordinate ascent: alternately optimizing the k_i with the f_i(.) held fixed, and then the f_i(.) with the k_i held fixed.

While the set of ‘upstream nonlinearities’ f_i(.) could be assigned a fixed parametric form, they can also be represented flexibly as linear combinations of a set of basis functions φ_j(.), such that f_i(g) = Σ_j a_ij φ_j(g). With the filters held fixed, the model is linear in the coefficients a_ij, and their optimization is correspondingly well-behaved for each f_i.

For a fixed set of upstream nonlinearities, the stimulus filters k_i can be similarly optimized, although the resulting likelihood surface will not in general be convex because the k_i operate inside the upstream nonlinearities. Nevertheless, we have found that in practice their optimization is well-behaved and that suboptimal local maxima can be avoided with appropriate optimization procedures, aided by the fact that the likelihood gradient with respect to the k_i can be calculated analytically (see Methods).

Thus, optimal parameter estimates for the NIM can be determined efficiently, even for models with large numbers of parameters (see examples below). The time required for filter estimation (typically the most time-consuming step) scales approximately linearly with the experiment duration, the dimensionality of the stimulus, and the number of model subunits.

Furthermore, because the NIM provides an explicit probabilistic model for the neuronal spike response, regularization of the model components can be incorporated without adversely affecting the behavior of the optimization problem.

The NIM thus provides a nonlinear modeling framework in which large numbers of parameters can be efficiently estimated using data recorded with arbitrarily complex stimulus ensembles. In addition to this flexibility, the NIM provides model fits that are more directly interpretable due to its physiologically motivated model structure. To illustrate these advantages, below we first apply the NIM to the example ON-OFF RGC from

Returning to the example ON-OFF RGC (

This example thus illustrates the core motivation behind the NIM of modeling a neuron's stimulus processing in terms of rectified neuronal inputs. While the structure of the simulated RGC neuron in this example may appear to be a convenient choice, its form is consistent with other models of ON-OFF processing.

Thus, to understand the advantages and disadvantages of the NIM structure, it is useful to compare it with the dominant alternative approach for describing nonlinear stimulus processing: “quadratic models”. Such models have recently been cast in an information-theoretic context, and can likewise be formulated probabilistically as a ‘generalized quadratic model’ (GQM), in which the predicted firing rate is a spiking nonlinearity applied to a quadratic function of the stimulus: a linear term k_L · s(t) plus a weighted sum of squared filter outputs (k_i · s(t))^2.
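The quadratic model's rate can be sketched analogously to the NIM; the softplus spiking nonlinearity here is an illustrative assumption:

```python
import numpy as np

def gqm_rate(stim, k_lin, quad_filters, quad_weights):
    """GQM prediction r(t) = F[ k_lin.s(t) + sum_i w_i (k_i.s(t))^2 ],
    with a softplus spiking nonlinearity F."""
    G = stim @ k_lin
    for k, w in zip(quad_filters, quad_weights):
        G = G + w * (stim @ k) ** 2
    return np.log1p(np.exp(G))
```

The squared terms make the model sensitive to stimulus energy along each k_i regardless of sign, in contrast to the NIM's rectified (sign-dependent) inputs.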

For the ON-OFF RGC, the GQM finds one linear and two quadratic filters, all of which are contained in the two-dimensional subspace identified by STC analysis, meaning that the GQM filters are also linear combinations of the true ON and OFF filters.

Although in this example the resulting quadratic function cannot completely capture the form of the response function constructed from rectified inputs, we note that it still provides a good approximation, as shown by only modest reductions in model performance compared to the NIM.

We emphasize that one of the key advantages of the NIM over previously described methods is that it provides a more interpretable picture of stimulus processing as a sum of rectified neuronal inputs. As we demonstrate through several examples below in both the visual and auditory systems, it appears that sensory computation by neurons will often adhere to this general form, which is motivated primarily by physiological, rather than mathematical, considerations.

One of the main advantages of the NIM structure is the ability to specifically model the effects of inhibitory inputs, which are increasingly being shown to have a large impact on neuronal processing in many sensory areas.

We first consider an example cat LGN neuron during the presentation of natural movies.

A) The linear receptive field can be represented as the sum of two space-time separable components, corresponding to the receptive field center (red) and surround (blue). B) The NIM with excitatory (top) and suppressive (i.e., putative inhibitory, bottom) inputs. The excitatory and suppressive components (solid) both have slower, and less biphasic, temporal responses (left) compared with the linear model (dashed). The suppressive input is also delayed relative to the excitatory input. Both excitatory and suppressive inputs have roughly the same spatial profiles (middle), and both provide rectified input through the corresponding upstream nonlinearities (right). C) The NIM has significantly better performance, as measured by cross-validated log-likelihood, compared to the linear model.

Next we consider an example neuron from zebra finch area MLd, as the animal is presented with conspecific bird songs.

A) The linear spectrotemporal receptive field (STRF; left) contains two subfields of opposite sign. B) The excitatory (top) and suppressive (bottom) spectrotemporal filters identified by the NIM are similar to the positive and negative subfields of the linear STRF respectively. However, these inputs are both rectified by the upstream nonlinearities (right), resulting in different stimulus processing (see

Thus far we have only considered cases where the neuron's response is described by a NIM with a small number of inputs, consistent with simpler stimulus processing in sub-cortical areas. In contrast, in the visual cortex, even V1 ‘simple cells’ can exhibit selectivity to large numbers of stimulus dimensions.

We first consider two simulated V1 neurons in order to demonstrate the capacity for such a unified description, before applying the NIM to experimental data. We generate simulated data using a one-dimensional white-noise bar stimulus aligned with the simulated neurons' preferred spatial orientation.

A) Simulated V1 neurons are presented with one-dimensional spatiotemporal white noise stimuli (left). Their stimulus processing is constructed from a set of spatiotemporal filters (example shown at right), depicted with one spatial dimension (x-axis) and time lag (y-axis). B) The first simulated neuron is constructed from six spatially overlapping direction-selective filters (top), similar to those observed experimentally for V1 neurons. Below, the corresponding filtered stimulus distributions are shown along with the respective upstream nonlinearities (blue). C) The NIM identifies the correct spatiotemporal filters (top), as well as the form of the upstream nonlinearities (middle). The projections of the NIM filters onto the true filters (bottom) illustrate that the NIM identifies the true filters. D) The STA for the simulated neuron (left), along with the three significant STC filters (right) are largely contained in the subspace spanned by the true filters, but reflect non-trivial linear combinations of these filters (bottom). E) The GQM is composed of a linear input (left) and three excitatory squared inputs (right). While the GQM filters are more similar to the true filters, they also represent non-trivial linear combinations of them (bottom). F) The second simulated neuron consists of four similar, but spatially shifted, inputs that are squared. G) The NIM represents each true (squared) input by an opposing pair of rectified inputs. H) The STA (left) does not show any structure because the neuron's response is, by construction, symmetric in the stimulus. The four significant STC filters (right) represent distributed linear combinations of the four underlying filters. I) The GQM recovers the correct stimulus filters, given appropriate sparseness regularization.

For the neuron with rectified inputs, the NIM fitting procedure is indeed able to identify the true underlying stimulus filters and the form of the rectifying upstream nonlinearities.

Furthermore, as with the ON-OFF RGC example above (

By comparison, the GQM identifies filters with characteristics that more closely resemble those of the true input filters (e.g., more localized, fewer lobes). The improved performance of the GQM compared with an STC-based model (

Of course, one would expect the NIM to outperform other models when the generative model is itself composed of a sum of rectified inputs. In a second simulated example, however, we illustrate the flexibility of the NIM in capturing other neural response functions. The second simulated neuron is constructed from four direction-selective inputs that are squared and summed together to generate a quadratic response function.

These two simulated V1 examples thus illustrate the potential tradeoffs between the NIM and GQM. On the one hand, the NIM provides a more flexible framework that can capture a broader range of nonlinear stimulus processing. In fact, any response function can in principle be represented with this structure.

While the simulated examples above allowed for model comparisons when the neurons' response functions were known, they also provide a foundation for understanding model fits to real V1 data. We first consider a V1 neuron recorded from an anesthetized macaque in the context of similar one-dimensional white noise stimuli.

A) Standard spike-triggered characterization for this neuron reveals a ‘complicated simple-cell’ response.

We also fit a NIM with six excitatory and six suppressive stimulus filters, where the number of filters was selected based on cross-validated model performance.

Similar comparisons also come to light when applying the models to V1 complex cells, even in the most demanding stimulus contexts. To illustrate this, we consider an example V1 neuron recorded from an anesthetized cat presented with natural and naturalistic stimuli.

A) The natural movie stimulus used here has two spatial and one temporal dimension. B) The neuron's response is characterized in terms of three-dimensional spatiotemporal filters. An example spatiotemporal filter is comprised of a spatial filter at each time step (at 20 ms resolution). To simplify the depiction of each filter, we take advantage of their stereotyped structure, and plot the spatial distribution at the best time slice (BTS, left), as well as the space-time projection (STP, right) along an axis orthogonal to the preferred orientation (red line; see

The GQM estimated for this neuron is comprised of a pair of excitatory, direction-selective squared filters, as well as a weaker, non-direction-selective linear filter.

The NIM identifies four rectified excitatory inputs that share similar spatial tuning and direction selectivity, but with different spatial phases.

Thus, the application of the NIM to V1 neurons further illustrates the generality of the method, and specifically emphasizes its ability to capture substantially more complex stimulus processing, with large numbers of inputs. We note that because cortical neurons are several synapses removed from receptor neurons, a cascade model with a longer chain of upstream LN components might be more appropriate, although existing methods could not be used for parameter estimation with such a model. The ability of the NIM to capture a given neuron's stimulus processing thus relates to the extent to which the upstream neurons themselves can be approximated by LN models. In cases where this assumption is not appropriate, one can apply a fixed nonlinear transformation to the stimulus resembling the response properties of upstream neurons.

We have presented a physiologically inspired modeling framework, the NIM, which extends several recently developed probabilistic modeling approaches. Specifically, the NIM assumes a form analogous to an integrate-and-fire neuron, whereby a neuron receives a set of rectified excitatory and inhibitory inputs, each of which is assumed to process the stimulus linearly. The parameters can be estimated robustly and efficiently, and the resulting model structure is able to capture a broader range of neural responses than previously proposed probabilistic methods. Importantly, the physiologically inspired model structure of the NIM also allows for greater interpretability of the model fits, as the components of the model take the form of stimulus-driven excitatory and inhibitory inputs. The NIM thus provides a framework for connecting nonlinear models of sensory processing directly with the underlying physiology that can be applied in a range of sensory areas and experimental conditions.

As described above, the key parameters in the NIM are the stimulus filters k_i and the set of coefficients a_ij representing the upstream nonlinearities f_i(.). While these parameters cannot generally be optimized simultaneously, a powerful approach is to use block coordinate ascent: alternating between optimizing the filters k_i and the upstream nonlinearities f_i(.), holding the remaining parameters fixed in each iteration. The parameters of the spiking nonlinearity function F[.] can be fit separately, after one or two alternating optimizations of the k_i and f_i(.) (which we find is typically sufficient). Note that the subunit weights w_i need not be fit explicitly, as they can be absorbed into the coefficients a_ij.
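The alternation can be illustrated on a toy objective standing in for the penalized log-likelihood, with two parameter blocks playing the roles of the filters and the nonlinearity coefficients. Unlike the toy case, each block update in the NIM is itself an iterative optimization rather than a closed form:

```python
def block_coordinate_ascent(n_iter=50):
    """Maximize f(x, y) = -(x - 2y)**2 - (y - 1)**2 by alternately
    maximizing over each block with the other held fixed."""
    x, y = 0.0, 0.0
    for _ in range(n_iter):
        x = 2.0 * y                    # argmax_x f(x, y): x = 2y
        y = (4.0 * x + 2.0) / 10.0     # argmax_y f(x, y): from df/dy = 0
    return x, y
```

Each update can only increase the objective, so the alternation converges to a (local) optimum, here (x, y) = (2, 1); in the NIM, the `x` step corresponds to quasi-Newton optimization of the k_i and the `y` step to constrained optimization of the a_ij.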

Thus, at each stage of the fitting procedure we have the problem of maximizing a (penalized) log-likelihood function with respect to some subset of parameters, while holding the remaining parameters fixed. In all cases, we use a standard line search strategy to locate an optimum of the likelihood function given some initial values for the parameters. Because we are often optimizing very high-dimensional parameter vectors (specifically when optimizing the k_i), we use a quasi-Newton method with a limited-memory BFGS approximation of the inverse Hessian matrix. When estimating the coefficients a_ij of the upstream nonlinearities, we additionally enforce a set of linear constraints (described below), and in such cases we utilize a constrained optimization routine.

Optimization of the filters can be accomplished efficiently by analytic calculation of the log-likelihood gradient with respect to the k_i, which is given by: ∂LL/∂k_im = Σ_t [ R_obs(t)/r(t) − 1 ] F′[G(t)] w_i f_i′(g_i(t)) s_m(t), where s_m(t) is the m-th element of the stimulus at time t, g_i(t) = k_i · s(t), G(t) is the summed subunit input to the spiking nonlinearity, and F′ and f_i′ denote the derivatives of the spiking and upstream nonlinearities. Given this analytic gradient, the optimization of the k_i is well-behaved in practice. We note that while the derivatives of the piecewise linear f_i(.) are undefined at the basis-function break points, this does not pose problems in practice.
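The gradient formula can be verified numerically for a one-subunit model. To keep the finite-difference comparison clean, this sketch uses a smooth softplus upstream nonlinearity and an exponential spiking nonlinearity (so F′ = F); both choices are illustrative assumptions:

```python
import numpy as np

def ll_and_grad(k, stim, spikes, w=1.0):
    """Poisson log-likelihood and its analytic gradient
    dLL/dk_m = sum_t (R_obs/r - 1) F'[G] w f'(g) s_m(t),
    for one subunit with f = softplus and F = exp (so F'[G] = r,
    and the prefactor simplifies to R_obs - r)."""
    g = stim @ k
    f = np.log1p(np.exp(g))              # upstream nonlinearity
    fprime = 1.0 / (1.0 + np.exp(-g))    # its derivative (sigmoid)
    r = np.exp(w * f)                    # predicted rate
    ll = np.sum(spikes * np.log(r) - r)
    grad = stim.T @ ((spikes - r) * w * fprime)
    return ll, grad
```

A central-difference check (perturbing each element of k) should agree with the analytic gradient to several decimal places.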

To diagnose the presence of undesirable local maxima, and to identify the global optimum of the likelihood function, we use repeated random initializations of our optimization routine. In many cases, the random initialization of the k_i does not affect the identified local optimum. In other cases, the likelihood surface will contain more than one distinct local maximum, although usually only a small number. For example, when optimizing the filters for the example MLd neuron (

This procedure can be greatly sped up by initially optimizing the filters in a low-dimensional stimulus subspace, rather than in the full stimulus space. Such subspace optimization has been previously used in conjunction with STC analysis to identify the relevant stimulus subspace.

We begin the NIM fitting with its upstream nonlinearities f_i(.) initialized to be threshold-linear functions: f_i(g) = max(0, g).

After estimating the k_i, we then estimate the f_i(.) nonparametrically, as a linear combination of a set of piecewise linear basis functions, f_i(g) = Σ_j a_ij φ_j(g), with the k_i held fixed. These basis functions are ‘tent’ functions defined by a set of break points g_k: each φ_j rises linearly from 0 to 1 between g_{j−1} and g_j, and falls back to 0 between g_j and g_{j+1}. These break points can be selected by referencing the distribution of the argument of f_i(.), i.e., the generating signal g_i(t) = k_i • s(t). In the examples presented here we constrain the f_i(.) to be monotonically increasing functions by using a system of linear constraints on the a_ij during optimization. Because the model is invariant to shifts in the ‘y-offset’ of the f_i(.) (which can be absorbed into the spiking nonlinearity function), we add the additional set of constraints that f_i(0) = 0 to eliminate this degeneracy. Furthermore, changes in the upstream nonlinearities can influence the effective regularization of the k_i, by altering how each f_i contributes to the model prediction. As a result, the coefficients a_ij are rescaled after each iteration so that the standard deviation of each subunit's output is conserved. This ensures that the upstream nonlinearities do not absorb the scale of the k_i.
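A minimal implementation of such a piecewise-linear (‘tent’) basis representation, omitting break-point selection and the monotonicity constraints:

```python
import numpy as np

def tent_basis(g, breaks):
    """Evaluate piecewise-linear 'tent' basis functions phi_j at points g.

    phi_j rises linearly from 0 to 1 on [b_{j-1}, b_j] and falls back to
    0 on [b_j, b_{j+1}]; returns an array of shape (len(g), len(breaks))."""
    g = np.atleast_1d(np.asarray(g, dtype=float))
    b = np.asarray(breaks, dtype=float)
    Phi = np.zeros((g.size, b.size))
    for j in range(b.size):
        if j > 0:                                  # rising edge
            on = (g >= b[j - 1]) & (g <= b[j])
            Phi[on, j] = (g[on] - b[j - 1]) / (b[j] - b[j - 1])
        if j < b.size - 1:                         # falling edge
            on = (g > b[j]) & (g < b[j + 1])
            Phi[on, j] = (b[j + 1] - g[on]) / (b[j + 1] - b[j])
    Phi[g == b[0], 0] = 1.0                        # left boundary tent
    return Phi

def f_upstream(g, breaks, a):
    """Nonparametric upstream nonlinearity f(g) = sum_j a_j phi_j(g)."""
    return tent_basis(g, breaks) @ np.asarray(a, dtype=float)
```

Within the break-point range, the tents form a partition of unity, so the coefficients a_j are simply the values of f at the break points and f interpolates linearly between them.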

An important advantage of explicit probabilistic models such as the NIM is the ability to incorporate prior knowledge about the parameters via regularization. Because each of the filters k_i often contains a large number of parameters, regularization of the filters is of particular importance, as discussed elsewhere in the context of the GLM.

We consider several different forms of regularization in the examples shown, to encourage the detection of smooth filters with sparse coefficients. Specifically, we add a general penalty term of the form P = Σ_i [ λ^s ∥L^s k_i∥^2 + λ^t ∥L^t k_i∥^2 + λ^1 ∥k_i∥_1 ], where L^s and L^t are discrete Laplacian operators acting on the spatial and temporal dimensions of each filter (penalizing curvature and thus encouraging smoothness), the L1 term encourages sparseness, and the hyperparameters λ^s, λ^t, and λ^1 control the strength of each form of regularization.
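For a one-dimensional (temporal) filter, the penalty reduces to a second-difference smoothness term plus an L1 sparseness term; this sketch treats only that simplest case:

```python
import numpy as np

def filter_penalty(k, lam_smooth, lam_sparse):
    """Penalty lam_s * ||L k||^2 + lam_1 * ||k||_1, with L the discrete
    Laplacian (second difference) along the filter's single dimension."""
    Lk = np.diff(np.asarray(k, dtype=float), n=2)   # interior second differences
    return lam_smooth * np.sum(Lk ** 2) + lam_sparse * np.sum(np.abs(k))
```

Note that any linear ramp has zero smoothness penalty, since its second differences vanish; only curvature is penalized.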

Because we also expect the upstream nonlinearities f_i(.) to be smooth functions, we incorporate penalty terms when estimating the parameters of the f_i(.). Because we represent the f_i(.) as linear combinations of localized tent basis functions, f_i(.) = Σ_j a_ij φ_j(.), we can encourage smooth f_i(.) by applying a penalty of the form λ^L ∥L a_ij∥_2 to the set of coefficients a_ij corresponding to a given f_i(.), where L is the discrete Laplacian over the basis-function index j.

In general, the hyperparameters can be inferred from the data using Bayesian techniques; in practice, we select values that yield filters k_i and upstream nonlinearities f_i(.) with the expected degree of smoothness/sparseness. To demonstrate that our results were not overly sensitive to the selection of hyperparameters, we compare the NIM and GQM fit to the example V1 neuron from

To evaluate model performance, we use the log-likelihood of each model measured on cross-validation data, as well as the R^2 between the predicted and observed firing rates.

While selection of the optimal number of excitatory and suppressive subunits can be performed using standard model selection techniques, such as nested cross-validation, this choice can also often be guided by the specific application. Importantly, we find that the subunits identified by the NIM, as well as its performance, are generally robust to the precise specification of the number of excitatory and suppressive subunits, with ‘nearby’ models typically providing a very similar characterization of the neurons' stimulus processing.

In order to simulate the response of an ON-OFF RGC, we generated a Gaussian white noise process sampled at 15 Hz (such as a luminance-modulated spot stimulus), which was then filtered using separate ON- and OFF-like temporal filters k_ON and k_OFF.

The data were simulated at a temporal resolution of 8.3 ms, and model filters were represented at a lower resolution of 33 ms, with a length of 1 s. For the GQM and NIM we incorporated smoothness regularization on the filters, and for the NIM we also incorporated smoothness regularization on the upstream nonlinearity coefficients a_ij.

To identify the STA/STC subspace depicted in

Data for the LGN example were recorded extracellularly from an anaesthetized and paralyzed cat by the Alonso Lab.

Each filter was represented by space-time separable center and surround components, and thus consisted of two sets of spatial and temporal filters.

Data for the songbird auditory midbrain example were provided by the Theunissen lab through the CRCNS database.

The simulated V1 neurons were constructed from sets of direction-selective spatiotemporal filters, with upstream nonlinearities given either by squaring, f(x) = x^2, or by a smooth rectifying function of the form f(x) = log(1+exp(β_1 x + β_2)).

Both the GQM and NIM were fit using a sparseness penalty on the filters. For the NIM, we also used smoothness regularization on the upstream nonlinearity coefficients a_ij.

The V1 neuron shown in

The V1 neuron shown in

For all analyses we used 8 time lags to construct spatiotemporal filters (each described by 8×20×20 = 3200 parameters). For STA/STC analysis we first whitened the stimulus by rotating into the principal component axes and normalizing each dimension to have unit standard deviation.

To estimate filters of the LN model, GQM and NIM, we used sparseness regularization, as well as penalties on the (two-dimensional) spatial Laplacian at each time lag. To display the three-dimensional spatiotemporal filters we plot the time slice of each filter containing the most variance across pixels (‘best time slice’), as well as the projection of the filter onto a spatial axis orthogonal to the neuron's preferred orientation (‘space-time projection’).


Computation time for NIM parameter estimation. Models were fit to simulated data (10^5 time samples), varying the number of time lags used to represent the stimulus. Estimation of the stimulus filters scales roughly linearly with the number of stimulus dimensions, while estimation of the upstream nonlinearities is largely independent of the number of stimulus dimensions. Note that the additional step of estimating the upstream nonlinearities adds relatively little to the overall parameter estimation time, especially for more complex models.


The authors thank J.-M. Alonso for contributing the LGN data, the Theunissen lab and CRCNS database for providing the songbird auditory midbrain data, N. Rust and T. Movshon for contributing the macaque V1 data, and T. Blanche for contributing the cat V1 data. We also thank N. Lesica and S. David for comments on the manuscript.