
Nonlinear effects of intrinsic dynamics on temporal encoding in a model of avian auditory cortex

  • Christof Fehrman,

    Roles Formal analysis, Visualization, Writing – original draft

    Affiliation Psychology Department, University of Virginia, Charlottesville, Virginia, United States of America

  • Tyler D. Robbins,

    Roles Conceptualization, Writing – review & editing

    Affiliation Cognitive Science Program, University of Virginia, Charlottesville, Virginia, United States of America

  • C. Daniel Meliza

    Roles Conceptualization, Methodology, Visualization, Writing – review & editing

    Affiliations Psychology Department, University of Virginia, Charlottesville, Virginia, United States of America, Neuroscience Graduate Program, University of Virginia, Charlottesville, Virginia, United States of America


Neurons exhibit diverse intrinsic dynamics, which govern how they integrate synaptic inputs to produce spikes. Intrinsic dynamics are often plastic during development and learning, but the effects of these changes on stimulus encoding properties are not well known. To examine this relationship, we simulated auditory responses to zebra finch song using a linear-dynamical cascade model, which combines a linear spectrotemporal receptive field with a dynamical, conductance-based neuron model, then used generalized linear models to estimate encoding properties from the resulting spike trains. We focused on the effects of a low-threshold potassium current (KLT) that is present in a subset of cells in the zebra finch caudal mesopallium and is affected by early auditory experience. We found that KLT affects both spike adaptation and the temporal filtering properties of the receptive field. The direction of the effects depended on the temporal modulation tuning of the linear (input) stage of the cascade model, indicating a strongly nonlinear relationship. These results suggest that small changes in intrinsic dynamics in tandem with differences in synaptic connectivity can have dramatic effects on the tuning of auditory neurons.

Author summary

Experience-dependent developmental plasticity involves changes not only to synaptic connections, but to voltage-gated currents as well. Using biophysical models, it is straightforward to predict the effects of this intrinsic plasticity on the firing patterns of individual neurons, but it remains difficult to understand the consequences for sensory coding. We investigated this in the context of the zebra finch auditory cortex, where early exposure to a complex acoustic environment causes increased expression of a low-threshold potassium current. We simulated responses to song using a detailed biophysical model and then characterized encoding properties using generalized linear models. This analysis revealed that this potassium current has strong, nonlinear effects on how the model encodes the song’s temporal structure, and that the sign of these effects depends on the temporal tuning of the synaptic inputs. This nonlinearity gives intrinsic plasticity broad scope as a mechanism for developmental learning in the auditory system.


Neurons have diverse, nonlinear dynamics. Many brain regions contain multiple kinds of neurons with different spike waveforms and spiking patterns [1–3], and there is substantial variation even within well-defined cell types [4–6]. Intrinsic dynamics can be modified by activity and experience [7–9], which may be an important mechanism for learning [10]. This physiological diversity has been known for many decades [11] and can be modeled on a detailed, biophysically realistic level [12, 13], but our understanding of how intrinsic dynamics affect neural computations in many systems has remained surprisingly qualitative.

The complexity and nonlinearity of biophysical models make it difficult to use them to explain higher-order processes in the brain, at what Marr [14] termed the algorithmic and computational levels. A simple, single-compartment model that can produce common physiological behaviors like bursting, adaptation, or rebound spiking is a system of ten or more nonlinear differential equations with fifty or more parameters [15, 16]. These parameters correspond to specific aspects of the cell biology (such as membrane capacitance or sodium channel density), which makes them easy to interpret and, in some cases, possible to measure directly. However, the relationships between the parameters and the observable behaviors of the neuron are highly nonlinear, making it difficult to constrain them statistically. It is difficult and time-consuming to fit dynamical models to biological data [17–20], and there is little consensus on the appropriate methods or even whether there are globally optimal solutions [21]. Moreover, access to the intracellular voltage is needed, through a sharp or patch electrode or using an optical sensor [22], which greatly limits the number of neurons that can be modeled within the context of a circuit, and almost always requires the use of ex vivo preparations that cannot be presented with realistic stimuli.

As a consequence, many studies of function in neural systems have emphasized phenomenological models that omit most of the biophysical and dynamical features of spike generation in exchange for computational tractability [23–26]. One of the simplest examples is the generalized linear model (GLM), which represents spiking as an inhomogeneous Poisson process with a conditional intensity that depends only on a linear function of the stimulus and spiking response in the recent past [27]. In contrast to more realistic models, the GLM is a staple of statistics, with a well-defined likelihood function that is concave everywhere, guaranteeing that a global optimum can be found [28]. The GLM also has established techniques for regularization, which is necessary when stimuli have naturalistic (i.e., highly correlated) distributions [29, 30].

Because of its simplicity and probabilistic formulation, a GLM can be thought of as a representation of a neuron’s encoding properties; that is, an abstract view of how the cell transforms sensory stimuli into spike trains. Surprisingly, although GLMs have been successfully used to model encoding in a number of different sensory systems [27, 31], and there have been several studies using GLMs to predict and characterize more complex spiking models [32–34], to our knowledge there has not been any attempt to relate the GLM to more detailed, dynamical models with realistic sensory inputs. As a result, it is difficult to predict how natural, pathological, or experience-dependent variations in voltage-gated channels are likely to affect sensory processing.

In this study, we examined the relationship between intrinsic dynamics and encoding properties in the context of auditory processing in songbirds. Encoding models, including GLMs, have been employed extensively to study this system [31, 35–38], but until recently, there have been no data on the intracellular physiology of the constituent neurons. Using whole-cell patch recordings from slices, we have found that the caudal mesopallium (CM), a cortical-level auditory area [39, 40], has diverse, experience-dependent intrinsic dynamics [9, 41]. Most of the putatively excitatory neurons fire repetitively when depolarized, but a substantial fraction only fire at stimulus onsets. This phasic firing behavior is correlated with strong outward rectification that activates at low voltages, and it can be pharmacologically converted to tonic firing by blocking low-threshold potassium currents (KLT). The proportion of phasic neurons changes over development, reaching a peak around the age zebra finches begin to memorize songs, but only in birds exposed to a complex acoustic environment. This experience-dependent plasticity is correlated with changes in the expression of Kv1.1, a low-threshold potassium channel [9].

The dependence of phasic firing on auditory experience suggests that intrinsic plasticity (i.e., a change in the expression or properties of voltage-gated currents, rather than synaptic currents) plays a critical role in development for songbirds, but for all the reasons noted above, the functional significance remains unclear. Here, we took a simulation-based approach to ask how changing the magnitude of low-threshold potassium currents in a dynamical model would affect encoding properties, as estimated with a GLM. We simulated auditory responses using a linear-dynamical cascade model [42], which combines a linear spectrotemporal receptive field (RF) with a single-compartment biophysical model (Fig 1A). The linear stage of the model consists of representative RFs based on the data and parametric model of Woolley et al. [38], which are convolved with spectrograms of zebra finch song to generate an external driving current. Conceptually, this current represents a linear approximation of the summation and filtering performed by the neuron’s dendrites on excitatory and inhibitory synaptic inputs. The biophysical model we used includes sodium, high-threshold potassium, transient (A-type) potassium, low-threshold potassium, and hyperpolarization-activated (h-type) currents, and it can reproduce the responses of phasic and tonic CM neurons to step and broadband current stimuli [41]. As shown previously, phasic firing in this model depends on a single parameter that governs the maximal conductance of the low-threshold potassium current (gKLT) (Fig 1B and 1C). We used the spike trains produced by these simulations to fit GLMs (Fig 2) and then compared estimates for the RF and spike-history parameters to determine how KLT influenced how the model was encoding the acoustic structure of the stimulus.

Fig 1. Linear-dynamical cascade model.

(A) The linear stage of the model consists of the convolution of a stimulus with a receptive field. The output of the convolution (Dstim(t)) is combined with a stimulus-independent noise signal (Dnoise(t)) with a 1/f spectral distribution. The sum of Dnoise(t) and Dstim(t) is converted to the input current I(t) using a static nonlinearity, ensuring that the model voltage remains within biologically realistic bounds. (B) I(t) enters into the biophysical stage, which models membrane voltage dynamics as a system of ordinary differential equations. (C) The model is numerically integrated to produce a simulated voltage trace. Multiple trials are simulated by keeping Dstim(t) the same from trial to trial, while drawing new values for Dnoise(t).
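The linear stage described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of tanh as the compressive nonlinearity, and the noise_scale and i_max values are hypothetical choices, not the forms or values used in the study (those are given in Methods).

```python
import numpy as np

def pink_noise(n, rng):
    """Gaussian noise shaped to an approximately 1/f power spectrum."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                       # avoid dividing by zero at f = 0
    # Power ~ 1/f means amplitude ~ 1/sqrt(f)
    shaped = np.fft.irfft(spectrum / np.sqrt(freqs), n=n)
    return shaped / shaped.std()              # unit variance

def driving_current(stim, rf, rng, noise_scale=0.5, i_max=1.0):
    """Combine D_stim and D_noise, then compress to a bounded current I(t).

    The tanh nonlinearity, noise_scale, and i_max are illustrative assumptions.
    """
    d_stim = np.convolve(stim, rf, mode="full")[: len(stim)]   # causal filtering
    d_noise = noise_scale * pink_noise(len(stim), rng)
    return i_max * np.tanh((d_stim + d_noise) / i_max)
```

Calling driving_current repeatedly with the same stim and rf but a freshly seeded generator mimics the multi-trial procedure: the stimulus-driven component is identical across trials while the noise differs.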

Fig 2. Schematic of parameter estimation for generalized linear model.

The data to be fit comprise a stimulus, which can be a univariate time series or a multivariate spectrogram (as shown here), and a spiking response. The model represents the response as an inhomogeneous Poisson process with a conditional intensity that depends on the convolution of the stimulus with a receptive field (K) and the convolution of the response with a spike-history filter, which was parameterized as the sum of two exponential decays representing short-term (α1) and long-term (α2) adaptation or facilitation. Not shown is a constant offset ω, which governs the baseline probability of firing, such that higher values suppress the probability of spiking. These model parameters are estimated by regularized maximum likelihood.
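To make the structure of the conditional intensity concrete, the following Python sketch simulates a univariate GLM of this general form. The sign conventions follow the caption (larger ω suppresses firing; positive α1 and α2 produce adaptation), but the function name, the time constants tau1 and tau2, and the bin width are assumptions for illustration rather than the values used in the study.

```python
import numpy as np

def glm_response(stim, k, omega, alpha1, alpha2,
                 tau1=10.0, tau2=200.0, dt=1.0, seed=0):
    """Simulate a Poisson GLM with a two-exponential spike-history filter.

    tau1, tau2 (in units of dt) and dt are illustrative assumptions.
    Returns the conditional intensity and a 0/1 spike train.
    """
    rng = np.random.default_rng(seed)
    drive = np.convolve(stim, k, mode="full")[: len(stim)]   # stimulus * RF
    decay1, decay2 = np.exp(-dt / tau1), np.exp(-dt / tau2)
    h1 = h2 = 0.0                       # running exponential traces of past spikes
    rate = np.zeros(len(stim))
    spikes = np.zeros(len(stim), dtype=int)
    for t in range(len(stim)):
        # Exponential link: positive alphas and omega lower the intensity
        rate[t] = np.exp(drive[t] - alpha1 * h1 - alpha2 * h2 - omega)
        # Probability of at least one event in a bin of width dt
        spikes[t] = rng.random() < 1.0 - np.exp(-rate[t] * dt)
        h1 = h1 * decay1 + spikes[t]
        h2 = h2 * decay2 + spikes[t]
    return rate, spikes
```

With alpha1 = alpha2 = 0 the model reduces to a purely stimulus-driven Poisson process, which gives a convenient baseline for seeing how the history terms reshape the response.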


Univariate white-noise stimulus

As a proof of principle, we began with an example using a white-noise stimulus drawn from a univariate Gaussian distribution. The absence of temporal correlations in this stimulus is ideal for obtaining unbiased estimates of the GLM parameters, allowing us to determine how intrinsic dynamics affect encoding in a best-case scenario.

We generated data for fitting the GLM by providing 100 s of white noise as input to two linear-dynamical cascade (LDC) models that had the same RF but different dynamics. The dynamical stage of the model was based on our previous work in the zebra finch caudal mesopallium [41, 42]. The tonic model lacks KLT and has a higher capacitance, whereas the phasic model includes KLT and has a lower capacitance (see Methods for parameter values). These models reproduce the responses to step currents (Fig 3A) and broadband currents seen in slices. Both LDC models produced similar responses to the white noise stimulus, but the phasic model tended to have narrower peaks of activity (Fig 3B and 3C).

Fig 3. GLM estimates for exemplar tonic and phasic models with univariate white-noise stimulus.

(A) Voltage responses of tonic and phasic models to high- and low-amplitude injected current steps (shown in bottom row). The tonic model exhibits depolarization block to strong currents but fires repetitively to weak currents, whereas the phasic model only fires a single spike to all suprathreshold current levels. (B) Top, response of the tonic dynamical model to a white-noise stimulus. The input RF is shown in D. Middle, raster plots of spike times from 10 trials with the same stimulus but varying Inoise(t). Black ticks correspond to the output of the dynamical model and colored ticks are the predictions of a GLM fit to a different set of data from this model. Bottom, spike rate histograms (bin size = 10 ms) for 50 trials from the dynamical model (black) and the GLM (yellow). Only a subset of the full test data is shown. (C) Like B, but for the model with phasic dynamics. The stimulus, RF, and noise level were the same. (D) Estimated RFs from the GLMs compared to the input RF of the dynamical model. To indicate posterior uncertainty in the estimates, individual samples from the MCMC sampler are shown in light gray, and the median is overlaid in color. (E) Posterior distributions of baseline firing rate (ω) and spike-history filter parameters (α1 and α2). The top panels in each column show marginal distributions for individual parameters, and the panels in the lower left corner show joint distributions for each pair of parameters. Note that more positive values of α1 and α2 correspond to stronger adaptation (i.e., a negative correlation with past spiking).

In general, parameter estimates are only interpretable to the extent that the model is a good fit to the data. We checked the goodness of fit by comparing the responses of the LDC model and the fitted GLM to a new white-noise stimulus. The output of the GLM was an excellent prediction of the dynamical model’s response (Fig 3B and 3C). Indeed, the correlations between the average firing rates for LDC data and GLM prediction (tonic: r = 0.96; phasic: r = 0.84) were comparable to the correlations between average rates of even and odd trials in the data (tonic: r = 0.94; phasic: r = 0.90)—as good as could be expected given the intrinsic variability of the data. Thus, at least for white-noise stimuli, the linear spike-history filter and static nonlinearity of the GLM can closely approximate the dynamical nonlinearity of a single-compartment biophysical model. This allows us to interpret the GLM parameters as meaningful descriptions of the encoding properties of the more complex model.
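The goodness-of-fit comparison described above amounts to two correlations between binned firing rates: one between data and prediction, and one between even and odd trials of the data as a reference for trial-to-trial variability. A minimal Python sketch follows; the function names and the array layout (trials by samples) are assumptions for illustration.

```python
import numpy as np

def psth(spike_counts, bin_size):
    """Trial-averaged spike counts summed within non-overlapping bins.

    spike_counts: array of shape (n_trials, n_samples).
    """
    mean_rate = spike_counts.mean(axis=0)
    n_bins = mean_rate.size // bin_size
    return mean_rate[: n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)

def prediction_and_ceiling(data, pred, bin_size=10):
    """Return (data vs. prediction correlation, even vs. odd trial correlation)."""
    r_pred = np.corrcoef(psth(data, bin_size), psth(pred, bin_size))[0, 1]
    r_split = np.corrcoef(psth(data[0::2], bin_size),
                          psth(data[1::2], bin_size))[0, 1]
    return r_pred, r_split
```

A prediction correlation close to the even/odd correlation indicates that the remaining mismatch is on the order of the intrinsic variability of the data.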

The LDC and GLM both have receptive fields that are convolved with the stimulus to produce a signal that modulates the probability of spiking. When a GLM is fit using data from an LDC model, we expect the estimated RF to resemble the RF used to generate the data, but not exactly. Indeed, differences between the input and estimated RFs will reflect the effects of the intrinsic dynamics. One expected effect is from the filtering properties of the membrane. In the GLM, firing probability depends on a static, exponential function of the convolved stimulus (Fig 2). In the LDC model, the output of the convolution enters as a current that contributes linearly to the derivative of the membrane voltage. The capacitance and conductance of the membrane act as an additional, lowpass filter, so we would expect the estimated RF to be a lowpass-filtered version of the input RF. In the time domain, the effect of the membrane would be to stretch the RF out in time. In fact, what we observed was that the estimated RFs were either very close to the input RF (Fig 3D, top) or compressed in time (Fig 3D, bottom), corresponding to a relative boosting of higher frequencies. This would not be possible for a model with a purely passive membrane; therefore, it must be the active, voltage-gated currents that are shifting the model’s temporal encoding properties. This temporal distortion, which is consistent with the bandpass characteristics of KLT [41, 43], will be explored further in subsequent analyses.
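The expectation for a purely passive membrane can be illustrated numerically. In the Python sketch below, a hypothetical RF is passed through a first-order exponential kernel standing in for the membrane; the RF shape, the 10 ms time constant, and the helper functions are all assumptions chosen for illustration. The filtered RF is broader in time and has less power at high frequencies, which is the opposite of the temporal compression observed for the phasic model.

```python
import numpy as np

dt = 0.001                                       # time step in seconds (assumed)
t = np.arange(0, 0.05, dt)                       # about 50 ms of lags
rf = np.exp(-0.5 * ((t - 0.010) / 0.003) ** 2)   # hypothetical RF: bump at 10 ms

tau_m = 0.010                                    # assumed membrane time constant
membrane = np.exp(-t / tau_m)
membrane /= membrane.sum()                       # unit gain at f = 0

filtered = np.convolve(rf, membrane)             # RF seen through a passive membrane

def spectral_centroid(x, dt=dt, n_fft=1024):
    """Power-weighted mean frequency (Hz) of a temporal profile."""
    power = np.abs(np.fft.rfft(x, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=dt)
    return np.sum(freqs * power) / np.sum(power)

def temporal_width(x, dt=dt):
    """Standard deviation (s) of |x| treated as a distribution over lag."""
    w = np.abs(x) / np.abs(x).sum()
    lags = np.arange(x.size) * dt
    mean = np.sum(w * lags)
    return np.sqrt(np.sum(w * (lags - mean) ** 2))

print(spectral_centroid(rf), spectral_centroid(filtered))
print(temporal_width(rf), temporal_width(filtered))
```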

Intrinsic dynamics also affected the spike-history filter. Unlike the RF, the parameters for the spike-history filter do not correspond to specific parameters in the LDC model; however, we expect them to reflect the effects of currents that are activated by spiking. As seen in Fig 3E, the spike-history filter was stronger on both short (α1) and long (α2) timescales for data from the tonic model compared to the phasic one. The posterior uncertainty in these parameter estimates was low compared to the difference between dynamical models. This means that the spiking patterns produced by phasic and tonic cells are sufficiently different, at least for this kind of stimulus and amount of data, to observe changes largely caused by a single biophysical parameter.

Multivariate birdsong stimulus

Having demonstrated that the GLM can be used to analyze the encoding properties of a dynamical model, we turned to a more realistic scenario using natural birdsong as the stimulus. The dynamics remained the same as in the white-noise case, but the linear stage was replaced with a spectrotemporal RF. The stimulus, which consisted of 40 s of song from multiple zebra finches, was converted to a spectrogram and convolved with the RF, summing across spectral channels. This produced a univariate time series that entered into the dynamics as an external current.
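The conversion from spectrogram to driving signal can be sketched as a per-channel temporal convolution followed by a sum over frequency. The Python below is an illustration only; the function name and the array layout (frequency by time, with lag 0 in the first RF column) are assumptions.

```python
import numpy as np

def strf_drive(spec, strf):
    """Convolve each spectral channel with its RF row, then sum across channels.

    spec: (n_freq, n_time) spectrogram.
    strf: (n_freq, n_lags) receptive field, lag 0 in the first column.
    Returns a univariate time series of length n_time.
    """
    n_freq, n_time = spec.shape
    drive = np.zeros(n_time)
    for f in range(n_freq):
        # Causal convolution: output at time t depends on spec[f, t - lag]
        drive += np.convolve(spec[f], strf[f], mode="full")[:n_time]
    return drive
```

For an RF with a single nonzero weight at one channel and one lag, the output is simply that channel of the spectrogram delayed by that lag, which is a quick sanity check on the indexing.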

We used RFs that were representative of the diversity found in cortical-level auditory neurons. RF structure can be analyzed in terms of the modulation transfer function (MTF), a 2-D Fourier transform of the RF that shows its joint spectral and temporal tuning [44]. Most of the neurons in the zebra finch primary auditory pallium have MTFs with power along either the spectral or temporal axis, indicating that they can be tuned to narrow spectral bands or to rapid modulations of the temporal envelope, but only rarely to both [38]. This distribution is similar to the modulation spectrum of zebra finch song [44] and at least partly reflects the statistics of early auditory experience [45]. Here, we simulated responses using 60 synthetic RFs drawn from this distribution [38]. Each RF was combined with the tonic and phasic dynamical models, so that we could quantify the effects of KLT across RF types and determine if there was any interaction with RF structure.

As before, the simulated responses were used to estimate GLM parameters, but with two modifications that were necessitated by the statistics of the birdsong stimuli. Like many other natural stimuli, the amplitude envelope of birdsong is dominated by low frequencies [46]. For our cascade model, these low-frequency temporal modulations result in long intervals when I(t) is strongly positive or negative, which in turn tends to drive the model to unrealistic voltage levels far outside the range that would be expected from the reversal potentials of typical synaptic channels. To address this issue, we introduced a compressive static nonlinearity that constrained the output of the convolution to biologically feasible values (see Methods). The second issue with stimuli dominated by low frequencies is a statistical one. As has been known for some time [29, 35], estimating the parameters of receptive field models when the stimulus is highly autocorrelated can lead to numerical instability and overfitting. To address this issue, we used elastic-net regularization when estimating GLM parameters (see Methods).
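As a rough illustration of what elastic-net regularization adds to the fitting problem, the Python sketch below evaluates a penalized Poisson negative log-likelihood for the RF weights alone. The function name, the glmnet-style parameterization of the penalty (lam and l1_ratio), and the omission of the spike-history terms are simplifying assumptions; the actual objective and hyperparameters are described in Methods.

```python
import numpy as np

def penalized_nll(w, X, y, dt, omega, lam, l1_ratio):
    """Poisson negative log-likelihood plus an elastic-net penalty.

    w: RF weights, shape (n_params,).
    X: design matrix of lagged stimulus values, shape (n_samples, n_params).
    y: spike counts per bin, shape (n_samples,).
    lam, l1_ratio: penalty strength and L1/L2 mix (assumed parameterization).
    """
    log_rate = X @ w - omega
    # Terms that do not depend on w or omega are dropped
    nll = np.sum(np.exp(log_rate) * dt - y * log_rate)
    penalty = lam * (l1_ratio * np.sum(np.abs(w))
                     + 0.5 * (1.0 - l1_ratio) * np.sum(w ** 2))
    return nll + penalty
```

The L1 term favors sparse RFs and the L2 term shrinks correlated weights together, which is why the combination is useful when neighboring stimulus lags and channels are strongly correlated.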

We begin by examining three examples representative of the distribution. As will be seen, the temporal characteristics of the input RF have a consistent effect on encoding properties, so we have denoted these three examples in terms of their temporal modulation transfer functions (tMTFs): wideband (WB), bandpass-low (BP-L), and bandpass-high (BP-H). These categories reflect two parameters in the equation we used to generate RFs (see Methods). Wideband RFs have a temporal phase (Pt) of zero, which results in only a single excitatory lobe in the temporal profile and broad tuning in the temporal modulation frequency domain. Bandpass RFs have a temporal phase of , resulting in a suppressive/inhibitory lobe. BP-L and BP-H are distinguished by the frequency modulation parameter (Ωt), with lower values corresponding to a broader temporal profile and tuning to slower modulations. As seen in Fig 4A–4F, the fitted GLMs had good predictive performance for both the phasic and tonic models and across all three input RFs, with high correlations between the spike rate histograms produced by the LDC and GL models to a novel birdsong stimulus. Thus, even with many more parameters and an autocorrelated stimulus, the GLM is still a good tool for analyzing the encoding properties of the dynamical models.

Fig 4. GLM estimates for exemplar tonic and phasic models with zebra finch song stimuli.

(A) Receptive field parameters and responses for a model with tonic dynamics and a spectrally narrowband, temporally wideband RF. Top left, input RF in the LDC model. Top right, estimated RF from GLM. The vertical scale bar denotes 1 kHz and the horizontal 5 ms. Note the temporal smearing and the broad suppression at longer lags in the estimated RF. Middle, examples of spiking responses to zebra finch song from the LDC model (top, black ticks) and the fitted GLM (bottom, red). Bottom, corresponding spike rate histograms (50 trials) for the LDC and GLM (product-moment correlation: rWB = 0.87). (B–C) RFs and responses for models with tonic dynamics and BP-L (B) or BP-H RFs (C), same format as in (A). The GLM accurately predicted the firing rate of the LDC for these parameter values (rBPL = 0.94, rBPH = 0.86). (D–F) RFs and responses for models with the same RFs as in (A–C), but with phasic dynamics (rWB = 0.78, rBPL = 0.90, rBPH = 0.85). All prediction correlations were high considering the underlying spiking variability in the even and odd trials of the LDC (product-moment correlations: tonicWB = 0.92, tonicBP-L = 0.85, tonicBP-H = 0.82; phasicWB = 0.91, phasicBP-L = 0.93, phasicBP-H = 0.91). More detailed plots for each of the six example models can be found in Figs A–F in S1 Text. (G) Temporal MTFs of input RFs, tonic model estimates, and phasic model estimates for each of the three input RFs. Power is normalized relative to the peak for each spectrum. The change in power at low frequencies, quantified as Δl (see Methods) was –0.08, 0.44, and –0.15 for tonic models and –0.03, 0.30, and –0.33 for phasic models. (H) Posterior distributions of α1 and α2 comparing dynamical models for each RF.

As with the white-noise case, the estimated RFs were qualitatively similar to the input RFs, but with distortions in the temporal profile. Most of the estimated RFs appeared to be smeared in time and with stronger and longer suppressive periods. Some of the distortions were consistent across tonic and phasic models, but there were also differences between the two dynamical models that reflect the effects of KLT. We analyzed these effects by looking at the tMTFs, which are calculated by summing the 2D Fourier transform of the RFs across the spectral dimension (Fig 4G). These plots show how well the model neuron is able to encode temporal modulations in the stimulus as a function of frequency. All of the estimated RFs were tuned to frequencies below 100 Hz, which is about the fastest temporal modulation rate found in zebra finch song [46]. Although some of the input RFs had the potential to represent faster modulations, these frequencies were attenuated in the estimated RFs, probably because of the passive filtering properties of the membrane and the statistics of the stimulus. The main differences between the dynamical models were in the attenuation of low frequencies. Strikingly, the effects of the dynamics on lowpass attenuation varied across RFs. For the WB input, the estimated tMTF was more bandpass in the phasic model compared to the tonic model, while the opposite was true for the BP-L and BP-H inputs. Thus, not only does KLT change the temporal encoding properties of the neuron, but this effect is different depending on the filtering properties of the inputs (i.e., the input tMTF).
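The tMTF calculation described above can be sketched as follows. This Python example assumes that power (squared magnitude) is what is summed across the spectral-modulation axis and that the result is normalized to its peak, as in Fig 4G; the function name and array layout are likewise assumptions.

```python
import numpy as np

def temporal_mtf(rf, dt):
    """Temporal modulation transfer function of a spectrotemporal RF.

    rf: (n_freq, n_lags) array; dt: lag spacing in seconds.
    Returns (temporal modulation frequencies in Hz, peak-normalized power).
    """
    mtf = np.abs(np.fft.fft2(rf)) ** 2        # joint spectral-temporal modulation power
    tmtf = mtf.sum(axis=0)                    # collapse the spectral-modulation axis
    freqs = np.fft.fftfreq(rf.shape[1], d=dt)
    keep = freqs >= 0                         # non-negative temporal modulations only
    return freqs[keep], tmtf[keep] / tmtf[keep].max()
```

For an RF whose temporal profile oscillates at a single rate, the returned spectrum peaks at that rate, whereas a single-lobed temporal profile produces a spectrum that is largest at or near f = 0.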

The posterior distributions for the spike-history parameters were broader than for the white-noise examples (Fig 4H), indicating that the estimates are more poorly constrained by the data. This was expected, given that the stimulus was shorter and more correlated. Nevertheless, there was essentially no overlap between the posterior distributions for the tonic and phasic versions of any of the example models, indicating that the GLM spike-history parameters were sensitive to the biophysical dynamics. Furthermore, as the next section will show, the trends in these examples were consistent across the larger sample of RFs.

As with the RF temporal structure, the spike-history filter parameters were affected by the interaction of RF type and dynamics. In general, phasic models had stronger short-timescale adaptation than tonic models, as indicated by larger values of α1 (Fig 4H). This effect was in the opposite direction from what we saw in the white-noise case (Fig 3E), where tonic neurons had larger values of α1 and α2. This discrepancy presumably reflects differences in the stimulus statistics, because the white-noise example RF was qualitatively similar to the temporal profile of the example RFs. As has been reported previously, neuron models fit to white-noise stimuli produce poor predictions to natural stimuli [29]. The white-noise GLMs produced good predictions because they were fit and tested with white-noise stimuli, but the parameter estimates do not generalize to other kinds of stimuli. As noted above, a key feature of birdsong is that the temporal envelope is dominated by low frequencies. These slow oscillations produce sustained periods of excitation or inhibition that drive the dynamical model into regimes where adaptive processes come more strongly into play. This nonlinear interaction between stimulus statistics and dynamics likely also explains why the effect of KLT varied across the example RFs: phasic dynamics (i.e., increased KLT) caused α1 to increase for all three RFs, but only affected α2 for the BP-L RF.

Interaction of intrinsic dynamics and RF temporal filtering

Based on these examples, we hypothesized that the key contributor to these interactions was the temporal profile of the input RF, in particular whether there was a negative lobe at longer lags. In the modulation frequency domain, this lobe corresponds to bandpass filtering. The parametric, Gabor-based model we used to generate the RFs [38] represents this feature by a single parameter, the temporal phase (Pt), which is 0 for the WB example and for the BP-L and BP-H examples. Approximately half (26/60) of the RFs in our larger sample, those with modulation power primarily along the spectral axis, had Pt of 0, whereas the RFs with power along the temporal modulation axis (34/60) had Pt of .

The performance of GLMs fit to data from the larger set of RFs was consistently good, with high correlations between the spike-rate histograms of the LDC and GL models for the tonicWB (r = 0.86 ± 0.04), tonicBP (r = 0.90 ± 0.04), phasicWB (r = 0.75 ± 0.08), and phasicBP (r = 0.87 ± 0.05) groups. These correlations were comparable to those between the even and odd trials of the LDC data for the tonicWB (r = 0.93 ± 0.01), tonicBP (r = 0.84 ± 0.03), phasicWB (r = 0.92 ± 0.02), and phasicBP (r = 0.90 ± 0.02) models. Performance was slightly lower for the phasicWB data, but the reason for this was not clear.

The results from the larger sample of RFs were consistent with our hypothesis. We looked first at the effects of dynamics on RF temporal structure, specifically the extent to which the estimated tMTF (which represents how the full LDC model encodes stimuli) was attenuated at low frequencies compared to the input tMTF (Δl). In Fig 4G, Δl corresponds to the difference between the black line and blue or yellow line at f = 0 with maximum power set to 1. Positive values of Δl indicate that the estimated RF is more bandpass (i.e., responds less to low-frequency modulations) compared to the input RF. Negative values indicate that encoding of lower frequencies is boosted. As shown in Fig 5, for models with WB temporal tuning, phasic dynamics attenuated low frequencies, in comparison to the matching tonic models (LMM: b0 = 0.02, b1 = −0.11, n = 52). For neurons with BP temporal tuning, the effect was the opposite: phasic dynamics caused low frequencies to be less attenuated compared to the matching tonic models (b0 = −0.05, b1 = 0.15, n = 68). In other words, across a broad range of RFs, KLT consistently causes neurons with broadly tuned inputs to become more selective for higher-frequency features, but causes neurons that already have narrowly tuned inputs to become more responsive to lower frequencies.

Fig 5. Phasic dynamics attenuate low-frequency modulations for temporal wideband RFs but enhance them for bandpass RFs.

Lowpass attenuation was defined as the difference in the ratios between the power at f = 0 and the peak power of the temporal modulation spectrum (as in Fig 4G) of the input RF and GLM estimated RF (Δl; see Methods). The y-axis shows the difference between this value for the input RF and the estimated RF. Positive values indicate that the estimated RF is more bandpass in its temporal filtering properties compared to the input RF, while negative values indicate the estimated RFs were more lowpass. For each RF, lowpass attenuation estimates for the phasic and tonic models are connected by a black dotted line. The bold dotted line shows the differences in the mean lowpass attenuation estimates (enlarged black dot) between RF types for a given model. The linear mixed effects model (LMM) with the interaction between RF type and dynamics fits significantly better than the LMM with main effects only (LMM: χ2(1) = 19.04, p < 0.001).
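One way to read the definition of Δl in this caption is sketched below in Python. The order of subtraction (input minus estimate) is inferred from the statement that positive values mean the estimated RF is more bandpass; the function name and the convention that each spectrum starts at f = 0 are assumptions, and the exact formula is the one given in Methods.

```python
import numpy as np

def lowpass_attenuation(tmtf_input, tmtf_est):
    """Delta-l: change in relative power at f = 0 from input RF to estimated RF.

    Both arguments are temporal modulation spectra whose first element is f = 0.
    Positive values: the estimate is more bandpass than the input.
    Negative values: the estimate is more lowpass than the input.
    """
    ratio_in = tmtf_input[0] / np.max(tmtf_input)
    ratio_est = tmtf_est[0] / np.max(tmtf_est)
    return ratio_in - ratio_est

# Hypothetical spectra: a lowpass input and a bandpass estimate
print(lowpass_attenuation(np.array([1.0, 0.5, 0.1]),
                          np.array([0.2, 1.0, 0.3])))
```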

Similarly, just as we saw with the example models, the adaptation parameters also depended on RF temporal structure and dynamics. As shown in Fig 6A, the general trend was for phasic models to have lower spontaneous firing rates and stronger adaptation, but there were some differences in the effect of phasic dynamics on α2 that depended on RF type. Models with phasic dynamics had lower baseline firing rates (larger values of ω; Fig 6B) compared to tonic models (LMM: b0 = 9.08, b1 = −1.29, n = 120), and models with WB RFs had lower baseline rates compared to models with BP RFs (b2 = −2.37, n = 120, Fig 6B). Similarly, models with phasic dynamics had stronger short-term adaptation (α1; Fig 6C) compared to tonic models (b0 = 196.77, b1 = −150.99, n = 120), and models with BP RFs had stronger adaptation than models with WB RFs (b2 = 1.21, n = 120). For both of these parameters, there was not a significant interaction between model dynamics and RF type. However, there was an interaction for longer-timescale adaptation (α2; Fig 6D). For WB RFs, α2 was larger for phasic models compared to tonic models (b0 = 0.29, b1 = −0.49, n = 52), but for BP RFs, α2 was larger for tonic models (b0 = −0.48, b1 = 0.19, n = 68). Note that in contrast to the white-noise example, α2 estimates were sometimes negative, which corresponds to a baseline facilitation (i.e., past spikes are associated with an increased probability of firing).

Fig 6. Firing rate and spike-history parameter estimates depend on RF structure and dynamics.

(A) Point estimates of ω, α1, and α2 GLM parameters for phasic (blues) and tonic (yellows) models by RF type. Across the diagonal are the marginal distributions for each of the parameters, with the joint distributions on the off-diagonal. (B) Strip plot of parameter estimates showing paired phasic and tonic models (as in Fig 5). For each RF, the phasic and tonic model parameter estimates are connected by a black dotted line. The bold dotted lines show the differences in the mean parameter estimates between RF types for a given model. The LMM with main effects and an interaction was a significantly better fit than an LMM with main effects only for α2 (χ2(1) = 72.00, p < 0.001), but not for ω (LMM: χ2(1) = 0.38, p = 0.54) or α1 (χ2(1) = 0.08, p = 0.78).

Nonlinear, nonmonotonic effects of KLT on encoding properties

Up to this point, intrinsic dynamics have been dichotomized into tonic and phasic firing. For step currents, this dichotomy reflects a bifurcation in the dynamics: below a critical value of gKLT, spiking is repetitive, but above this value, it occurs only at the stimulus onset [15, 43]. For broadband current stimuli, however, the effects of gKLT are more graded [41]. To test whether KLT affects encoding properties in a continuous or binary manner, we simulated responses using LDC models with values of gKLT that varied in steps of 1 nS over a range of 0 to 50 nS (with capacitance kept constant at 60 pF), which encompasses the bifurcation in this model from tonic to phasic firing. For simplicity, we used only the three example receptive fields shown in Fig 4 (WB, BP-L, and BP-H). Using the same birdsong stimulus, we fit GLMs to data from these simulations and examined how lowpass attenuation and adaptation were affected.

The correlation between even and odd trials of the simulated data tended to increase with gKLT (Fig 7A), which is consistent with our previous finding that KLT makes spike timing more precise and less variable across trials [42]. In contrast, although the performance of the GLM was good across all levels of gKLT (Fig 7B), it tended to decrease with larger gKLT values. This suggests that the LDC model is more difficult to approximate with a GLM as additional voltage-gated conductances are added. Overall, the predicted spike trains remained highly accurate, allowing resulting parameter estimates to be meaningfully interpreted.

Fig 7. Effects of low-threshold potassium conductance (gKLT) on GLM parameters are nonlinear and depend on RF structure.

(A) Correlation coefficients between the even and odd trials of the LDC model as a function of gKLT for the three exemplar RFs. (B) Correlation coefficients between the spike-rate histograms of the LDC and GL models as a function of gKLT. (C–F) Lowpass attenuation, ω, α1, and α2 estimates as a function of gKLT.

Consistent with what we observed with dichotomized dynamics, the effects of KLT on RF temporal structure, spontaneous firing rate, and adaptation depended on RF type (Fig 7C–7F). With the exception of spontaneous firing rate (Fig 7D), the trajectories of the parameters as gKLT increased were nonlinear although approximately monotonic. However, there was little evidence of bifurcation, which would have appeared as a sharp discontinuity between two stable regimes. These results confirm that the effects of intrinsic dynamics on encoding properties are highly nonlinear, with a strong dependence on the statistics of the stimulus and the tuning of the inputs.


These data demonstrate how intrinsic dynamics can affect the temporal encoding properties of cortical-level auditory neurons. Although this effect is not unexpected, to our knowledge it has not yet been quantitatively characterized. Our approach was to simulate zebra finch auditory responses with a biophysically realistic linear-dynamical cascade model and then estimate encoding properties using GLMs, which are statistically robust and easy to interpret. This allowed us to modulate intrinsic dynamics by changing the parameter values that correspond to specific cellular mechanisms and explore the effects on receptive fields and spike-history adaptation.

We focused on a low-threshold potassium current (KLT), which is expressed in a subset of neurons in zebra finch CM. In a previous study, we used broadband current injections to show that KLT affects temporal integration, causing neurons to become more coherent with inputs at frequencies around the maximum temporal modulation rate of zebra finch song [41]. This effect is reproduced by the dynamical model used here. However, the current stimuli used to build the model were artificial and unrepresentative of the stimulus-driven synaptic activity CM neurons would receive in vivo. Thus, to predict how variation in KLT might affect auditory responses to vocal communications in this species, we drove the dynamical model with an injected current that was the result of convolving natural zebra finch song with a spectrotemporal RF, which we term the “input RF”. Input RFs, which represent a linear approximation of the processing performed by the neuron’s presynaptic partners and the dendritic integration of excitatory and inhibitory synaptic currents, were randomly drawn from a published distribution of RFs found in zebra finch Field L [38], the major source of ascending auditory input to CM [39, 47]. This allowed us to predict which effects of the dynamics would be consistent across the population and which would depend on tuning of the inputs.

KLT has a nonlinear influence on how neurons encode stimuli

The estimated RFs, which we interpret as the features of the stimulus that neurons encode in their spiking outputs, reflected the statistics of the stimulus, the filtering properties of the input RFs, and the dynamics of spiking. Estimated RFs qualitatively resembled input RFs but were distorted in time. Analyzing these distortions using temporal modulation transfer functions (Fig 4), we found that most (71/120) of the model neurons were less responsive to high frequencies (≥ 100 Hz) than their inputs; we expected this effect from the lowpass filtering associated with passive leak currents. KLT, in contrast, primarily affected low frequencies in the tMTF. To our surprise, the sign of the effect depended on the input tMTF, specifically how broadly tuned it was. Wideband tMTFs became more bandpass, with stronger attenuation at low frequencies. Bandpass tMTFs, however, became more lowpass, indicating that KLT was effectively boosting responses to low frequencies in the stimulus.

This result is somewhat counterintuitive, but it is consistent with the high degree of nonlinearity phasic neurons exhibit for low-frequency inputs. Using slice recordings, we previously showed that phasic and tonic CM neurons differ in their coherence between current input and spiking output [41], with phasic neurons exhibiting lower coherence than tonic neurons for frequencies below about 20 Hz. Because ideal linear time-invariant systems have coherence values equal to unity for all frequencies [48], this result indicates that phasic neurons are more nonlinear at low frequencies, but it does not reveal the sign or magnitude of the nonlinearity (contra our interpretation in that study). In other words, for some stimuli phasic neurons may boost low frequencies while for other stimuli they may attenuate low frequencies. This is precisely the effect we observed here.

KLT has a nonlinear influence on how neurons adapt to prior activity

KLT also affected the spike-history filter component of the GLM. Here the effects were more consistent across RF types, though there was a weak but significant interaction for long-term adaptation (α2), such that WB neurons became more strongly adapting with phasic dynamics and BP neurons became more facilitating (Fig 6B). Within the joint distribution of all the spike-history parameters (ω, α1, and α2; Fig 6A), there was a clear visual separation in the population distributions of tonic and phasic neurons, such that one could potentially infer whether a cell was tonic or phasic from the spike-history parameters alone. Thus, under some circumstances it may be possible to use extracellular recordings to characterize intrinsic dynamics.

When dynamical neuron models are stimulated with step currents, gKLT is a bifurcation parameter with a critical value that determines whether the cell can spike repetitively (tonic firing) or not (phasic firing). We found that using more realistic currents, there is little evidence of bifurcation in encoding properties, which changed smoothly as we varied gKLT (Fig 7). These relationships nonetheless tended to be quite nonlinear, indicating that neurons can in principle achieve dramatic changes in functional response properties with only small changes in the expression or localization of a single type of channel.

Functional implications of KLT expression in the avian auditory system

Taken together, these results demonstrate that the encoding properties of auditory neurons can be highly sensitive to changes in intrinsic dynamics arising from the inclusion or exclusion of a single current. We recently showed that CM neurons express more Kv1.1 and become more phasic during the peak of the critical period for song memorization, but only in finches raised in the complex acoustic environment of a colony [9]. As suggested by our results here, increased expression of a low-threshold potassium channel like Kv1.1 might help neurons to filter out this kind of background noise by selectively suppressing responses to low-frequency inputs in neurons that have broad temporal tuning. Such a mechanism could explain the recent finding that in rats, exposure to dynamically modulated noise causes neurons in the primary auditory cortex to shift their tuning away from the spectrotemporal modulation frequencies of the noise [49]. In this respect, KLT may be serving an analogous function to the co-tuned feedforward inhibitory inputs seen in mammalian auditory cortex [50, 51], but without the need for a separate population of neurons. As we have speculated elsewhere, a cell-intrinsic mechanism for filtering out background noise and increasing spike precision may be an important complement to synaptic plasticity early in development when inhibitory circuits and the reversal potential of inhibitory conductances are still stabilizing [9].

It is less clear to us why it would be useful for KLT to boost low-frequency responses in neurons that already have bandpass-tuned inputs; however, we note that this effect was considerably more variable (compare the variance for BP and WB neurons in Fig 5). Moreover, it is not yet known if the distribution of KLT expression in CM is independent of the distribution of input temporal tuning. If expression of KLT depends on experience, and more proximately on the statistics of presynaptic and postsynaptic activity, then its effects may be restricted to neurons with specific tuning properties. Intracellular recordings to measure excitatory and inhibitory RFs in CM neurons may be needed to determine if this is the case.

Model-based approaches to understanding how nonlinear mechanisms affect sensory processing

This study complements other efforts to incorporate biologically realistic mechanisms into the framework of linear-nonlinear cascade models. Early work in the auditory system demonstrated how static nonlinearities in the summation of RF components alter the encoding properties of stochastic spiking models [52]. More recent studies have added idealized representations of dynamical mechanisms like excitatory and inhibitory conductances [53] or gain adaptation [54] to the linear-nonlinear framework, while retaining the ability to statistically estimate the parameters of these model components and use them to predict biological data. In comparison, our approach emphasizes realism, building on a detailed biophysical model of intracellular voltage dynamics with pharmacologically and (in principle) genetically identifiable components. This realism comes at the cost of statistical tractability. We have addressed this issue by using an entirely different but much simpler model to characterize the encoding properties of the more complex model. Although this limits us to asking empirical questions, there are many biological insights to be gained from an empirical approach.

Within this biophysically realistic framework, our analysis was limited to the effects of manipulating a single biophysical parameter (gKLT) on encoding of a single kind of auditory stimulus (zebra finch song). It is important to note that the nonlinearity of neuronal dynamics means that our results are therefore only valid within the specific context of the other ionic currents in the model. In a different cell type that expresses a different complement of currents, KLT will interact with those currents differently and may have entirely different effects on sensory coding. However, although the results may not generalize broadly, the approach can be adapted widely, to other auditory areas and sensory systems that exhibit diverse or plastic intrinsic dynamics. We have shown that GLMs can accurately predict the spiking responses of more complex, more biophysically realistic models across different kinds of stimuli, receptive fields, and dynamical regimes. Care is needed in interpreting the GLM parameter estimates, which do not correspond to specific cellular mechanisms and are therefore not linear or independent functions of the underlying dynamics. Given the nonlinear kinetics of most voltage-gated currents, we expect that the relationships between intrinsic dynamics and encoding properties will be complex and often counterintuitive in most systems, but that there will be much to learn in each system about how intrinsic dynamics reflect the computational tasks and constraints that need to be solved.


Stimulus design

For univariate white-noise models, the stimulus consisted of 100 s of Gaussian white noise sampled at 1 kHz. For multivariate models, the stimulus consisted of zebra finch song motifs recorded from 30 adult males in our colony. Each motif was normalized to the same RMS amplitude and repeated twice, padding with at least 50 ms of microphone noise at the beginning to avoid transients in the convolution. The total duration of the stimulus was 63.7 s, of which 12.7 s was reserved for testing performance. Spectrograms of the stimuli were calculated using a gammatone filter bank [55] with a window size of 2.5 ms and 20 spectral channels between 1.0 and 8.0 kHz, and a step size of 1.0 ms.

Receptive field construction

The univariate white-noise receptive field was generated from the difference of two gamma functions with time constants of 16 and 32 ms and an amplitude ratio of 1.5. Spectro-temporal receptive fields (RFs) were parameterized as the outer product of two Gabor functions multiplied by a scalar amplitude: $\mathrm{RF}(t, f) = A \cdot H(t) \cdot G(f)$, with $H(t) = \exp\left(-\tfrac{1}{2}\left[(t - t_0)/\sigma_t\right]^2\right) \cos\left(2\pi\Omega_t (t - t_0) + P_t\right)$ and $G(f) = \exp\left(-\tfrac{1}{2}\left[(f - f_0)/\sigma_f\right]^2\right) \cos\left(2\pi\Omega_f (f - f_0) + P_f\right)$ (1) where H is the temporal dimension of the RF, G is the spectral dimension, t0 is the latency, f0 is the peak frequency, σt and σf are the temporal and spectral bandwidths, Ωt and Ωf are the temporal and spectral modulation frequencies, Pt is the temporal phase (either 0 or 2π), Pf is the frequency phase (set to 0 for all RFs), and A is the amplitude. The temporal dimension H had a duration of 50 ms with a 1 ms resolution, while the frequency dimension G had 20 channels between 1 and 8 kHz. We generated 60 RFs by sampling randomly from the distributions given in [38] as representative of empirically recorded RFs in primary areas of the zebra finch auditory pallium. The amplitude parameter A was initially set to 1 for all of the RFs, but was adjusted to between 1.5 and 6 for 8/60 models so that they would fire at a rate of at least 1 Hz on average.
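The construction of a single spectrotemporal RF can be sketched as follows. This is a minimal illustration only: it assumes the common Gabor form (a Gaussian envelope multiplied by a cosine carrier), assumes linearly spaced frequency channels, and uses hypothetical parameter values rather than values drawn from the distributions in [38]. The function names are ours and are not taken from the published code.

```python
import math

def gabor(x, center, bandwidth, mod_freq, phase):
    """Gaussian envelope times a cosine carrier (assumed Gabor form)."""
    envelope = math.exp(-0.5 * ((x - center) / bandwidth) ** 2)
    carrier = math.cos(2.0 * math.pi * mod_freq * (x - center) + phase)
    return envelope * carrier

def make_rf(t0, f0, sigma_t, sigma_f, omega_t, omega_f, p_t=0.0, p_f=0.0,
            amplitude=1.0, n_time=50, n_freq=20, f_lo=1.0, f_hi=8.0):
    """Return an n_freq x n_time RF as the outer product A * G(f) * H(t)."""
    times = [float(i) for i in range(n_time)]          # ms, 1 ms resolution
    step = (f_hi - f_lo) / (n_freq - 1)
    freqs = [f_lo + i * step for i in range(n_freq)]   # kHz, linear spacing (assumption)
    h = [gabor(t, t0, sigma_t, omega_t, p_t) for t in times]
    g = [gabor(f, f0, sigma_f, omega_f, p_f) for f in freqs]
    return [[amplitude * gf * ht for ht in h] for gf in g]

# Hypothetical parameters: 10 ms latency, 4 kHz peak, 0.03 cycles/ms (30 Hz) temporal modulation
rf = make_rf(t0=10.0, f0=4.0, sigma_t=4.0, sigma_f=1.0, omega_t=0.03, omega_f=0.0)
```

With these values the RF is a 20 × 50 nested list whose largest entries fall at the 10 ms latency in the channels nearest 4 kHz.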

Linear-dynamical cascade model

Auditory responses were simulated with a model consisting of a linear, time-invariant stage whose output serves as an external driving current I(t) for a conductance-based, single-compartment dynamical stage [42].

The linear stage consists of a time-invariant receptive field (RF) that is convolved with the stimulus. For the univariate white-noise stimuli, this was a simple 1-dimensional convolution. For the song stimuli, each spectral channel was convolved with the corresponding channel of the RF and the results were summed to produce a univariate time series. In each trial, the output of the convolution Dstim(t) was added to a randomly generated signal Dnoise(t) with a spectral power distribution of 1/f and a signal-to-noise ratio of 4. The total drive D(t) = Dstim(t) + Dnoise(t) was unbounded. For the white-noise stimuli, this was not an issue, and drive was converted to current I(t) with a constant scaling factor. However, for song stimuli D(t) often reached unrealistic values. Because spectral power is always positive, RFs with lowpass temporal characteristics tended to over-drive the neurons with long periods of net positive current. Given that excitation and inhibition are generally balanced in the mammalian auditory cortex [50], and that synaptic currents in biological neurons are limited by the reversal potentials of sodium, potassium, and chloride, for song stimuli we therefore mean-centered D(t) and compressed the resulting drive to obtain a more realistic current I(t): $I(t) = L + \frac{U - L}{1 + \exp\left(b\,D(t) + a\right)}$ (2) where D(t) is the mean-centered drive, U and L are the upper and lower bounds of input current respectively, and the free parameters b and a control the slope and intercept of the logistic curve. U and L were calculated based on the passive membrane properties of the model such that the model would not be driven above 0 mV or below −100 mV, resulting in U = 97.5 pA and L = −32.5 pA. The free parameters were estimated by minimizing the mean squared error between Eq 2 and the identity function rectified at U and L to give b = −0.04 and a = 1.32. We also ran all the analyses without mean-centering and compression. The results were qualitatively similar, indicating that the model is robust to assumptions about the strength of the driving current. However, we only report the results from the simulations with mean-centering and compression due to their increased biological realism.
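The mean-centering and compression step can be illustrated with a short sketch. The logistic form used here, I = L + (U − L)/(1 + exp(b·D + a)), is an assumption on our part; it is consistent with the reported bounds and parameter values (it crosses the midpoint of the current range near D = 33 with a slope close to 1), but the function name and the example drive values are hypothetical.

```python
import math

U, L = 97.5, -32.5      # upper and lower current bounds (pA), from the text
B, A = -0.04, 1.32      # slope and intercept parameters, from the text

def compress(drive):
    """Mean-center the drive, then squash it into (L, U) with a logistic curve."""
    mean = sum(drive) / len(drive)
    return [L + (U - L) / (1.0 + math.exp(B * (d - mean) + A)) for d in drive]

# Hypothetical unbounded drive values (zero mean)
drive = [-500.0, -50.0, 0.0, 50.0, 500.0]
current = compress(drive)
```

Large negative and positive drives saturate near L and U respectively, while intermediate values pass through roughly unchanged, which is the behavior expected of a smooth approximation to the rectified identity function.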

The voltage dynamics were based on a model of ventral cochlear nucleus neurons [15] adapted for tonic and phasic CM neurons by Chen and Meliza [9]. The component currents include an external driving current I(t) and six intrinsic currents (Eqs 3–9).

Each voltage-gated current depended on a maximal conductance gX, the reversal potential for the ion species conducted by the channel EX, and one or more gating variables (e.g., mX, hX). For all currents, the dynamics of the gating variables were defined by first-order kinetics; for example, $\frac{dm_X}{dt} = \frac{m_{X,\infty}(V) - m_X}{\tau_{m_X}(V)}$ (10)

This model can produce phasic or tonic responses to step currents depending on the value of gKLT. When gKLT is low, the model neuron produces sustained responses to weak and moderate depolarizations; when gKLT is high, the model only fires at the onset of the current step. The principal model parameter values used here are shown in Table 1 (see [42] for a complete list). Each RF was paired with a tonic and a phasic model. To examine how encoding properties change over the full range of gKLT values, we started with the tonic model parameters and increased gKLT from 0 nS to 50 nS in steps of 1 nS.

The dynamical model simulation code was generated using spyks (version 0.6.10), and the dynamics were integrated using a 5th-order Runge-Kutta algorithm with an adaptive error tolerance of 1 × 10⁻⁵ and an interpolated step size of 0.025 ms. The output of the integration was converted to spike times by thresholding the voltage at −20 mV.
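The conversion from voltage to spike times can be sketched as below. The text specifies only the −20 mV threshold; detecting upward crossings (so that each spike is counted once) is our assumption, and the trace values are hypothetical.

```python
def spike_times(voltage, dt, threshold=-20.0):
    """Return times (ms) of upward threshold crossings in a voltage trace (mV)."""
    return [i * dt for i in range(1, len(voltage))
            if voltage[i - 1] < threshold <= voltage[i]]

# Hypothetical trace sampled every 0.025 ms containing two brief depolarizations
trace = [-65.0, -40.0, -10.0, 20.0, -30.0, -65.0, -25.0, -15.0, -70.0]
spikes = spike_times(trace, dt=0.025)
```

Each depolarization contributes a single spike time, taken at the first sample at or above threshold.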

Generalized linear models

A generalized linear model (GLM) [27, 31] was fit to the spike trains produced by the linear dynamical cascade models (Fig 2). The conditional intensity of the model was given by: $\lambda(t) = \exp\left(-\omega + K * x(t) - h * y_{\mathrm{hist}}(t)\right)$ (11) where λ(t) is the conditional intensity at time t, exp(−ω) corresponds to the baseline firing rate of the GLM, K is the RF, which is convolved with the song spectrogram x, and h is the spike adaptation filter, which is convolved with the spike train history yhist(t). Note that we use f1 * f2(t) to denote the convolution of two functions with respect to time. The full RF was a 20 × 50 matrix (20 spectral channels by 50 time bins of 1 ms). To reduce the number of parameters and avoid overfitting, K was parameterized with a rank-2 approximation; that is, the product of a 20 × 2 spectral filter and a 2 × 50 temporal filter [56]. The parameter count in the temporal dimension was further reduced by projecting into a basis set consisting of 12 raised cosine functions [27]. This basis set achieves good temporal resolution in the time immediately following a spike, with the resolution smoothly decreasing at long time intervals. The spike-history filter h was parameterized in a basis set of two exponential functions: $h(t) = \alpha_1 e^{-t/\tau_1} + \alpha_2 e^{-t/\tau_2}$ (12) where τ1 and τ2 are time constants corresponding to short (10 ms) and long (200 ms) timescales, and α1 and α2 are the coefficients. This parameterization was chosen based on the multiadaptive timescale model, which is closely related to the GLM and has been shown to be capable of reproducing a broad range of intrinsic dynamics [25, 57].
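The way the spike-history filter shapes the conditional intensity can be sketched in discrete time. This is an illustration under stated assumptions: the history term is subtracted inside the exponential (so that positive α values produce adaptation and negative values facilitation, as in the interpretation given in the Results), the stimulus term K * x is assumed to be precomputed, and all numerical values other than the two time constants are hypothetical.

```python
import math

TAU1, TAU2 = 10.0, 200.0   # ms, short and long timescales from the text

def history_filter(alpha1, alpha2, n_bins, dt=1.0):
    """Sum of two exponentials evaluated at lags dt, 2*dt, ..., n_bins*dt."""
    return [alpha1 * math.exp(-k * dt / TAU1) + alpha2 * math.exp(-k * dt / TAU2)
            for k in range(1, n_bins + 1)]

def conditional_intensity(omega, stim_drive, spikes, h):
    """lambda(t) = exp(-omega + stim_drive(t) - (h * spikes)(t)) for each time bin."""
    rates = []
    for t in range(len(stim_drive)):
        # h[k] weights the spike count k + 1 bins in the past
        adapt = sum(h[k] * spikes[t - 1 - k] for k in range(min(t, len(h))))
        rates.append(math.exp(-omega + stim_drive[t] - adapt))
    return rates

# Hypothetical example: no stimulus drive, a single spike in bin 1
h = history_filter(alpha1=5.0, alpha2=0.5, n_bins=200)
rates = conditional_intensity(omega=2.0, stim_drive=[0.0] * 5,
                              spikes=[0, 1, 0, 0, 0], h=h)
```

In this example the intensity sits at exp(−ω) until the spike, drops sharply in the next bin, and then recovers as the two exponentials decay.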

The GLMs were fit to data from the first 80% of the stimuli. The log-likelihood function of the GLM is given by $\log L(\theta \mid t_0, \ldots, t_n) = \sum_{i} \log \lambda(t_i) - \int_0^T \lambda(t)\,dt$ (13) where ti is the time of the ith spike, n is the number of spikes in the experiment, T is the final time point of the experiment, and θ represents the free parameters [58]. Because the stimulus is highly correlated and the RF is expected to be sparse, we used elastic-net regularization to constrain the RF parameter estimates. Elastic-net regularization is a combination of ridge regression and the least absolute shrinkage and selection operator (LASSO). Ridge regression introduces an L2 penalization parameter (ν2) to account for multicollinearity, which is inherently present in the highly correlated structure of the song spectrogram. The LASSO introduces an L1 penalization parameter (ν1) to shrink small coefficients to zero and acts as a feature selection algorithm, enforcing RF sparseness. The cost function was given by: $C(\theta) = -\log L(\theta \mid t_0, \ldots, t_n) + \nu_1 \lVert K \rVert_1 + \nu_2 \lVert K \rVert_2$ (14) where ‖K‖1 and ‖K‖2 are the L1-norm and L2-norm of K (reshaped into a 1-D vector), respectively. Since the log-likelihood function is concave and is guaranteed to be free of local maxima [28], we simultaneously estimated the parameters (ω, K, α1, α2) by minimizing the cost function using the Newton conjugate-gradient method implemented in the scipy function ‘fmin_ncg’ (version 1.3.0) [59]. Theano (version 1.0.4) [60] was used to symbolically derive the gradient and Hessian of the cost function and dynamically generate C code to evaluate them. The regularization coefficients (ν1, ν2) and the factorization rank D were chosen using 4-fold cross-validation on the estimation data.
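A discrete-time version of the penalized cost can be sketched as follows. This is not the authors' Theano implementation: the integral of the intensity is approximated by a sum over time bins, the L2 penalty is taken as the unsquared norm (an assumption), and the function names and numerical values are hypothetical.

```python
import math

def neg_log_likelihood(rates, spikes, dt):
    """Discrete-time point-process negative log-likelihood.

    rates  : conditional intensity in each bin (spikes per unit time)
    spikes : 0/1 indicator of a spike in each bin
    dt     : bin width, in the same time unit as the rates
    """
    log_term = sum(math.log(r) for r, s in zip(rates, spikes) if s)
    integral = sum(rates) * dt           # Riemann-sum approximation of the integral
    return -(log_term - integral)

def elastic_net_cost(rates, spikes, dt, k_flat, nu1, nu2):
    """Negative log-likelihood plus L1 and L2 penalties on the flattened RF."""
    l1 = sum(abs(k) for k in k_flat)
    l2 = math.sqrt(sum(k * k for k in k_flat))
    return neg_log_likelihood(rates, spikes, dt) + nu1 * l1 + nu2 * l2

# Hypothetical two-bin example with a two-element RF
cost = elastic_net_cost(rates=[2.0, 2.0], spikes=[1, 0], dt=0.5,
                        k_flat=[3.0, -4.0], nu1=0.1, nu2=0.2)
```

In practice the rates would themselves be a function of the parameters, so that an optimizer can minimize the cost with respect to ω, K, α1, and α2.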

We quantified the uncertainty in the maximum-likelihood estimates (ω, K, α1, α2) by sampling from the joint posterior distribution p(θ|t0, …, tn) ∝ p(θ)L(θ|t0, …, tn) using emcee (version 2.2.1), a Python implementation of an affine-invariant ensemble Markov chain Monte Carlo sampler [61]. The log of the prior probability p(θ) was set to the elastic-net penalty (Eq 14) using the values of ν1 and ν2 obtained through cross-validation, and the log-likelihood was as in Eq (13). An ensemble of 1000 chains was initialized with random values centered around the maximum-likelihood estimate and given a burn-in of 2500–6000 steps. After this period, each chain was sampled one more time to give a set of 1000 independent samples from p(θ|t0, …, tn). For population-level analyses, the final values of the GLM parameters (ω, K, α1, α2) were taken as the medians of their respective posterior distributions, which were symmetric and bell-shaped. These values were very close to the initial ML point estimates, so we did not sample from the posterior for the analyses shown in Fig 7.

To quantify performance, we generated posterior predictive distributions of spike trains from the fitted GLMs, with time discretized to Δ = 0.5 ms. At such short time scales, λ(t) ⋅ Δ can be treated as the success probability of a Bernoulli trial in each time bin, which we used to generate spike train responses from the GLMs. In each trial, we drew a sample from the posterior distribution, so the intertrial variability reflects not only the intrinsic variance of the Bernoulli distribution but also the uncertainty in the parameter estimates. Performance was quantified as the product-moment correlation between the spike-rate histograms (50 trials, 10 ms bins) for the data and the prediction on the 20% of the stimulus reserved for testing. As a baseline measure of intrinsic variability, we calculated the product-moment correlation between even and odd trials in the data (i.e., from the linear-dynamical cascade model); however, we did not explicitly correct performance scores.
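The Bernoulli spike generation and the even/odd-trial correlation can be sketched together. This simplified illustration draws spikes from a fixed, hypothetical rate profile; in the actual GLM the intensity also depends on each trial's own spike history and on a posterior parameter sample, both of which are omitted here. All function names are ours.

```python
import math
import random

def simulate_trial(rates_hz, dt_ms, rng):
    """One Bernoulli draw per bin with success probability rate * dt."""
    return [1 if rng.random() < r * dt_ms / 1000.0 else 0 for r in rates_hz]

def psth(trials, bin_size):
    """Sum spikes across trials in non-overlapping bins of bin_size samples."""
    n_bins = len(trials[0]) // bin_size
    return [sum(sum(tr[b * bin_size:(b + 1) * bin_size]) for tr in trials)
            for b in range(n_bins)]

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
# Hypothetical rate profile (Hz) in 0.5 ms bins: 100 ms at 5 Hz, then 100 ms at 200 Hz, repeated
rates = ([5.0] * 200 + [200.0] * 200) * 5
trials = [simulate_trial(rates, 0.5, rng) for _ in range(50)]

# 20 samples of 0.5 ms give 10 ms histogram bins; compare even and odd trials
r_even_odd = pearson(psth(trials[0::2], 20), psth(trials[1::2], 20))
```

The same `pearson` and `psth` helpers could be applied to one set of trials from the cascade model and one from the GLM to obtain a performance score of the kind described above.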

Lowpass attenuation

The estimated RF parameters were projected back into a linear time basis and reshaped into a 20 × 50 matrix. To obtain the temporal modulation transfer function (tMTF), a 2-dimensional Fourier transform was performed on the RF, summing across the spectral dimension (including positive and negative frequencies). The Fourier transform was calculated using the numpy package in Python, with zero-padding and the application of a Hanning window in the temporal profile to avoid edge effects. RF lowpass attenuation was quantified as: $\Delta l = \frac{P_0}{P_{\max}} - \frac{\hat{P}_0}{\hat{P}_{\max}}$ (15) where P0 is the power for the zero frequency of the input tMTF, Pmax is the maximum power of the input tMTF, $\hat{P}_0$ is the power at the zero frequency of the estimated tMTF, and $\hat{P}_{\max}$ is the maximum power of the estimated tMTF. Positive values of Δl indicate that the estimated RF responds more weakly to low modulation frequencies compared to the input RF, whereas negative values indicate that the estimated RF is more responsive to low frequencies.
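The lowpass attenuation measure can be sketched with a plain discrete Fourier transform. Several simplifications are assumed here: Δl is taken to be the input ratio P0/Pmax minus the estimated ratio, the temporal power spectrum is computed per spectral channel and summed (which matches summing a 2-D power spectrum over the spectral axis up to a constant factor), and the zero-padding and Hanning window are omitted. The function names and the two toy RFs are hypothetical.

```python
import cmath
import math

def temporal_power(rf):
    """Power at each non-negative temporal modulation frequency, summed over channels."""
    n = len(rf[0])
    power = []
    for k in range(n // 2 + 1):
        total = 0.0
        for row in rf:
            coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                       for t, v in enumerate(row))
            total += abs(coef) ** 2
        power.append(total)
    return power

def lowpass_ratio(rf):
    """Ratio of the power at zero frequency to the peak power."""
    p = temporal_power(rf)
    return p[0] / max(p)

def lowpass_attenuation(input_rf, estimated_rf):
    """Positive when the estimated RF has relatively less power at 0 Hz than the input RF."""
    return lowpass_ratio(input_rf) - lowpass_ratio(estimated_rf)

n = 50
flat_rf = [[1.0] * n]                                               # all power at 0 Hz
tuned_rf = [[math.cos(2 * math.pi * 5 * t / n) for t in range(n)]]  # power at one nonzero frequency
delta_l = lowpass_attenuation(flat_rf, tuned_rf)
```

For this extreme pair the input is purely lowpass and the estimate purely bandpass, so the measure is close to its maximum value of 1.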

Linear mixed-effects models

Given the nested, repeated-measures nature of the experimental design (each input RF was used with tonic and phasic dynamical models), we used a random-intercepts LMM with input RF as a random effect. All LMMs were estimated using the lme4 (version 1.1.21) R package, which does not return p-values for parameter estimates due to unreliability issues [62]. To determine statistical significance, we therefore took a model-comparison approach where nested LMMs of increasing complexity were compared against each other. Three candidate models were fit: random effects (variance components) only, random effects and main fixed effects, and random effects with main effects and interactions. Restricted maximum likelihood (REML) parameter estimation gives unbiased LMM estimates; however, LMMs fit with REML cannot be compared as nested models [63], so we used maximum likelihood estimation (MLE) to fit the candidate LMMs. Candidate LMMs were compared across three fit statistics: AIC, BIC, and chi-squared. Lower values of AIC and BIC indicate better relative fit. The null hypothesis of the chi-squared test is that the more complicated model is not a better fit to the data than the less complicated model. See Tables 2–5 for LMM comparison results.
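The chi-squared comparison of two nested models can be illustrated with a short calculation. This sketch assumes the comparison is a standard likelihood-ratio test with one degree of freedom (the interaction model adds a single parameter to the main-effects model); it is not the lme4 code used for the analyses, and the function names are hypothetical.

```python
import math

def lrt_statistic(loglik_simple, loglik_complex):
    """Likelihood-ratio statistic for two nested models fit by maximum likelihood."""
    return 2.0 * (loglik_complex - loglik_simple)

def chi2_pvalue_df1(chi2_stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2_stat / 2.0))

# Statistics reported in the figure legends for the interaction-versus-main-effects comparisons
p_attenuation = chi2_pvalue_df1(19.04)   # lowpass attenuation
p_omega = chi2_pvalue_df1(0.38)          # baseline firing rate parameter
p_alpha1 = chi2_pvalue_df1(0.08)         # short-term adaptation parameter
```

The resulting p-values agree with those reported in the figure legends (p < 0.001, p = 0.54, and p = 0.78, respectively).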

The variance-components model was given by the equation: $y_{ij} = b_0 + u_j + e_{ij}$ (16) where yij is the observed value of the dependent variable for the ith type of neuron model (tonic or phasic) and jth input RF type (WB or BP), b0 is a fixed intercept, uj is the value of the random intercept of the jth RF type, and eij is the error term for the LMM. Both uj and eij are assumed to be normally distributed with a mean of zero and constant variances of $\sigma_u^2$ and $\sigma_e^2$, respectively. This LMM essentially tests if the differences we see in the dependent variable are solely due to the random effects of each input RF rather than neuron model or RF type. For all LMM analyses, tonic neuron models and BP RFs were coded as 1, and phasic neuron models and WB RFs were coded as 0.

The main-effects model was given by the equation: $y_{ij} = b_0 + b_1 M_i + b_2 R_j + u_j + e_{ij}$ (17) where yij, b0, uj, and eij are defined identically as above, b1 is the fixed effect of Mi, the ith neuron model type, and b2 is the fixed effect for Rj, the jth input RF type.

The interactions model is identical to the main-effects model, with the addition of a fixed effect b3 of the multiplicative interaction between neuron model and RF type, with the equation given by: $y_{ij} = b_0 + b_1 M_i + b_2 R_j + b_3 M_i R_j + u_j + e_{ij}$ (18)

If the interactions model was found to be the best fit to the data, simple-effects models were estimated using REML since these LMMs were not compared to any other candidate models. Simple-effects models were calculated by subsetting the data by RF type and estimating an LMM with RF as a random intercept and neuron model type as a fixed effect. For each RF type, the LMM equation is given by: $y_{ij} = b_0 + b_1 M_i + u_j + e_{ij}$ (19)

Supporting information

S1 Text. Details of GLM estimates for exemplar tonic and phasic models shown in Fig 4A–4F.

The six figures are in the same order as the examples in Fig 4 and have the same format as each other: (a) Left, input RF in the LDC model. Right, estimated RF from GLM. A novel birdsong stimulus was used to compare GLM performance to output of LDC model, with 5 s of the spectrogram shown in (b). Voltage traces of LDC model in response to stimulus for a single trial are shown in (c) with corresponding KLT current (d). Spike trains for all 50 trials are shown in (e), with black corresponding to the LDC model and red to the GLM. PSTHs shown in (f).



We thank Margot Bjoring for assistance in model development and critical feedback, Laura Jamison for suggestions in statistical analysis design, and Jacy Zanussi for thoughtful discussion and critical feedback.


  1. 1. Bal R, Oertel D. Potassium currents in octopus cells of the mammalian cochlear nucleus. J Neurophysiol. 2001;86(5):2299–2311.
  2. Sivaramakrishnan S, Oliver DL. Distinct K currents result in physiologically distinct cell types in the inferior colliculus of the rat. J Neurosci. 2001;21(8):2861–2877.
  3. Ascoli GA, Alonso-Nanclares L, Anderson SA, Barrionuevo G, Benavides-Piccione R, Burkhalter A, et al. Petilla terminology: nomenclature of features of GABAergic interneurons of the cerebral cortex. Nat Rev Neurosci. 2008;9(7):557–568. pmid:18568015
  4. Toledo-Rodriguez M, Blumenfeld B, Wu C, Luo J, Attali B, Goodman PH, et al. Correlation maps allow neuronal electrical properties to be predicted from single-cell gene expression profiles in rat neocortex. Cereb Cortex. 2004;14(12):1310–1327. pmid:15192011
  5. Schulz DJ, Goaillard JM, Marder E. Variable channel expression in identified single and electrically coupled neurons in different animals. Nat Neurosci. 2006;9(3):356–362.
  6. Bomkamp C, Tripathy SJ, Bengtsson Gonzales C, Hjerling-Leffler J, Craig AM, Pavlidis P. Transcriptomic correlates of electrophysiological and morphological diversity within and across excitatory and inhibitory neuron classes. PLoS Comput Biol. 2019;15(6):e1007113.
  7. Ross B, Barat M, Fujioka T. Sound-making actions lead to immediate plastic changes of neuromagnetic evoked responses and induced β-band oscillations during perception. J Neurosci. 2017;37(24):5948–5959.
  8. Daou A, Margoliash D. Intrinsic neuronal properties represent song and error in zebra finch vocal learning. Nat Comms. 2020;11(1):1–17.
  9. Chen AN, Meliza CD. Experience- and Sex-Dependent Intrinsic Plasticity in the Zebra Finch Auditory Cortex during Song Memorization. J Neurosci. 2020;40(10):2047–2055.
  10. Titley HK, Brunel N, Hansel C. Toward a Neurocentric View of Learning. Neuron. 2017;95(1):19–32.
  11. Llinás RR. The intrinsic electrophysiological properties of mammalian neurons: insights into central nervous system function. Science. 1988;242(4886):1654–1664.
  12. Padmanabhan K, Urban NN. Intrinsic biophysical diversity decorrelates neuronal firing while increasing information content. Nat Neurosci. 2010;13(10):1276–1282.
  13. Tripathy SJ, Burton SD, Geramita M, Gerkin RC, Urban NN. Brain-wide analysis of electrophysiological diversity yields novel categorization of mammalian neuron types. J Neurophysiol. 2015;113(10):3474–3489.
  14. Marr D. Vision: A computational investigation into the human representation and processing of visual information. Cambridge, MA: MIT Press; 1982.
  15. Rothman JS, Manis PB. The roles potassium currents play in regulating the electrical activity of ventral cochlear nucleus neurons. J Neurophysiol. 2003;89(6):3097–3113.
  16. Meliza CD, Kostuk M, Huang H, Nogaret A, Margoliash D, Abarbanel HDI. Estimating parameters and predicting membrane voltages with conductance-based neuron models. Biol Cybern. 2014;108(4):495–516.
  17. Druckmann S, Berger TK, Hill S, Schürmann F, Markram H, Segev I. Evaluating automated parameter constraining procedures of neuron models by experimental and surrogate data. Biol Cybern. 2008;99(4-5):371–379.
  18. Van Geit W, De Schutter E, Achard P. Automated neuron model optimization techniques: a review. Biol Cybern. 2008;99:241–251.
  19. Toth BA, Kostuk M, Meliza CD, Margoliash D, Abarbanel HDI. Dynamical estimation of neuron and network properties I: variational methods. Biol Cybern. 2011;105(3-4):217–237.
  20. Vavoulis DV, Straub VA, Aston JAD, Feng J. A self-organizing state-space-model approach for parameter estimation in Hodgkin-Huxley-type models of single neurons. PLoS Comput Biol. 2012;8(3):e1002401.
  21. Prinz AA, Bucher D, Marder E. Similar network activity from disparate circuit parameters. Nat Neurosci. 2004;7(12):1345–1352.
  22. Huys QJM, Paninski L. Smoothing of, and parameter estimation from, noisy biophysical recordings. PLoS Comput Biol. 2009;5(5):e1000379.
  23. Keat J, Reinagel P, Reid RC, Meister M. Predicting every spike: a model for the responses of visual neurons. Neuron. 2001;30(3):803–817.
  24. Jolivet R, Lewis TJ, Gerstner W. Generalized integrate-and-fire models of neuronal activity approximate spike trains of a detailed model to a high degree of accuracy. J Neurophysiol. 2004;92(2):959–976.
  25. Kobayashi R, Tsubo Y, Shinomoto S. Made-to-order spiking neuron model equipped with a multi-timescale adaptive threshold. Front Comput Neurosci. 2009;3:9.
  26. Izhikevich EM. Simple model of spiking neurons. IEEE Trans Neural Netw. 2003;14(6):1569–1572.
  27. Pillow JW, Paninski L, Uzzell VJ, Simoncelli EP, Chichilnisky EJ. Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. J Neurosci. 2005;25(47):11003–11013.
  28. Paninski L, Pillow JW, Simoncelli EP. Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model. Neural Comput. 2004;16(12):2533–2561.
  29. Theunissen FE, David SV, Singh NC, Hsu A, Vinje WE, Gallant JL. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network. 2001;12(3):289–316.
  30. Schwartz O, Pillow JW, Rust NC, Simoncelli EP. Spike-triggered neural characterization. J Vis. 2006;6(4):484–507.
  31. Calabrese A, Schumacher JW, Schneider DM, Paninski L, Woolley SMN. A generalized linear model for estimating spectrotemporal receptive fields from responses to natural sounds. PLoS ONE. 2011;6(1):e16104.
  32. Ostojic S, Brunel N. From spiking neuron models to linear-nonlinear models. PLoS Comput Biol. 2011;7(1):e1001056.
  33. Pozzorini C, Mensi S, Hagens O, Naud R, Koch C, Gerstner W. Automated High-Throughput Characterization of Single Neurons by Means of Simplified Spiking Models. PLoS Comput Biol. 2015;11(6):e1004275.
  34. Weber AI, Pillow JW. Capturing the dynamical repertoire of single neurons with generalized linear models. Neural Comput. 2017;29(12):3260–3289.
  35. Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci. 2000;20(6):2315–2331.
  36. Sen K, Theunissen FE, Doupe AJ. Feature analysis of natural sounds in the songbird auditory forebrain. J Neurophysiol. 2001;86(3):1445–1458.
  37. Nagel KI, Doupe AJ. Organizing principles of spectro-temporal encoding in the avian primary auditory area field L. Neuron. 2008;58(6):938–955.
  38. Woolley SMN, Gill PR, Fremouw T, Theunissen FE. Functional groups in the avian auditory system. J Neurosci. 2009;29(9):2780–2793.
  39. Wang Y, Brzozowska-Prechtl A, Karten HJ. Laminar and columnar auditory cortex in avian brain. PNAS. 2010;107(28):12676–12681.
  40. Jarvis ED, Yu J, Rivas MV, Horita H, Feenders G, Whitney O, et al. Global view of the functional molecular organization of the avian cerebrum: mirror images and functional columns. J Comp Neurol. 2013;521(16):3614–3665. pmid:23818122
  41. Chen AN, Meliza CD. Phasic and tonic cell types in the zebra finch auditory caudal mesopallium. J Neurophysiol. 2018;119(3):1127–1139.
  42. Bjoring MC, Meliza CD. A low-threshold potassium current enhances sparseness and reliability in a model of avian auditory cortex. PLoS Comput Biol. 2019;15(1):e1006723.
  43. Meng X, Huguet G, Rinzel J. Type III Excitability, Slope Sensitivity and Coincidence Detection. Discrete Contin Dyn Syst Ser A. 2012;32(8):2729–2757.
  44. Woolley SMN, Fremouw TE, Hsu A, Theunissen FE. Tuning for spectro-temporal modulations as a mechanism for auditory discrimination of natural sounds. Nat Neurosci. 2005;8(10):1371–1379.
  45. Moore JM, Woolley SM. Emergent tuning for learned vocalizations in auditory cortex. Nat Neurosci. 2019;22(9):1469–1476.
  46. Singh NC, Theunissen FE. Modulation spectra of natural sounds and ethological theories of auditory processing. J Acoust Soc Am. 2003;114(6):3394–3411.
  47. Vates GE, Broome BM, Mello CV, Nottebohm F. Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches. J Comp Neurol. 1996;366(4):613–642.
  48. Marmarelis VZ. Coherence and apparent transfer function measurements for nonlinear physiological systems. Ann Biomed Eng. 1988;16(1):143–157.
  49. Homma NY, Hullett PW, Atencio CA, Schreiner CE. Auditory Cortical Plasticity Dependent on Environmental Noise Statistics. Cell Reports. 2020;30(13):4445–4458.
  50. Wehr M, Zador AM. Balanced inhibition underlies tuning and sharpens spike timing in auditory cortex. Nature. 2003;426(6965):442–446.
  51. Tan AYY, Zhang LI, Merzenich MM, Schreiner CE. Tone-evoked excitatory and inhibitory synaptic conductances of primary auditory cortex neurons. J Neurophysiol. 2004;92(1):630–643.
  52. Christianson GB, Sahani M, Linden JF. The consequences of response nonlinearities for interpretation of spectrotemporal receptive fields. J Neurosci. 2008;28(2):446–455.
  53. Schinkel-Bielefeld N, David SV, Shamma SA, Butts DA. Inferring the role of inhibition in auditory processing of complex natural stimuli. J Neurophysiol. 2012;107(12):3296–3307.
  54. Ozuysal Y, Baccus SA. Linking the computational structure of variance adaptation to biophysical mechanisms. Neuron. 2012;73(5):1002–1015.
  55. Slaney M. Auditory toolbox. Interval Research Corporation, Tech Rep 1998-010; 1998.
  56. Thorson IL, Liénard J, David SV. The Essential Complexity of Auditory Receptive Fields. PLoS Comput Biol. 2015;11(12):e1004628.
  57. Yamauchi S, Kim H, Shinomoto S. Elemental spiking neuron model for reproducing diverse firing patterns and predicting precise firing times. Front Comput Neurosci. 2011;5:42.
  58. Rasmussen JG. Lecture notes: Temporal point processes and the conditional intensity function. arXiv:1806.00221v1 [Preprint] 2018. Available from:
  59. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat Meth. 2020;17:261–272.
  60. Al-Rfou R, Alain G, Almahairi A, Angermueller C, Bahdanau D, Ballas N, et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688v1 [Preprint] 2016. Available from:
  61. Foreman-Mackey D, Hogg DW, Lang D, Goodman J. emcee: the MCMC hammer. Publ Astron Soc Pac. 2013;125(925):306.
  62. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. J Stat Soft. 2015;67(1):1–48.
  63. Pinheiro J, Bates D. Mixed-effects models in S and S-PLUS. Springer Science & Business Media; 2006.