Smoothing of, and Parameter Estimation from, Noisy Biophysical Recordings

Biophysically detailed models of single cells are difficult to fit to real data. Recent advances in imaging techniques allow simultaneous access to various intracellular variables, and these data can be used to significantly facilitate the modelling task. These data, however, are noisy, and current approaches to building biophysically detailed models are not designed to deal with this. We extend previous techniques to take the noisy nature of the measurements into account. Sequential Monte Carlo (“particle filtering”) methods, in combination with a detailed biophysical description of a cell, are used for principled, model-based smoothing of noisy recording data. We also provide an alternative formulation of smoothing where the neural nonlinearities are estimated in a non-parametric manner. Biophysically important parameters of detailed models (such as channel densities, intercompartmental conductances, input resistances, and observation noise) are inferred automatically from noisy data via expectation-maximisation. Overall, we find that model-based smoothing is a powerful, robust technique for smoothing of noisy biophysical data and for inference of biophysical parameters in the face of recording noise.


Introduction
Recent advances in imaging techniques allow measurements of time-varying biophysical quantities of interest at high spatial and temporal resolution. For example, voltage-sensitive dye imaging allows the observation of the backpropagation of individual action potentials up the dendritic tree [1][2][3][4][5][6]. Calcium imaging techniques similarly allow imaging of synaptic events in individual synapses. Such data are very well-suited to constrain biophysically detailed models of single cells. Both the dimensionality of the parameter space and the noisy and (temporally and spatially) undersampled nature of the observed data renders the use of statistical techniques desirable. Here, we here use sequential Monte Carlo methods (''particle filtering'') [7,8]-a standard machine-learning approach to hidden dynamical systems estimation-to automatically smooth the noisy data. In a first step, we will do this while inferring biophysically detailed models; in a second step, by inferring non-parametric models of the cellular nonlinearities.
Given the laborious nature of building biophysically detailed cellular models by hand [9][10][11], there has long been a strong emphasis on robust automatic methods [12][13][14][15][16][17][18][19]. Large-scale efforts (e.g. http://microcircuit.epfl.ch) have added to the need for such methods and yielded exciting advances. The Neurofitter [20] package, for example, provides tight integration with a number of standard simulation tools; implements a large number of search methods; and uses a combination of a wide variety of cost functions to measure the quality of a model's fit to the data. These are, however, highly complex approaches that, while extremely flexible, arguably make optimal use neither of the richness of the structure present in the statistical problem nor of the richness of new data emerging from imaging techniques. In the past, it has been shown by us and others [18,[21][22][23] that knowledge of the true transmembrane voltage decouples a number of fundamental parameters, allowing simultaneous estimation of the spatial distribution of multiple kinetically differing conductances; of intercompartmental conductances; and of time-varying synaptic input. Importantly, this inference problem has the form of a constrained linear regression with a single, global optimum for all these parameters given the data.
None of these approaches, however, at present take the various noise sources (channel noise, unobserved variables etc.) in recording situations explicitly into account. Here, we extend the findings from [23], applying standard inference procedures to wellfounded statistical descriptions of the recording situations in the hope that this more specifically tailored approach will provide computationally cheaper, more flexible, robust solutions, and that a probabilistic approach will allow noise to be addressed in a principled manner.
Specifically, we approach the issue of noisy observations and interpolation of undersampled data first in a model-based, and then in a model-free setting. We start by exploring how an accurate description of a cell can be used for optimal de-noising and to infer unobserved variables, such as Ca 2+ concentration from voltage. We then proceed to show how an accurate model of a cell can be inferred from the noisy signals in the first place; this relies on using model-based smoothing as the first step of a standard, two-step, iterative machine learning algorithm known as Expectation-Maximisation [24,25]. The ''Maximisation'' step here turns out to be a weighted version of our previous regression-based inference method, which assumed exact knowledge of the biophysical signals.

Overview
The aim of this paper is to fit biophysically detailed models to noisy electrophysiological or imaging data. We first give an overview of the kinds of models we consider; which parameters in those models we seek to infer; how this inference is affected by the noise inherent in the measurements; and how standard machine learning techniques can be applied to this inference problem. The overview will be couched in terms of voltage measurements, but we later also consider measurements of calcium concentrations.
Compartmental models. Compartmental models are spatially discrete approximations to the cable equation [13,26,27] and allow the temporal evolution of a compartment's voltage to be written as where V x t ð Þ is the voltage in compartment x, C m is the specific membrane capacitance, and N x t ð Þ is current evolution noise (here assumed to be white and Gaussian). Note the important factor ffiffiffiffi dt p which ensures that the noise variance grows linearly with time dt. The currents a x,i J x,i t ð Þ we will consider here are of three types: N Axial currents along dendrites N Transmembrane currents from active (voltage-dependent), passive, or other (e.g. Ca 2+ -dependent) membrane conductances where c indicates one particular current type (''channel''), E c its reversal potential and g x,c its maximal conductance in compartment x, R m is the membrane resistivity and I x t ð Þ is the current experimentally injected into that compartment. The variable 0ƒo x,c t ð Þƒ1 represents the time-varying open fraction of the conductance, and is typically given by complex, highly nonlinear functions of time and voltage. For example, for the Hodgkin and Huxley (HH) K + -channel, the kinetics are given by o c t ð Þ~n 4 t ð Þ, with and a n V ð Þ,b n V ð Þ themselves nonlinear functions of the voltage [28] and we again have an additive noise term. In practice, the gate noise is either drawn from a truncated Gaussian, or one can work with the transformed variablẽ n n t ð Þ~tanh {1 2n{1 ð Þ. Similar equations can be formulated for other variables such as the intracellular free Ca 2+ concentration [27].
Noiseless observations. A detailed discussion of the case when the voltage is observed approximately noiselessly (such as with a patch-clamp electrode) is presented in [23] (see also [18,21,22]). We here give a short review over the material on which the present work will build. Let us henceforth assume that all the kinetics (such as a n V ð Þ) of all conductances are known. Once the voltage is known, the kinetic equations can be evaluated to yield the open fraction o c t ð Þ of each conductance c of interest. We further assume knowledge of the reversal potentials E c , although this can be relaxed, and of the membrane specific capacitance C m (which is henceforth neglected for notational clarity and fixed at 1 nF/cm 2 ; see [29] for a discussion of this assumption).
Knowledge of channel kinetics and voltage in each of the cell's compartments allows inference of the linear parameters f xy ,g c,x ,R m and of the noise terms by constrained linear regression [23]. As an example, consider a single-compartment cell containing one active (Hodgkin-Huxley K + ) and one leak conductance and assume the voltage V t has been recorded at sampling intervals dt for a time period of T total . Let T~T total =dt be the number of data points and t index them successively t~1, Á Á Á ,T f g : where we see that only g K , g L and R m are now unknown; that they mediate the linear relationship between dV t and J t ; and that these parameters can be concatenated into a vector a as illustrated in equation 6. The maximum likelihood (ML) estimate of a (in vectorized form) and of s 2 are given by

Author Summary
Cellular imaging techniques are maturing at a great pace, but are still plagued by high levels of noise. Here, we present two methods for smoothing individual, noisy traces. The first method fits a full, biophysically accurate description of the cell under study to the noisy data. This allows both smoothing of the data and inference of biophysically relevant parameters such as the density of (active) channels, input resistance, intercompartmental conductances, and noise levels; it does, however, depend on knowledge of active channel kinetics. The second method achieves smoothing of noisy traces by fitting arbitrary kinetics in a non-parametric manner. Both techniques can additionally be used to infer unobserved variables, for instance voltage from calcium concentration. This paper gives a detailed account of the methods and should allow for straightforward modification and inclusion of additional measurements.
where x k k 2~P i x 2 i . Note that the last equality in equation 7 expresses the solution of the model fitting problem as a quadratic minimization with linear constraints on the parameters and is straightforwardly performed with standard packages such as quadprog.m in Matlab. The quadratic log-likelihood in equation 7 and therefore the linear form of the regression depends on the assumption that the evolution noise N I,t of the observed variable in equation 6 is Gaussian white noise. Parameters that can be simultaneously inferred in this manner from the true voltage trace are g, f , R m , time-varying synaptic input strengths and the evolution noise variances [23].
In the following, we will write all the dynamical equations as simultaneous equations where S ii is the evolution noise variance of the i th dynamic variable, S ij~0 if i=j and N t denotes a vector of independent, identically distributed (iid) random variables. These are Gaussian for unconstrained variables such as the voltage, and drawn from truncated Gaussians for constrained variables such as the gates.
For the voltage we have y V h t ð Þ~J t a=dt and we remind ourselves that J t is a function of h t (equation 6).
Observation noise. Most recording techniques yield estimates of the underlying variable of interest that are much more noisy than the essentially noise-free estimates patchclamping can provide. Imaging techniques, for example, do not provide access to the true voltage which is necessary for the inference in equation 7. Figure 1 describes the hidden dynamical system setting that applies to this situation. Crucially, measurements y t are instantaneously related to the underlying voltage V t by a probabilistic relationship (the turquoise arrows in Figure 1) which is dependent on the recording configuration. Together, the model of the observations, combined with the (Markovian) model of the dynamics given by the compartmental model define the following hidden dynamical system: where N x m,v ð Þ denotes a Gaussian or truncated Gaussian distribution over x with mean m and variance v and Ph t denotes the linear measurement process (in the following simply a linear projection such that Ph t~Vt or Ph t~C a 2z Â Ã t ). We assume Gaussian noise both for the observations and the voltage; and truncated Gaussian noise for the gates. The Gaussian assumption on the evolution noise for the observed variable allows us to use a simple regression (equation 7) in the inference of the channel densities. Note that although the noise processes are i.i.d., the fact that noise is injected into all gates means that the effective noise in the observations can show strong serial correlations.
Importantly, we do not assume that h bas the same dimensionality as y; in a typical cellular setting, there are several unobserved variables per compartment, only one or a few of them being measured. For Figure 2, which illustrates the particle filter for a single-compartment model with leak, Na + and K + Hodgkin-Huxley conductances, only Ph t~Vt is measured, although the hidden variable h t~Vt ,h t ,m t ,n t f gis 4-dimensional and includes the three gates for the Na + and K + channels in the classical Hodgkin-Huxley model. It is, however, possible to have y of dimensionality equal to (or even greater than) h. For example, [5] simultaneously image voltage-and [Ca 2+ ]-sensitive dyes.

Expectation-Maximisation
Expectation-Maximisation (EM) is one standard machinelearning technique that allows estimation of parameters in precisely the circumstances just outlined, i.e. where inference depends on unobserved variables and certain expectations can be evaluated. The EM algorithm achieves a local maximisation of the data likelihood by iterating over two steps. For the case where voltage is recorded, it consists of: 1. Expectation step (E-Step): The parameters are fixed at their current estimate h f~ĥ h k ; based on this (initally inaccurate) parameter setting, the conditional distribution of the hidden variables p h t jy 1:T ; h f À Á (where y 1:T~yt f g T t~1 are all the observations) is inferred. This effectively amounts to modelbased smoothing of the noisy data and will be discussed in the first part of the paper.  The EM algorithm can be shown to increase the likelihood of the parameters at each iteration [24,25,30,31], and will typically converge to a local maximum. Although in combination with the Monte-Carlo estimation these guarantees no longer hold, in practice, we have never encountered nonglobal optima.

Model-based smoothing
We first assume that the true parameters h are known, and in the E-step infer the conditional marginal distributions p h t jy 1:T ,h ð Þ for all times t. The conditional mean Sh t T~Ð dh t h t p h t jy 1:T ,h ð Þis a model-based, smoothed estimate of the true underlying signal h t at each point in time t which is optimal under mean squared error. The E-step is implemented using standard sequential Monte Carlo techniques [7]. Here we present the detailed equations as applied to noisy recordings of cellular dynamic variables such as the transmembrane voltage or intracellular calcium concentration.
The smoothed distribution p h t jy 1:T ,h ð Þ is computed via a backward recursion which relies on the filtering distribution p h t jy 1:t ,h ð Þ , which in turn is inferred by writing the following recursion (suppressing the dependence on h for clarity): This recursion relies on the fact that the hidden variables are Markovian Based on this, the smoothed distribution, which gives estimates of the hidden variables that incorporate all, not just the past, observations, can then be inferred by starting with p h T jy 1:T ð Þand iterating backwards: where all quantities inside the integral are now known.

Sequential Monte Carlo
The filtering and smoothing equations demand integrals over the hidden variables. In the present case, these integrals are not analytically tractable, because of the complex nonlinearities in the kinetics y : ð Þ. They can, however, be approximated using Sequential Monte Carlo methods. Such methods (also known as ''particle filters'') are a special version of importance sampling, in which distributions and expectations are represented by weighted . If samples are drawn from the distribution p x ð Þ directly, the weights w~w i ð Þ~1 =N Vi. In the present case, this would mean drawing samples from the distributions p h t jy 1:t ð Þ and p h t jy 1:T ð Þ, which is not possible because they themselves depend on integrals at adjacent timesteps which are hard to evaluate exactly. Instead, importance sampling allows sampling from a different ''proposal'' distribution x i ð Þ *q x ð Þ and compensating by setting Here, we first seek samples and forward filtering weights w and based on these will then derive backwards, smoothing weights such that Substituting the desideratum in equation 15 for time t{1 into equation 12 As a proposal distribution for our setting we use the one-step predictive probability distribution (derived from the Markov property in equation 13): where h i ð Þ 1:T is termed the i th ''particle''. The samples are made to reflect the conditional distribution by adjusting the weights, for which the probabilities p h i ð Þ t y 1:t need to be computed. These are given by that is quadratic in N. We approximate this by which neglects the probability that the particle i at time t could in fact have arisen from particle j at time t{1. The weights for each of the particles are then given by a simple update equation: One well-known consequence of the approximation in equations 19-21 is that over time, the variance of the weights becomes large; this means that most particles have negligible weight, and only one particle is used to represent a whole distribution. Classically, this problem is prevented by resampling, and we here use stratified resampling [8]. This procedure, illustrated in Figure 2, results in eliminating particles that assign little, and duplicating particles that assign large likelihood to the data whenever the effective number of particles N eff drops below some threshold, here N eff~N =2.
It should be pointed out that it is also possible to interpolate between observations, or to do learning (see below) from subsampled traces. For example, assume we have a recording frequency of 1=D s but wish to infer the underlying signal at a higher frequency 1=D, with D s ƒD. At time points without observation the likelihood term in equation 21 is uninformative (flat) and we therefore set keeping equation 21 for the remainder of times. In this paper, we will run compartmental models (equation 1) at sampling intervals D, and recover signals to that same temporal precision from data subsampled at intervals D s §D. See e.g. [32] for further details on incorporating intermittently-sampled observations into the alter- We have so far derived the filtering weights such that particles are representative of the distribution conditioned on the past data p h t jy 1:t ,h ð Þ . It often is more appropriate to condition on the entire set of measurements, i.e. represent the distribution p h t jy 1:T ,h ð Þ . We will see that this is also necessary for the parameter inference in the M-step. Substituting equations 15 and 16 into equation 14, we arrive at the updates for the smoothing weights where the weights w ij ð Þ s,tz1,t now represent the joint distribution of the hidden variables at adjacent timesteps:

Parameter inference
The maximum likelihood estimate of the parameters can be inferred via a maximisation of an expectation over the hidden variables: where h 1:T~ht f g T t~1 . This is achieved by iterating over the two steps of the EM algorithm. In the M-step of the k th iteration, the likelihood of the entire set of measurements y 1:T with respect to the parameters h is maximised by maximising the expected total log likelihood [25] h h kz1~a rgmax h Slog p y 1: which is achieved by setting the gradients with respect to h to zero (see [31,33] for alternative approaches). For the main linear parameters we seek to infer in the compartmental model (a~f xy ,g,R m È É ), these equations are solved by performing a constrained linear regression, akin to that in equation 7. We write the total likelihood in terms of the dynamic and the observation models (equations 10 and 11): Let us assume that we have noisy measurements of the voltage. Because the parametrisation of the evolution of the voltage is linear, but that of the other hidden variables is not, we separate the two as h~V ,h h Â Ã whereh h are the gates of the conductances affecting the voltage (a similar formulation can be written for [Ca 2+ ] observations). Approximating the expectations by the weighted sums of the particles defined in the previous section, we arrive at Slog p y 1:T ,h 1:T jh ð Þ T p h1:T jy 1: where x k k 2 C~1 2 x T C {1 x, m 1 and S 1 parametrise the distribution p h 1 ð Þ over the initial hidden variables at time t~1, and J j ð Þ t is the t th row of the matrix J j ð Þ derived from particle j. Note that because we are not inferring the kinetics of the channels, the evolution term for the gates (a sum over terms of the form where we see that the Hessian H and the linear term f of the problem are given by an expectation involving the particles. Importantly, this is still a quadratic optimisation problem with linear constraints, and which is efficiently solved by standard packages. Similarly, the initialisation parameters for the unobserved hidden variables are given bŷ which are just the conditional mean and variance of the particles at time t~1; and the evolution and observation noise terms finally byŝ Thus, the procedure iterates over running the particle smoother in section Sequential Monte Carlo and then inferring the optimal parameters from the smoothed estimates of the unobserved variables.

Model-based smoothing
We first present results on model-based smoothing. Here, we assume that we have a correct description of the parameters of the cell under scrutiny, and use this description to infer the true underlying signal from noisy measurements. These results may be considered as one possible application of a detailed model. Figure 2A shows the data, which was generated from a known, single-compartment cell with Hodgkin-Huxley-like conductances by adding Gaussian noise. The variance of the noise was chosen to replicate typical signal-to-noise ratios from voltage-dye experiments [2]. Figure 2B shows the N~30 particles used here, and Figure 2C the number of particles with non-negligible weights (the ''effective'' number N eff of particles). We see that when N eff hits a threshold of N=2, resampling results in large jumps in some particles. At around 3 ms, we see that some particles, which produced a spike at a time when there is little evidence for it in the data, are re-set to a value that is in better accord with the data. Figure 2D shows the close match between the true underlying signal and the inferred meanV Figure 2E shows that even the unobserved channel open fractions are inferred very accurately. The match for both the voltage and the open channel fractions improves with the number of particles. Code for the implementation of this smoothing step is available online at http://www.gatsby.ucl.ac.uk/,qhuys/code.html.
For imaging data, the laser often has to be moved between recording locations, leading to intermittent sampling at any one location (see [34][35][36]). Figure 3 illustrates the performance of the model-based smoother both for varying noise levels and for temporal subsampling. We see that even for very noisy and highly subsampled data, the spikes can be recovered very well. Figure 4 shows a different aspect of the same issue, whereby the laser moves linearly across an extended linear dendrite. Here, samples are taken every D s timesteps, but samples from each individual compartment are only obtained each N comp D s . The true voltage across the entire passive dendrite is shown in Figure 4A, and the sparse data points distributed over the dendrite are shown in panel B. The inferred mean in panel C matches the true voltage very well. For this passive, linear example, the equations for the hidden dynamical system are exactly those of a Kalman smoother model [37]; thus the standard Kalman smoother performs the correct spatial and temporal smoothing once the parameters are known, with no need for the more general (but more computationally costly) particle smoother introduced above. More precisely, in this case the integrals in equations 12 and 14 can be evaluated analytically, and no sampling is necessary.
The supplemental video S1 shows the results of a similar linear (passive-membrane) simulation, performed on a branched simulated dendrite (instead of the linear dendritic segment illustrated in Figure 4).
We emphasize that the strong performance of the particle smoother and the Kalman smoother here should not be surprising, since the data were generated from a known model and in these cases these methods perform smoothing in a statistically optimal manner. Rather, these results should illustrate the power of using an exact, correct description of the cell and its dynamics.

EM -inferring cellular parameters
We have so far shown model-based filtering assuming that a full model of the cell under scrutiny is available. Here, we instead infer some of the main parameters from the data; specifically the linear parameters f ,g c ,R m , the observation noise s O and the evolution noise s. We continue to assume, however, that the kinetics of all channels that may be present in the cell are known exactly (see [23] for a discussion of this assumption). Figure 5 illustrates the inference for a passive multicompartmental model, similar to that in Figure 4, but driven by a square current injection into the second compartment. Figure 5B shows statistics of the inference of the leak conductance maximal density g L , the intercompartmental conductance f , the input resistance R m and the observation noise s O across 50 different randomly generated noisy voltage traces. All the parameters are reliably recovered from 2 seconds of data at a 1 ms sampling frequency. We now proceed to infer channel densities and observation noise from active compartments with either four or eight channels. Figure 6 shows an example trace and inference for the four channel case (using Hodgkin-Huxley like channel kinetics). Again, we stimulated with square current pulses. Only 10 ms of data were recorded, but at a very high temporal resolution D s = D = 0.02 ms. We see that both the underlying voltage trace and the channel and input resistance are recovered with high accuracy. Figure 7 presents batch data over 50 runs for varying levels of observation noise s O . The observation noise here has two effects: first, it slows down the inference (as every data point is less informative), but secondly the variance across runs increases with increasing noise (although the mean is still accurate). For illustration purposes, we started the maximal K + conductance at its correct value. As can be seen, however, the inference initially moves g K away from the optimum, to compensate for the other conductance misestimations. (This nonmonotonic behavior in g K is a result of the fact that the EM algorithm is searching for an optimal setting of all of the cell's conductance parameters, not just a single parameter; we will return to this issue below.) Parametric inference here has so far employed densely sampled traces (see Figure 6A). The algorithm however applies equally to subsampled traces (see equation 22). Figure 8 shows the effect of subsampling. We see that subsampling, just as noise, slows down the inference, until the active conductances are no longer inferred accurately (the yellow trace for D s = 0.5 ms). In this case, the total recording length of 10 ms meant that inference had to be done based on one single spike. For longer recordings, information about multiple spikes can of course be combined, partially alleviating this problem; however, we have found that in highly active membranes, sampling frequencies below about 1 KHz led to inaccurate estimates of sodium channel densities (since at slower sampling rates we will typically miss significant portions of the upswing of the action potential, leading the EM algorithm to underestimate the sodium channel density). Note that we kept the length of the recording in Figure 8 constant, and thus subsampling reduced the total number of measurements.
As with any importance sampling method, particle filtering is known to suffer in higher dimensions [38]. To investigate the dependence of the particle smoother's accuracy on the dimensionality of the state space, we applied the method to a compartment with a larger number of channels: fast (Na f ) and persistent Na + (Na P ) channels in addition to leak (L) and delayed rectivier (K DR ), A-type (K A ), K2-type (K 2 ) and M-type (K M ) K + channels (channel kinetics from ModelDB [39], from [9,40]). Figure 9 shows the evolution of the channel intensities during inference. Estimates of most channel densities are correct up to a factor of approximately 2. Unlike in the previous, smaller example, as either observation noise or subsampling increase, significant biases in the estimation of channel densities appear. For instance, the density of the fast sodium channel observed with noise of standard deviation 20 mV is only about half the true value. It is worth noting that this bias problem is not observed in the passive linear case, where the analytic Kalman smoother suffices to perform the inference: we can infer the linear dynamical parameters of neurons with many compartments, as long as we sample information from each compartment [23]. Instead, the difficulty here is due to multicollinearity of the regression performed in the M-step of the EM algorithm and to the fact that the particle smoother leads to biased estimation of covariance parameters in high-dimensional cases [38]. We will discuss some possible remedies for these biases below.
Somewhat surprisingly, however, these observed estimation biases do not prove catastrophic if we care about predicting or smoothing the subthreshold voltage. Figure 10A compares the response to a new, random, input current of a compartment with the true parameters to that of a compartment with parameters as estimated during EM inference, while Figure 10B shows an example prediction with S V{V est j j T&3 mV. Note the large plateau potentials after the spikes due to the persistent sodium channel Na P . We see that even the parameters as estimated under high noise accurately come to predict the response to a new, previously unseen, input current. The asymptote in Figure 10A is determined by the true evolution noise level (here s = 1 mV): the more inherent noise, the less a response to a specific input is actually predictable.
Some further insight into the problem can be gained by looking at the structure of the Hessian of the total likelihood H around the true parameters. We estimate H by running the particle smoother with a large number of particles once at the true parameter value; more generally, one could perform a similar analysis about the inferred parameter setting to obtain a parametric bootstrap estimate of the posterior uncertainty. Figure 11 shows that, around the true value, changes in either the fast Na + or the Note that accurate estimation of the leak, input resistance and noise levels is even possible when the noise is five times as large as that shown in panel A. Inference of the intercompartmental conductance suffers most from the added noise because the small intercompartmental currents have to be distinguished from the apparent currents arising from noise fluctuations in the observations from neighbouring compartments. Throughout, the underlying voltage was estimated highly accurately (data not shown), which is also reflected in the accurate estimates of s O . doi:10.1371/journal.pcbi.1000379.g005 K 2 -type K + channel have the least effect; i.e., the curvature in the loglikelihood is smallest in these directions, indicating that the observed data does not adequately constrain our parameter estimates in these directions, and prior information must be used to constrain these estimates instead. This explains why these channels showed disproportionately large amounts of inference variability, and why the prediction error did not suffer catastrophically from their relatively inaccurate inference ( Figure 10A). See [23] for further discussion of this multicollinearity issue in large multichannel models.

Estimation of subthreshold nonlinearity by nonparametric EM
We saw in the last section that as the dimensionality of the state vector h t grows, we may lose the ability to simultaneously estimate all of the system parameters. How can we deal with this issue? One approach is to take a step back: in many statistical settings we do not care primarily about estimating the underlying model parameters accurately, but rather we just need a model that predicts the data well. It is worth emphasizing that the methods we have intrduced here can be quite useful in this setting as well. As an important example, consider the problem of estimating the subthreshold voltage given noisy observations. In many applications, we are more interested in a method which will reliably extract the subthreshold voltage than in the parameters underlying the method. For example, if a linear smoother (e.g., the Kalman smoother discussed above) works well, it might be more efficient and stable to stick with this simpler method, rather than attempting to estimate the parameters defining the cell's full complement of active membrane channels (indeed, depending on the signal-to-noise ratio and the collinearity structure of the problem, the latter goal may not be tractable, even in cases where the voltage may be reliably measured [23]).
Of course, in many cases linear smoothers are not appropriate. For example, the linear (Kalman) model typically leads to oversmoothing if the voltage dynamics are sufficiently nonlinear (data not shown), because action potentials take place on a much faster timescale than the passive membrane time constant. Thus it is worth looking for a method which can incorporate a flexible nonlinearity and whose parameters may not be directly interpretable biophysically but which nonetheless leads to good estimation of the signal of interest. We could just throw a lot of channels into the mix, but this increases the dimensionality of the state space, hurting the performance of the particle smoother and leading to multicollinearity problems in the M-step, as illustrated in the last subsection.
A more promising approach is to fit nonlinear dynamics directly, while keeping the dimensionality of the state space as small as possible. This has been a major theme in computational neuroscience, where the reduction of complicated multichannel models into low-dimensional models, useful for phase plane analysis, has led to great insights into qualitative neural dynamics [26,41].
As a concrete example, we generated data from a strongly nonlinear (Fitzhugh-Nagumo) two-dimensional model, and then attempted to perform optimal smoothing, without prior knowledge of the underlying voltage nonlinearity. We initialized our analysis with a linear model, and then fit the nonlinearity nonparametrically via a straightforward nonparametric modification of the EM approach developed above.
In more detail, we generated data from the following model [41]: where the nonlinear function f V ð Þ is cubic in this case, and N u t ð Þ and N V t ð Þ denote independent white Gaussian noise processes. 5} ms respectively. All particles were run with a D = 0.01 ms timestep, and the total recording was always 10 ms long, meaning that progressive subsampling decreased the total number of data points. Thus, it can be seen that parameter inference is quite relatively to undersampling. At very large subsampling times, 10 ms of data supplied too few observations during a spike to justify inference of high levels of Na + and K + conductances, but the input resistance and the leak were still reliably and accurately inferred. doi:10.1371/journal.pcbi.1000379.g008 derivative has a large squared norm); the scalar l serves to set the balance between the smoothness of f V ð Þ and the fit to the data. Despite its apparent complexity, in fact this expression is just a quadratic function of h (just like equation (24)), and the updateĥ h may be obtained by solving a simple linear equation. If the basis functions f k V ð Þ have limited overlap, then the Hessian of this objective function with respect to h is banded (with bandwidth equal to the degree of overlap in the basis functions f k V ð Þ), and therefore this linear equation can be solved quickly using sparse banded matrix solvers [42,43]. We used 50 nonoverlapping simple step functions to represent f V ð Þ in Figures. 12-13, and each M-step took negligible time (%1 sec). The penalty term l was fit crudely by eye here (we chose a l that led to a reasonable fit to the data, without drastically oversmoothing f V ð Þ); this could be done more systematically by model selection criteria such as maximum marginal likelihood or cross-validation, but the results were relatively insensitive to the precise choice of l. Finally, it is worth noting that the EM algorithm for maximum penalized likelihood estimation is guaranteed to (locally) optimize the penalized likelihood, just as the standard EM algorithm (locally) optimizes the unpenalized likelihood.
Results are shown in Figures 12 and 13. In Figure 12, we observe a noisy version of the voltage V t , iterate the nonparametric penalized  EM algorithm ten times to estimate f V ð Þ, then compute the inferred voltage E V t jY ð Þ. In Figure 13, instead of observing the noisecontaminated voltage directly, we observe the internal calcium concentration. This calcium concentration variable C t followed its own noisy dynamics, where N C t ð Þ denotes white Gaussian noise, and the r V t ð Þ term represents a fast voltage-activated inward calcium current which activates at 220 mV (i.e., this current is negligible at rest; it is effectively only activated during spiking). We then observed a noisy fluorescence signal F t which was linearly related to the calcium concentration C t [32]. Since the informative signal in F t is not its absolute magnitude but rather how quickly it is currently changing  Here the voltage-dependent calcium current had an activation potential at 220 mV (i.e., the calcium current is effectively zero at voltages significantly below 220 mV; at voltages .10 mV the current is ohmic). The superthreshold voltage behavior is captured fairly well, as are the post-spike hyperpolarized dynamics, but the details of the resting subthreshold behavior are lost. doi:10.1371/journal.pcbi.1000379.g013 (dC t =dt is dominated by r V t ð Þ during an action potential), we plot the time derivative dF t =dt in Figure 13; note that the effective signalto-noise in both Figures 12 and 13 is quite low.
The nonparametric EM-smoothing method effectively estimates the subthreshold voltage V t in each case, despite the low observation SNR. In Figure 12, our estimate of f V ð Þ is biased towards a constant by our smoothing prior; this low-SNR data is not informative enough to overcome the effect of the smoothing penalty term here; indeed, since this oversmoothed estimate of f V ð Þ is sufficient to explain the data well, as seen in the left panels of Figure 12, the smoother estimate is preferred by the optimizer. With more data, or a higher SNR, the estimated f V ð Þ becomes more accurate (data not shown). It is also worth noting that if we attempt to estimate V t from dF t =dt using a linear smoother in Figure 13, we completely miss the hyperpolarization following each action potential; this further illustrates the advantages of the model-based approach in the context of these highly nonlinear dynamical observations.

Discussion
This paper applied standard machine learning techniques to the problem of inferring biophysically detailed models of single neurones automatically and directly from single-trial imaging data. In the first part, the paper presented techniques for the use of detailed models to filter noisy and temporally and spatially subsampled data in a principled way. The second part of the paper used this approach to infer unknown parameters by EM.
Our approach is somewhat different from standard approaches in the cellular computational neuroscience literature ( [12,14,15,19], although see [18]), in that we argue that the inference problem posed is equivalent to problems in many other statistical situations. We thus postulate a full probabilistic model of the observations and then use standard machine learning tools to do inference about biophysically relevant parameters. This is an approach that is more standard in other, closely related fields in neuroscience [44,45]. Importantly, we attempt to use the description of the problem in detail to arrive at as efficient as possible a method of using the data. This implies that we directly compare recording traces (the voltage or calcium trace), rather than attempting to fit measures of the traces such as the ISI distribution, and the sufficient statistics that are used for the parameter inference involves aspects of the data these parameters influence directly. One alternative is to include a combination of such physiologically relevant objective functions and to apply more general fitting routines [46,47]. A key assumption in our approach is that accurately fitting the voltage trace will lead to accurate fits of such other measures derived from the voltage trace, such as the inter-spike interval distribution. In the present approach this means that variability is explicitly captured by parameters internal to the model. In our experience, this is important to avoid both overfitting individual traces and neglecting the inherently stochastic nature of neural responses.
A number of possible alternatives to sequential Monte Carlo methods exist, such as variations of Kalman filtering like extended or unscented Kalman filters [48,49], variational approaches (see [50]) and approximate innovation methods [45,51,52]. We here opted for a sequential Monte Carlo method because it has the advantage of allowing the approximation of arbitrary distributions and expectations. This is of particular importance in the problem at hand because a) we specifically wish to capture the nonlinearities in the problem as well as possible and b) the distributions over the unobserved states are highly non-Gaussian, due to both the nonlinearities but also due to unit bounds on the gates.
Model-based smoothing thus provides a well-founded alternative to standard smoothing techniques, and, importantly, allows smoothing of data without any averaging over either multiple cells or multiple trials [53]. This allows the inference of unobserved variables that have an effect on the observed variable. For example, just as one can infer the channels' open fractions, one can estimate the voltage from pure [Ca 2+ ] recordings (data not shown). The formulation presented makes it also straightforward to combine measurements from various variables, say [Ca 2+ ] and transmembrane voltage, simply by appropriately defining the observation density p y t jh t ,h ð Þ. We should emphasize, though, that the techniques themselves are not novel. Rather, this paper aims to point out to what extent these techniques are promising for cellular imaging.
The demand, when smoothing, for an accurate knowledge of the cell's parameters is addressed in the learning part of the paper where some of the important parameters are inferred accurately from small amounts of data. One instructive finding is that adding noise to the observations did not hurt our inference on average, though it did make it slower and more variable (note the wider error bars in Figure 7). In the higher-dimensional cases, we found that the dimensions in parameter space which have least effect on the models' behavior were also least well inferred. This may replicate the reports of significant flat (although not disconnected) regions in parameter space revealed in extensive parametric fits using other methods [19]. A number of parameters also remain beyond the reach of the methods discussed here, notably the kinetic channel parameters; this is the objective of the nonparametric inference in the last section of the Results, and also of further ongoing work.
A number of additional questions remain open. Perhaps the fundamental direction for future research involves the analysis of models in which the nonlinear hidden variable h t is highdimensional. As we saw in section EM -inferrring cellular parameters, our basic particle smoothing-EM methodology can break down in this high-dimensional setting. The statistical literature suggests two standard options here. First, we could replace the particle smoothing method with more general (but more computationally expensive) Markov chain Monte Carlo (MCMC) methods [54] for computing the necessary sufficient statistics for inference in our model. Designing efficient MCMC techniques suitable for high-dimensional multicompartmental neural models remains a completely open research topic. Second, to combat the multicollinearity diagnosed in Figure 11 (see also Figure 6 of [23]), we could replace the maximumlikelihood estimates considered here with maximum a posteriori (maximum penalized likelihood) estimates, by incorporating terms in our objective function (7) to penalize parameter settings which are believed to be unlikely a priori. As discussed in section Estimation of subthreshold nonlinearity by nonparametric EM, the EM algorithm for maximum penalized likelihood estimation follows exactly the same structure as the standard EM algorithm for maximum likelihood estimation, and therefore our methodology may easily be adapted for this case. Finally, a third option is to proceed along the direction indicated in section Estimation of subthreshold nonlinearity by nonparametric EM: instead of attempting to fit the parameters of our model perfectly, in many cases we can develop good voltage smoothers using a cruder, approximate model whose parameters may be estimated much more tractably. We expect that a combination of these three strategies will prove to be crucial as optimal filtering of nonlinear voltage-and calcium-sensitive dendritic imaging data becomes more prevalent as a basic tool in systems neuroscience.

Supporting Information
Video S1 Kalman smoother video. The video shows the inference of the underlying voltage in a passive cell from intermittent recordings along the dendrites. The left panel shows the true voltage; the middle panel the measurements (black means no measurement from that dendritic location at that time, cf. Figure 4); and the right panel the reconstructed voltage in the entire cell. Found at: doi:10.1371/journal.pcbi.1000379.s001 (1.63 MB MOV)