Inferring oscillatory modulation in neural spike trains

Oscillations are observed at various frequency bands in continuous-valued neural recordings from bulk brain tissue, such as the electroencephalogram (EEG) and the local field potential (LFP), and analysis of spike-field coherence reveals that spiking of single neurons often occurs at preferred phases of the global oscillation. Oscillatory modulation has been examined in relation to continuous-valued oscillatory signals, and independently from the spike train alone, but behavior- or stimulus-triggered firing-rate modulation, spiking sparseness, slow modulation not locked to stimuli, and irregular oscillations with large variability in oscillatory period all present challenges to the search for temporal structure in the spike train. In order to study oscillatory modulation in real data collected under a variety of experimental conditions, we describe a flexible point-process framework, the Latent Oscillatory Spike Train (LOST) model, that decomposes the instantaneous firing rate into biologically and behaviorally relevant factors: spiking refractoriness, event-locked firing-rate non-stationarity, and trial-to-trial variability accounted for by a baseline offset and a stochastic oscillatory modulation. We also extend the LOST model to accommodate changes in the modulatory structure over the duration of the experiment, and thereby discover trial-to-trial variability in the coherence of a rat primary motor cortical neuron's spiking with the LFP theta rhythm. Because LOST incorporates a latent stochastic auto-regressive term, it is able to detect oscillations when the firing rate is low, the modulation is weak, and the modulating oscillation has a broad spectral peak.


Introduction
Neural oscillations have generated considerable interest for their roles in cognition and as indicators of disease [1][2][3][4][5][6][7][8][9][10][11][12][13][14]. Electroencephalogram (EEG) and local field potential (LFP) recordings have revealed transient oscillations in many cortical and subcortical structures, to which neurons both near and far from the LFP recording site show phase preferences in their spiking. Groups of neurons that are locked to a common oscillation, and are therefore active in tightly confined temporal windows, may define a cell assembly whose synchronous spiking could select and activate afferent structures [15][16][17]. Such transient cell assemblies may form and disband as objects are attended to in the visual scene [3,[6][7][8]11], as movements are prepared [2,13], or during choice points for reward [9]. Prominent slow oscillations are observed as patients undergo anesthesia [10], and exhibit changes in dynamics during transitions in unconscious brain state [14]. Furthermore, strong oscillatory signals are prominent in neurological disorders such as Parkinson's disease, and can be used to characterize pathophysiology and its relation to behavior [12,18,19]. The oscillatory structure of individual neurons can thus provide insight into the dynamics of cognitive processing and motor planning, and their implementation in health and disease. However, identifying oscillations in neural spike trains, particularly within individual experimental trials, can be challenging because they occur in conjunction with both non-oscillatory fluctuations in neural firing rate and extraneous noise. To tease apart multiple factors that affect spiking activity, statisticians have suggested the use of point process models together with modern regression methods associated with generalized linear model (GLM) technology [20,21].
In this paper we develop a method that can find oscillations in spike trains even when the signal is comparatively weak, and can track changes in oscillatory behavior across trials.
The starting point for our approach is the observation by Smith and Brown [22] that, for many purposes, evolving neural firing rates can be described using state-space models embedded in point processes. We modify the models used by Smith and Brown, replacing their first-order autoregressive processes with higher-order processes [23] that can fit oscillations found in spiking neurons. This requires careful attention to the form of the higher-order autoregressive process. We take advantage of a Bayesian time-series decomposition introduced by Huerta and West [24], using prior probability distributions to constrain the fit so that it has appropriate power spectral content. We also use a Gibbs sampling method developed recently in a different context [25] to compute posterior distributions efficiently.
Our approach is superficially related to that of Allcroft et al. [26], who used autoregressive moving average models to estimate spectral content from censored data, but their method does not easily accommodate the special situation we face with spiking neurons, including history effects that can account for hard and soft refractory periods. Point process models employing the GLM framework [21,27] have been used to explain spiking behavior by taking into account spiking refractoriness, behavioral and stimulus-induced non-stationarities, and the spiking of neighboring neurons [20]. A series of investigations [12,18,19] have added long-term history effects to capture oscillation in the history dependence, and have also used a state-space smoothing algorithm to track changes in the oscillatory dynamics over time. Eden et al. used a long history term to capture both refractoriness and oscillatory structure, but as we shall see, LOST can better capture the irregularities present in realistic biological oscillations. Our approach is based on what we call the Latent Oscillatory Spike Train (LOST) model, which not only includes terms to describe the spiking refractoriness, event (behavior or stimulus)-locked effects, and trial-to-trial variability in firing rate, but also explicitly models the dynamics of the modulating oscillation itself. However, the latent state cannot capture the discontinuous change in spiking probability that occurs after every spike, which the aforementioned GLM models characterize well, so we utilize a short-term spiking history to capture the refractory period, while introducing the latent state to model the oscillatory dynamics.
Oscillatory structure is often considerably degraded when it is observed as a spike train. On the one hand, if the firing rate is low, or the oscillatory modulation is weak, noise associated with erratic spiking will dominate. On the other hand, even if the modulation is strong and the firing rate is high, the oscillatory signal may itself be unsteady, with substantial variation in period from cycle to cycle. LOST accommodates both spiking noise and oscillatory variation by embedding into the point process intensity function a latent auto-regressive process, and then imposing a suitable soft constraint (in the form of a prior probability distribution) on the oscillatory dynamics. We provide details of the model, along with the posterior sampling scheme and a discussion of how to interpret the posterior samples, and study its effectiveness with leaky integrate-and-fire neural simulations. We then demonstrate the ability of the LOST framework to find interesting trial-to-trial variation in a motor cortical neuron during a lever-pulling task.

Model
We seek to extract the underlying oscillation from the binary (M × N) matrix y representing the binned spiking data from a single neuron during M repetitions, each N time bins long, of an identical behavior or stimulus presentation. Because the spike times differ from trial to trial even though the behavior or stimulus is identical, we construct a probabilistic model of the spiking.
A point process models the probability of observing a spike at time t. The conditional intensity function (CIF) completely specifies the point process, and is defined as

λ(t | H(t), Θ) = lim_{Δ→0} P[spike in (t, t + Δ] | H(t), Θ] / Δ,

where P[·|·] is a conditional probability conditioned on the past spiking history H(t) and the model parameters Θ [21], making the point-process CIF a spike-history-dependent generalization of the inhomogeneous Poisson process. We discretize the problem, and note that λ[t|H(t)]Δt is the approximate probability of observing a single spike in a small time interval Δt. In the discrete-time representation, the likelihood of the spike train is then expressed as a product of conditionally independent Bernoulli events,

p(y | Θ) = ∏_{m,n} p_{mn}^{y_{mn}} (1 − p_{mn})^{1−y_{mn}}.

The binary spiking probability p_{mn} is expressed as a sigmoidal function of a sum of several predictors, including a history term λ^R modeling the refractoriness following a spike. In the simplest case, the predictors are the shared trial-average effect (TAE) f = β^T α, with β the (K_β × N) matrix of K_β B-spline basis vectors and α the corresponding weights, a per-trial offset μ_m, the history term λ^R, and the latent oscillation x:

logit p_{mn} = x_{mn} + f_n + μ_m + λ^R_{l(m,n)},    (3a)
x_{mn} = Σ_{i=1}^{p} F_i x_{m,n−i} + ε_{mn},    (3b)

where l(m,n) ≡ n − L(m,n) ∈ Z_{≥0}, and L(m, n) is the last spike time before time n of the mth trial. From the data, we infer the parameters and the latent oscillation x, which is an AR(p) process with the F_{1:p} its p AR coefficients and ε_{mn} a Gaussian random variable ~N(0, σ²), where σ² is the innovation variance. In what follows, we will often refer to the latent oscillation simply as the oscillation. We also note that the inferred x may be spectrally broad, or have a very small amplitude, and may not represent a signal that would commonly be called an oscillation.
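As a concrete illustration of the discretized Bernoulli likelihood with a sigmoidal link, the following sketch evaluates the spiking probability from the predictors and the resulting log-likelihood. This is a toy helper of our own (function names and the simplified last-spike bookkeeping are assumptions, not the published code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_loglik(y, x, f, mu, lam_R, L):
    """Log-likelihood of binned spikes y (M x N) under the sigmoidal link.

    x     : (M, N) latent oscillation
    f     : (N,)   trial-average effect (TAE)
    mu    : (M,)   per-trial offsets
    lam_R : (Nh,)  spike-history (refractory) function, indexed by
            l(m, n) = n - L(m, n), the time since the last spike
    L     : (M, N) integer last-spike times (a toy stand-in for H(t))
    """
    M, N = y.shape
    l = np.arange(N)[None, :] - L                        # l(m, n)
    psi = x + f[None, :] + mu[:, None] + lam_R[np.clip(l, 0, len(lam_R) - 1)]
    p = sigmoid(psi)                                     # P(spike in bin)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# toy example: 3 trials, 200 bins
M, N = 3, 200
f = 0.2 * np.sin(2 * np.pi * np.arange(N) / 100)
mu = np.array([-2.0, -2.2, -1.8])
lam_R = np.concatenate([np.full(3, -6.0), np.zeros(47)])  # hard refractoriness
x = np.zeros((M, N))
L = np.maximum(0, np.arange(N)[None, :] - 10) * np.ones((M, 1), dtype=int)
y = rng.binomial(1, sigmoid(x + f + mu[:, None]))
print(bernoulli_loglik(y, x, f, mu, lam_R, L))
```

The strongly negative values in `lam_R` immediately after a spike push the sigmoid toward zero, mimicking the hard refractory period.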
The simplest oscillating AR model is of order 2 with complex eigenvalues, and may be a good model for a low-dimensional oscillatory dynamical system. However, fitting a model to noisy observed data means regressing on noisy lagged values. The implied model for such a time series is an ARMA process, and an AR(2) would fail to capture all the structure in the observed data [24], so we approximate the ARMA process with a higher-order AR(p) process.
The spectrum of an AR(p) model is closely related to the p = 2C + R roots of the characteristic polynomial of F_{1:p}, or equivalently to the eigenvalues of the matrix F in Eq 6. The time-series decomposition approach [24] restricts the AR(p) to those with C imaginary AR(2) components and R real AR(1) components, i.e., only models that can accommodate the spectral features we believe are present in the data. We work through the rest of the paper using a generic structure (C = 4 and R = 1) that has proven generally reasonable for a wide range of simulation parameters and for real data, though the model order and component structure can also be inferred from the data.
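The mapping between the component structure and the AR coefficients can be sketched as follows: each imaginary pair with modulus r and period λ contributes a quadratic factor to the characteristic polynomial, each real root a linear factor, and multiplying the factors out recovers F_{1:p}. The function name is ours:

```python
import numpy as np

def ar_coefs_from_roots(pairs, reals):
    """Build AR(p) coefficients F_1..F_p from a component decomposition.

    pairs : list of (r, lam) -- modulus and period of each complex root pair
    reals : list of a        -- real AR(1) roots
    """
    # characteristic polynomial in the backshift operator B,
    # stored as the coefficient array [1, -F_1, ..., -F_p]
    poly = np.array([1.0])
    for r, lam in pairs:
        phi1 = 2 * r * np.cos(2 * np.pi / lam)   # Vieta: g + g* = 2 r cos(2*pi/lam)
        phi2 = -r ** 2                           #        -g g*  = -r^2
        poly = np.convolve(poly, [1.0, -phi1, -phi2])
    for a in reals:
        poly = np.convolve(poly, [1.0, -a])
    return -poly[1:]                             # F_1..F_p

F = ar_coefs_from_roots(pairs=[(0.98, 25.0)], reals=[0.5])
# eigenvalues of the companion matrix recover the roots
p = len(F)
comp = np.zeros((p, p)); comp[0] = F; comp[1:, :-1] = np.eye(p - 1)
eig = np.linalg.eigvals(comp)
print(np.sort(np.abs(eig)))   # moduli: 0.5 and 0.98 (twice)
```

Sampling in the space of (r, λ) per component and rebuilding F this way is what makes the structural prior on the roots straightforward to impose.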

Model fitting
Gibbs sampling [28] and data augmentation [29,30] are used to infer the model parameters Θ = [F_{1:p}, σ², α, μ, v] and the oscillation x. The spike history and TAE are functions of time relative to the last spike and to the event (behavior or stimulus), respectively, and they are estimated by parameterizing a subset of continuous functions with splines. The estimation requires us to choose a set of basis splines, which we choose heuristically, using prior beliefs about their general shape to place the knots. In the current work, the number and locations of the knots are not found with the Gibbs sampling procedure, so we briefly describe how they are determined before describing the Gibbs sampling procedure itself.
Short-term spike history λ^R. Neurons typically have a refractory period following an action potential, when sodium channels in the membrane become inactive and spiking is strongly suppressed. As the channels reactivate, the neuron may experience a period of rebound excitation, when spiking probability is enhanced, before returning to some baseline level. This effect can be modeled as a modulation of spiking probability as a function of time since the last spike. We employ cubic splines to express the refractory modulation function, with knots placed where we would like to constrain function values, where we believe the function reaches its extrema, and at places in between. The interspike interval (ISI) histogram reflects the combined effects of the spiking history and the oscillation, but at small intervals the refractory inhibition and rebound excitation change rapidly, possibly from 0 to well over 1, while a biologically plausible oscillation varies over a more limited range of values centered around 1. The shape of the ISI histogram at small intervals should therefore reflect the refractory modulation function more than the oscillation, so we place history spline knots based on features of the ISI distribution near 0. In the case that the ISI histogram has a minimum at 0 followed by a maximum, we place a knot at t = Δ, one knot at the first maximum, one knot at the mean of the ISIs, and knots at the 70th and 80th percentiles of the ISI distribution. We further place knots at the 97th percentile and at 100 ms whose values are set to 0, constraining the asymptotic value of the history function. Further, if the bins of the ISI histogram at time Δ are empty, we set v_0 = −6 to prevent it from taking unconstrained negative values; if the histogram at 0 is at its maximum value, we leave v_0 unconstrained.

TAE spline basis β. To represent a smooth, arbitrarily-shaped curve, we use piecewise cubic B-splines as basis functions.
Choosing the appropriate number and locations of basis functions is vital to representing the curve well, and too many knots will result in overfitting. We determine the number and locations prior to the Gibbs procedure, by assuming that the shape of the TAE f in Eq (3a) is similar to the empirical PSTH. We randomly vary the number and locations of the spline knots of β and find a set that minimizes the squared error with respect to the PSTH, under the constraint that no more than 9 knots are used for the data presented in this paper. The shape of the PSTH can be used as a guide to the rough number of knots necessary, and the constraint can be changed accordingly for the spike train under investigation.
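The ISI-based heuristic for placing the spike-history knots described above can be sketched as follows. The function name, bin width, and histogram range are our own illustrative choices, not the published implementation:

```python
import numpy as np

def history_knots(isis, dt=0.001):
    """Heuristic spike-history knot locations from the ISI distribution.

    Follows the rules in the text: a knot at t = dt, at the first
    ISI-histogram maximum, at the mean ISI, and at the 70th and 80th
    percentiles; fixed (zero-constrained) knots at the 97th percentile
    and at 100 ms.
    """
    counts, edges = np.histogram(isis, bins=np.arange(0.0, 0.12, dt))
    first_max = edges[np.argmax(counts)]          # first histogram peak
    free = np.array([dt, first_max, np.mean(isis),
                     np.percentile(isis, 70), np.percentile(isis, 80)])
    fixed_to_zero = np.array([np.percentile(isis, 97), 0.100])
    return free, fixed_to_zero

rng = np.random.default_rng(1)
isis = rng.gamma(shape=3.0, scale=0.008, size=5000)   # refractory-like ISIs (s)
free, fixed = history_knots(isis)
print(free, fixed)
```

With a unimodal ISI distribution such as this gamma sample, the adjustable knots cluster at short intervals, where the refractory modulation changes fastest.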
Posterior simulation. We now introduce the posterior distribution, from which the conditional posteriors necessary for Gibbs sampling, and the generating distributions necessary for data augmentation, are derived. Our model is a state-space model with a point-process observable and a latent oscillation evolving as an AR(p) process. The posterior distribution of the parameters Θ and latent oscillation x is

p(Θ, x | y) ∝ p(y | x, Θ) p(x | Θ) p(Θ).    (4)

For the AR(p) model, p(x|Θ) does not have a simple expression. However, if x were a Markov process, then p(x|Θ) could be expressed recursively in terms of the initial probability at time 0. The AR(p) can be made a Markov process if we define a new (M × N × p) array X, where X_{mn} = (x_{mn}, x_{m,n−1}, . . ., x_{m,n−p+1})^T. Eq (3b) is then

X_{mn} = F X_{m,n−1} + (ε_{mn}, 0, . . ., 0)^T,

where F is the (p × p) companion matrix with first row (F_1, . . ., F_p) and ones on the subdiagonal. The (M × N) matrix x is the oscillation we would like to infer, and likely corresponds to oscillations of the membrane potential due to oscillating synaptic inputs [31,32]. Eq (4) then becomes

p(Θ, x | y) ∝ ∫ ∏_{m,n} p(y_{mn} | x_{mn}, H_{n,l}, Θ) ∏_{n′} p(X_{m,n′} | X_{m,n′−1}, Θ) p(X_{m0} | Θ) dX p(Θ).

Conditional posteriors and data augmentation for Gibbs sampling. We jointly sample model parameters and latent states from the joint posterior distribution, Eq (4), using Gibbs sampling with data augmentation. The conditional posterior distribution of each parameter can be read off from the full joint posterior distribution by considering all other parameters fixed. Our problem deviates from standard Gibbs sampling because of the missing data x in Eq (4), so we utilize the data augmentation strategy [29,30], a scheme that simplifies analysis by augmenting the observed data with missing values: we generate samples of X from the predictive distribution p(X|y, Θ) in between sampling parameters from the standard conditional posteriors. Even with samples of X in hand, however, the posterior distribution does not yield conditional posteriors that can be sampled easily.
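The Markov embedding above is easy to exercise directly: the companion matrix propagates the stacked state, and the innovation enters only the first component. A minimal sketch (our own helper names):

```python
import numpy as np

def companion(F):
    """Companion matrix: first row F_1..F_p, shifted identity below."""
    p = len(F)
    A = np.zeros((p, p))
    A[0] = F
    A[1:, :-1] = np.eye(p - 1)
    return A

def simulate_ar(F, sigma2, N, rng):
    """Simulate x_1..x_N via the Markov state X_n = A X_{n-1} + u * eps_n."""
    p = len(F)
    A = companion(F)
    u = np.zeros(p); u[0] = 1.0        # innovations hit the first component only
    X = np.zeros(p)
    x = np.empty(N)
    for n in range(N):
        X = A @ X + u * rng.normal(0.0, np.sqrt(sigma2))
        x[n] = X[0]                    # the observable coordinate is x_n itself
    return x

rng = np.random.default_rng(2)
# AR(2) with modulus 0.9 and period 25 bins
F = np.array([2 * 0.9 * np.cos(2 * np.pi / 25), -0.81])
x = simulate_ar(F, sigma2=0.1, N=500, rng=rng)
print(x.std())
```

Because the transition is now first-order in X, the density p(x|Θ) factorizes over time exactly as in the expression for Eq (4).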
The recently developed Pólya-Gamma data augmentation scheme [25] allows sampling from simple Gaussian conditional posteriors, at the cost of introducing the M × N matrix of Pólya-Gamma variables ω, through the following identity for the spike probability:

(e^{ψ_{mn}})^{y_{mn}} / (1 + e^{ψ_{mn}}) = (1/2) e^{κ_{mn} ψ_{mn}} ∫_0^∞ e^{−ω_{mn} ψ_{mn}²/2} PG(ω_{mn} | b = 1, z = 0) dω_{mn},

where ψ_{mn} is the log-odds of spiking (the sum of predictors in Eq (3a)), PG(ω_{mn}|b = 1, z = 0) is the Pólya-Gamma distribution with parameters b = 1 and z = 0, and κ_{mn} = y_{mn} − 1/2. With the introduction of the Pólya-Gamma variables, the joint posterior Eq (4) becomes conditionally Gaussian in the predictors, from which we derive the conditional posteriors for the parameters and the predictive distributions for the augmented variables. We refer to the set of augmented variables as V⁺ = {X, ω}, and denote by Θ_{\x} the set of all parameters except parameter x.
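Polson et al.'s exact rejection sampler is used in the implementation; as a self-contained illustration of the PG(1, z) distribution itself, one can draw approximate samples from its well-known infinite sum-of-gammas representation, truncated at K terms, and check the analytic mean E[ω] = tanh(z/2)/(2z). This is our own didactic approximation, not the sampler used in the paper:

```python
import numpy as np

def pg_approx(b, z, size, rng, K=150):
    """Approximate PG(b, z) draws via the truncated sum-of-gammas form:

    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + z^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1), k = 1..K.
    """
    k = np.arange(1, K + 1)
    denom = (k - 0.5) ** 2 + z ** 2 / (4 * np.pi ** 2)
    g = rng.gamma(shape=b, scale=1.0, size=(size, K))
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

rng = np.random.default_rng(3)
z = 1.5
omega = pg_approx(b=1.0, z=z, size=50_000, rng=rng)
print(omega.mean(), np.tanh(z / 2) / (2 * z))   # the two should nearly agree
```

The truncation bias is O(1/K), so the Monte Carlo mean lands close to the analytic value for moderate K.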
Sampling μ. The conditional posterior of μ is multivariate Gaussian, μ ~ N(M_μ, S_μ). We constrain μ such that Σ_m μ_m = 0, setting μ_1 = −Σ_{m=2}^{M} μ_m, and define the residual

O_{mn} ≡ κ_{mn}/ω_{mn} − x_{mn} − f_n − λ^R_{l(m,n)}.

The covariance S_μ can be read off from the quadratic term of the log conditional posterior, and the mean M_μ is at the minimum of the quadratic form Q_μ, found by setting its gradient to zero.

Sampling v. v is the vector of spline weights for the spike history, with component indices {I_H} = {1, . . ., K_H}. We impose our prior belief about the timescale and shape of the spike history term by keeping some of the knots fixed, while re-weighting the remaining ones.
The mean M_a of the adjustable knot weights is the solution of a set of linear equations, and the covariance can again be read off from the quadratic term of the log conditional posterior. By rearranging, we can similarly calculate the mean and covariance of the conditionals for α: with n or m kept fixed in the sum, the log conditional posterior is a quadratic form in an N- or M-dimensional vector with a diagonal covariance matrix, so that α ~ N(M_α, S_α) (Eq 15), where μ_α and D_α are the prior mean and covariance for α. Aside from the smoothness imposed by restricting the number of knots, we have no prior knowledge about the offsets and knot weights, so we set the prior means to 0 and the prior covariances to be broad.

Sampling σ². The conditional posterior distribution of the innovation variance takes the form of an inverse-Gamma distribution. We choose a conjugate inverse-Gamma prior with parameters α_{σ²} and β_{σ²}, and sample

σ² | x, F_{1:p} ~ IG(α_{σ²} + MN/2, β_{σ²} + (1/2) Σ_{m,n} (x_{mn} − Σ_i F_i x_{m,n−i})²).

As with the offset and spline weights, we have no a priori knowledge about an appropriate range of values for σ². In order to minimize the effect of the prior, we choose small values for α_{σ²} (relative to the amount of data) and β_{σ²}, so that small values of σ² can also be sampled.

Sampling F_{1:p}. Without further assumptions on the latent dynamics [33], the conditional posterior is a simple multivariate normal distribution. However, for application to spiking data, this naive approach often does not produce the desired results, for two main reasons. First, sampling of the coefficients constrained to the stationary regime is inefficient, owing to the nontrivial shape of the surface of the stationary manifold [24]. Second, our observations are noisy and sparse, and maximum likelihood with a flat prior may not find a state with an oscillatory character. The component decomposition approach [24] addresses both of these concerns, and allows us to specifically search for models with an oscillatory character through the imposition of an informative prior.
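The inverse-Gamma step for σ² is short enough to sketch end-to-end: compute the one-step AR residuals, update the prior parameters, and draw the reciprocal of a Gamma variate. Function names and the toy simulation are our own:

```python
import numpy as np

def sample_sigma2(x, F, a0, b0, rng):
    """Draw sigma^2 from its inverse-Gamma conditional given the latent
    series x (M x N) and AR coefficients F_1..F_p, with an IG(a0, b0) prior."""
    p = len(F)
    # one-step AR(p) residuals for n = p..N-1, pooled over trials
    resid = x[:, p:] - sum(F[i] * x[:, p - 1 - i: x.shape[1] - 1 - i]
                           for i in range(p))
    a = a0 + resid.size / 2.0
    b = b0 + 0.5 * np.sum(resid ** 2)
    return 1.0 / rng.gamma(shape=a, scale=1.0 / b)   # IG via reciprocal Gamma

# sanity check on simulated AR(2) data with known innovation variance
rng = np.random.default_rng(4)
M, N = 4, 2000
F_true, s2_true = np.array([1.7, -0.72]), 0.5        # real roots 0.9 and 0.8
x = np.zeros((M, N))
for n in range(2, N):
    x[:, n] = F_true[0] * x[:, n - 1] + F_true[1] * x[:, n - 2] \
              + rng.normal(0, np.sqrt(s2_true), M)
draws = [sample_sigma2(x, F_true, a0=0.1, b0=0.1, rng=rng) for _ in range(200)]
print(np.mean(draws))   # close to 0.5
```

With small (a0, b0), the draws concentrate near the empirical innovation variance, as the text intends.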
We seek to sample F_{1:p} with a structural prior p(F_{1:p}) that restricts sampling to the subspace of coefficients for which the process is stationary and its characteristic polynomial has R real roots and C pairs of imaginary roots. In what follows, we drop the "pairs" qualifier when referring to one of the pairs of imaginary roots. Instead of sampling the entire F_{1:p} at once, we sample one of the C imaginary or R real roots at a time while keeping the rest of the roots fixed, via Markov chain Monte Carlo (MCMC), as follows.
The residuals of the AR(p) process are uncorrelated Gaussians, but leaving out one of the roots results in residuals that are correlated. We now outline the case of sampling an imaginary root; here, we consider the correlated residual to be an AR(2) process. Ignoring the trial index, x_n is related to the p lagged values x_{n−1}, . . ., x_{n−p} through the AR coefficients and an innovation variance. Utilizing the backshift operator, B x_n = x_{n−1}, Eq (3b) can be written as a polynomial in B, and the polynomial factored into a product over its R real roots α_j, j ∈ {1, . . ., R}, and C pairs of complex roots g_j, g*_j, j ∈ {1, . . ., C}:

(∏_{j=1}^{R} (1 − α_j B)) (∏_{j=1}^{C} (1 − g_j B)(1 − g*_j B)) x_n = ε_n.

The coefficient of B^i, i ∈ {1, . . ., p}, is −F_i. Regrouping the terms, we rewrite the residuals when the contribution of the jth complex root is removed, obtaining a new time series w_{jn}:

ε_n = (1 − g_j B)(1 − g*_j B) w_{jn}.

This equation tells us how to obtain the w_{jn}, and that we treat that time series as an AR(2). We note that, given a time series x_m and some F_{1:p}, we could calculate the residual ε_n directly, and this would be the same series of N numbers had we first obtained the w_j and then calculated ε_n = (1 − g_j B)(1 − g*_j B) w_{jn}. The eigenvalues g_j, g*_j have corresponding angle 2π/λ_j and modulus r_j, which control the frequency and the steadiness of the oscillation when these parameters are used to generate a time series. Using Vieta's formula [34], the AR(2) coefficients φ_{j1} = g_j + g*_j and φ_{j2} = −g_j g*_j are related to the period and modulus as φ_{j1} = 2 r_j cos(2π/λ_j) and φ_{j2} = −r_j². The stationary regime is defined by r_j < 1, so in sampling from the conditional posterior we must restrict the sampling to the region bounded by the curves φ_{j2} = −1 and φ_{j2} = −φ_{j1}²/4. Within this boundary, the conditional posterior is a truncated bivariate normal distribution.
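The Vieta mapping and the stationarity boundary for a complex pair are simple to verify numerically. A sketch with our own helper names:

```python
import numpy as np

def ar2_from_polar(r, lam):
    """AR(2) coefficients (phi1, phi2) for a complex root pair with
    modulus r and period lam (in bins)."""
    return 2 * r * np.cos(2 * np.pi / lam), -r ** 2

def is_stationary_complex_pair(phi1, phi2):
    """Inside the stationary region with complex roots:
    phi2 > -1 (modulus < 1) and phi1^2 + 4*phi2 < 0 (roots complex)."""
    return (phi2 > -1.0) and (phi1 ** 2 + 4 * phi2 < 0.0)

phi1, phi2 = ar2_from_polar(r=0.98, lam=25.0)
roots = np.roots([1.0, -phi1, -phi2])          # roots of z^2 - phi1 z - phi2
print(np.abs(roots))                           # moduli: 0.98, 0.98
print(2 * np.pi / np.abs(np.angle(roots)))     # periods: 25, 25
print(is_stationary_complex_pair(phi1, phi2))  # True
```

Pushing the modulus above 1 (e.g. r = 1.05) sends phi2 below −1, leaving the stationary region, which is exactly the boundary the truncated sampler enforces.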
When the root in question is real, we remove its contribution to obtain the correlated residuals, and the real roots can be sampled from a truncated univariate normal with mean m_j and variance M_j. We further assume a simple informative prior, uniformly distributed on [0.97, 1], for the modulus of the slowest root. Large-modulus roots are spectrally peaked, and correspond to oscillations with a relatively well-defined frequency. Our choice of a prior on the modulus is similar to adding regularization terms to the likelihood to address concerns about overfitting when data are limited, an alternative avenue recently investigated by Buesing et al. [35], although our prior requires the structured decomposition of the AR model, which would be difficult to express in terms of regularization functions.
The latent oscillation: X data augmentation. The predictive distribution for X can be read off from Eq (4). Through the augmentation by the Pólya-Gamma variables, the mnth observation is no longer the spike count but the continuous variable

t_{mn} = κ_{mn}/ω_{mn} − f_n − μ_m − λ^R_{l(m,n)},

with observation noise r_{mn} = 1/ω_{mn}. We generate a sample of X given observations t and fixed Θ by employing the forward-filter backward-sample (FFBS) algorithm [30]. Briefly, FFBS is a recursive algorithm with the same forward filtering step as the Kalman filter, but on the backward pass, instead of obtaining the mean of the smoothing density, we obtain realizable sample paths of the latent state from it. The M trials are independent, so for notational simplicity we drop the trial index m in what follows. Forward filtering is done by recursive application of a prediction and a filtering step: the prediction mean is X̂_{n|n−1} = F X̂_{n−1|n−1}, which is straightforward given the structure of F, the prediction covariance is V̂_{n|n−1} = F V̂_{n−1|n−1} F^T + Q, and the Kalman gain is K_n = V̂_{n|n−1} H^T (H V̂_{n|n−1} H^T + r_n)^{−1}, where H = (1, 0, . . ., 0) reads off x_n from X_n. The filter mean is X̂_{n|n} = X̂_{n|n−1} + K_n [t_n − H X̂_{n|n−1}], and the filter covariance is V̂_{n|n} = (I − K_n H) V̂_{n|n−1}. FFBS requires starting values X̂_{0|0} and V̂_{0|0} for the recursion, for which we choose 0 and a reasonably broad diagonal covariance, respectively. The inferred oscillation at the beginning of each trial will be of poorer quality, but improves quickly once enough spikes are observed. After the last time point is reached, the latent state is sampled backwards in time. The smoothing distribution can be written as a product of the transition density, a normal distribution with mean F X_n and covariance Q, and the filter density with mean X̂_{n|n} and covariance V̂_{n|n}.
Defining the smoother gain accordingly, the sampling distribution of X_n is a multivariate normal whose mean and covariance involve X_{n+1}, the previously-sampled latent state. Q can be expressed as an outer product uu^T of a column vector u = (σ, 0, 0, . . .) with itself. Using the Sherman-Morrison formula [36] for the rank-1 update of an inverse matrix, the moments of the backward-sampling densities may be written in terms of F_s ≡ Σ_{i=1}^{p} (F^{−1})_{pi} (X_{n+1})_i. The AR(p) model with the extended state space is only partly dynamic [30], with rank(Q) = 1 < p, and the computations necessary for the FFBS equations are much cheaper than would be necessary for a full-rank model, as can be seen from the calculations of the various moments presented in this section.
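A generic FFBS pass for this step can be sketched as follows. This is a plain full-matrix implementation, without the rank-1 Sherman-Morrison speedup described in the text, and all function names are ours:

```python
import numpy as np

def ffbs(t, r, F, sigma2, rng):
    """Forward-filter backward-sample one latent AR(p) path given Gaussian
    pseudo-observations t_n of x_n with per-bin noise variances r_n."""
    p = len(F)
    A = np.zeros((p, p)); A[0] = F; A[1:, :-1] = np.eye(p - 1)   # companion
    Q = np.zeros((p, p)); Q[0, 0] = sigma2                       # rank-1 cov
    H = np.zeros(p); H[0] = 1.0                                  # observe x_n
    N = len(t)
    m = np.zeros((N, p)); V = np.zeros((N, p, p))
    mf, Vf = np.zeros(p), 10.0 * np.eye(p)                       # broad start
    for n in range(N):                                           # forward pass
        mp, Vp = A @ mf, A @ Vf @ A.T + Q                        # prediction
        K = Vp @ H / (H @ Vp @ H + r[n])                         # Kalman gain
        mf = mp + K * (t[n] - H @ mp)                            # filter mean
        Vf = Vp - np.outer(K, H @ Vp)                            # filter cov
        m[n], V[n] = mf, Vf
    x = np.zeros((N, p))                                         # backward pass
    jit = 1e-9 * np.eye(p)
    x[-1] = rng.multivariate_normal(m[-1], (V[-1] + V[-1].T) / 2 + jit)
    for n in range(N - 2, -1, -1):
        G = V[n] @ A.T @ np.linalg.inv(A @ V[n] @ A.T + Q + jit)
        mb = m[n] + G @ (x[n + 1] - A @ m[n])
        Vb = V[n] - G @ A @ V[n]
        x[n] = rng.multivariate_normal(mb, (Vb + Vb.T) / 2 + jit)
    return x[:, 0]

# recover a simulated AR(2) oscillation from noisy pseudo-observations
rng = np.random.default_rng(7)
F = np.array([2 * 0.97 * np.cos(2 * np.pi / 25), -0.97 ** 2])
s2, N = 0.1, 400
x_true = np.zeros(N)
for n in range(2, N):
    x_true[n] = F @ x_true[n - 2:n][::-1] + rng.normal(0, np.sqrt(s2))
t = x_true + rng.normal(0, 0.5, N)
x_draw = ffbs(t, np.full(N, 0.25), F, s2, rng)
print(np.corrcoef(x_true, x_draw)[0, 1])
```

A sampled path drawn this way tracks the true oscillation closely when the pseudo-observation noise is moderate, which is the regime the Pólya-Gamma augmentation produces when spikes are reasonably frequent.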
Pólya-Gamma data augmentation. The predictive distribution for ω_{mn}, the term under the integral marginalizing out ω_{mn} in Eq (8), is the general PG(ω_{mn}|b = 1, z ≠ 0) distribution, obtained by an exponential tilting of the PG(ω_{mn}|b = 1, z = 0) distribution [25]:

p(ω_{mn} | ψ_{mn}) ∝ e^{−ω_{mn} ψ_{mn}²/2} PG(ω_{mn} | 1, 0),

i.e., PG(ω_{mn}|b = 1, z = ψ_{mn}). This is sampled using the rejection sampler described in Polson et al. [25].
Inferring additional latent structures. Oscillations temporally structure the spike train, but it is possible that these oscillations do not occur uniformly throughout all trials. The LOST model allows inferring additional structure present in the neural data, such as modulation of oscillatory strength within a trial, or trends in modulation strength across trials. In the simulation studies, we consider an example where the spike train is significantly modulated in only a fraction of the trials, and demonstrate that the LOST model is able to identify these trials. We also apply this to real data to show that the LOST model can predict, using only spike train data, which trials are strongly modulated to a theta rhythm in the LFP, and which are not. For these applications, we assume the modulation strength takes on 2 distinct values. We fix the modulation strength of unmodulated trials to s_0 = 0.1, and parameterize the modulation strength of the modulated trials, s_1. We introduce the latent indicator variable Z, an (M × 2) matrix where, for example, Z_m = (1, 0) if the mth trial is an unmodulated trial, and also the mixture weights π, a 2-component column vector whose components sum to 1, and write the corresponding mixture likelihood. Define M_L to be all m such that Z_m = (1, 0) and M_H to be all m such that Z_m = (0, 1). The likelihood contribution to the conditional posterior for s_1 is Gaussian. For the prior distribution of s_1, we use p(s_1) = N(s_1 | μ_{s_1}, σ²_{s_1}). The Pólya-Gamma data augmentation analogous to Eq (37) is carried out by replacing the mnth ψ_{mn} with its mixture-dependent counterpart. The data augmentation for Z_{mj} is done by generating from a 2-category multinomial distribution, with the probability of the mth trial being in the jth category proportional to π_j times the trial's likelihood under category j. The π are sampled from a Dirichlet distribution with parameters α_π plus the category counts; we set the vector α_π = (1, 1).
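The indicator and mixture-weight updates can be sketched as follows. The function takes per-trial log-likelihoods under the two modulation strengths as inputs; the function name and interface are our own simplification of the scheme described above:

```python
import numpy as np

def sample_Z_and_pi(loglik_low, loglik_high, alpha_pi, rng):
    """Sample per-trial modulation indicators Z and mixture weights pi.

    loglik_low, loglik_high : (M,) log-likelihood of each trial under the
                              unmodulated (s0) and modulated (s1) models
    alpha_pi                : Dirichlet prior parameters, e.g. (1, 1)
    """
    M = len(loglik_low)
    ll = np.stack([loglik_low, loglik_high], axis=1)
    pi = rng.dirichlet(alpha_pi)                 # a draw standing in for the
    lw = ll + np.log(pi)                         # current value of pi
    lw -= lw.max(axis=1, keepdims=True)          # log-sum-exp stabilization
    probs = np.exp(lw); probs /= probs.sum(axis=1, keepdims=True)
    Z = np.array([rng.choice(2, p=probs[m]) for m in range(M)])
    counts = np.bincount(Z, minlength=2)
    pi = rng.dirichlet(alpha_pi + counts)        # conjugate Dirichlet update
    return Z, pi

rng = np.random.default_rng(8)
ll_low = np.array([0.0, 0.0, -50.0, -50.0])
ll_high = np.array([-50.0, -50.0, 0.0, 0.0])
Z, pi = sample_Z_and_pi(ll_low, ll_high, alpha_pi=np.array([1.0, 1.0]), rng=rng)
print(Z, pi)   # first two trials unmodulated (0), last two modulated (1)
```

With clearly separated likelihoods as in this toy example, the sampled indicators recover the trial labels essentially deterministically.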

Numerical computation
Gibbs sampling was implemented in the Python programming language using the NumPy, SciPy, Matplotlib, StatsModels, and Patsy toolboxes. Numerical routines were written in C/C++ and Cython. Software for the Gibbs sampler and a Python wrapper for the Pólya-Gamma routine, based on the original code by Jesse Windle (available at https://github.com/jwindle), are available at https://github.com/AraiKensuke/PP-AR and https://github.com/AraiKensuke/pyPG, respectively.

Results
To illustrate how the LOST model can infer latent oscillations under a variety of realistic conditions, we simulated biologically plausible oscillations, and spike trains modulated by these oscillations using either a simple inhomogeneous renewal process or the leaky integrate-and-fire (LIF) neuron model. Further, we describe another point-process method to infer the CIF, the GLM method, for comparison with the LOST model.

Simulated oscillations and spike trains
The stochastic oscillation for the mth trial, w_m, is generated by first generating a stochastic phase φ_m and a stochastic amplitude A_m, where ξ_m and A_m are generated by AR(1) processes and φ_{m0} ∈ [0, 1] is a uniform random number. These are then used to generate the irregular oscillation: C_ξ and C_A control the size of the deviation from uniformity, while the timescale of the AR(1) processes controls the timescale of the amplitude and phase variation of the oscillation. AR(1)s with timescales on the order of the oscillation period T = 1/ν produce fluctuations that cause considerable variability of period and amplitude in w, which we quantify with the oscillator irregularity, the oscillatory coefficient of variation (OCV) of the instantaneous periods of oscillation, defined as the ratio of the standard deviation to the mean of the time intervals between phase-0 crossings of an oscillatory signal. Fig 1 shows two example oscillations generated by this procedure, their OCVs, and their power spectral densities. Irregular oscillations have a higher OCV and a less peaked spectral density.
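The OCV is straightforward to compute from a signal's upward zero (phase-0) crossings. A sketch with our own function name and toy signals, using a cumulative-noise phase jitter in place of the AR(1) construction above:

```python
import numpy as np

def ocv(signal, dt=0.001):
    """Oscillatory coefficient of variation: sd/mean of the intervals
    between upward zero (phase-0) crossings of an oscillatory signal."""
    s = np.sign(signal)
    up = np.where((s[:-1] <= 0) & (s[1:] > 0))[0]   # upward crossings
    periods = np.diff(up) * dt                      # instantaneous periods
    return periods.std() / periods.mean()

t = np.arange(0, 10, 0.001)
rng = np.random.default_rng(5)
phase_noise = np.cumsum(rng.normal(0, 0.02, len(t)))  # random-walk phase jitter
ocv_reg = ocv(np.sin(2 * np.pi * 8 * t))              # regular 8 Hz sinusoid
ocv_irr = ocv(np.sin(2 * np.pi * 8 * t + phase_noise))
print(ocv_reg, ocv_irr)   # near 0 for the regular signal, larger when jittered
```

A perfectly regular sinusoid gives an OCV near 0, while phase jitter on the order of the oscillation period raises it, matching the behavior described for Fig 1.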
To generate spikes, we employ an inhomogeneous renewal process or the LIF model, driven by the stochastic oscillation to modulate the firing of spikes. For the inhomogeneous renewal process, for each trial m and time n, we generate a spike if, for a uniform random number r_{mn} ∈ [0, 1],

r_{mn} < Δt exp(μ_m + w_{mn} + λ^G_{l(m,n)}),

where λ^G_{l(m,n)} is the user-defined ground-truth refractory history function. For the LIF model, the membrane potential is driven by a baseline DC current μ_m, the oscillation, and a Gaussian random variable w_{mn} ~ N(0, σ_b²) representing irregular arrival times of afferent spikes, with membrane time constant τ = 0.2 and background-fluctuation amplitude σ_b² = 210. Spikes are elicited when V_{mn} passes the threshold value of 1, which then causes a reset to V_{m,n+1} = 0. The LIF model has a refractory period that depends on the model parameters, and the CIF therefore naturally depends on the last spike time.
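The LIF generator can be sketched as a simple Euler loop. This is our own illustration of this kind of generator, with toy parameter values chosen for a plausible firing rate, not the exact published equations:

```python
import numpy as np

def simulate_lif(mu, osc, dt=0.001, tau=0.2, sb2=210.0, rng=None):
    """Discrete-time leaky integrate-and-fire neuron driven by a baseline
    current mu, an oscillatory drive osc, and Gaussian background noise;
    threshold 1, reset to 0."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(osc)
    V = 0.0
    spikes = np.zeros(N, dtype=int)
    for n in range(N):
        w = rng.normal(0.0, np.sqrt(sb2))          # afferent-noise current
        V += dt * (-V + mu + osc[n] + w) / tau     # leaky integration
        if V >= 1.0:                               # threshold crossing
            spikes[n] = 1
            V = 0.0                                # reset
    return spikes

rng = np.random.default_rng(6)
t = np.arange(0, 5, 0.001)
osc = 1.5 * np.sin(2 * np.pi * 8 * t)              # 8 Hz oscillatory drive
spikes = simulate_lif(mu=3.5, osc=osc, rng=rng)
print(spikes.sum() / 5.0)                          # firing rate in Hz
```

The reset-and-reintegrate dynamics are what give this generator its parameter-dependent refractory period, which is the point made in the text.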
LIF neuron with stimulus-triggered change in firing rate. We demonstrate the basic operation of LOST and the inference of the TAE, spike history, and oscillation. In the following, we monitor a small subset of the full joint posterior parameters: the modulus r and frequency 2π/λ of the slowest AR(2) component, and the latent-state amplitude defined by its standard deviation. Samples of trial offsets or TAE knots approach their stationary distributions much more quickly than these parameters representative of the latent state, so monitoring this smaller subset suffices to determine when the posterior LOST draws from has reached its stationary distribution. Fig 2A shows a clear oscillation being inferred, as the parameters show little uncertainty. We took the mean of the last 2000 iterations to calculate the means in Fig 2B, and the oscillation in Fig 2C. In most of the following figures, we use the convention that cyan represents the sampled values themselves, their distribution, or the 5-95% intervals of the sampled values; red lines represent means of sampled values, and black lines represent ground-truth values.
Model identifiability and interpreting the results of LOST. Oscillations and post-spike refractory periods both add temporal structure to spike trains. Because they are presumably separate biological mechanisms that modify the spike train in distinct ways, we model them separately, using the latent state and the spike history term. Without imposing constraints on these two terms, the latent state and spike history cannot, in principle, be separated. LOST imposes constraints on the latent state and spike history according to our prior beliefs about their properties. The knot locations impose a strong prior on the shapes the spike history can take, so misplacing the knots may strongly affect the inferred latent state, presenting a significant difficulty in interpreting the results. We therefore investigate how mis-specifying the history affects LOST in Fig 3, which shows typical results from mis-specifying the spike history on two simulated datasets. The datasets are spike trains generated by a renewal process (compare Fig 3A with Fig 3C): the pronounced silences immediately following spikes are increasingly unlikely under spike-train models with the latent state alone, which tend to increase the firing rate immediately following a spike. The latent states take large amplitudes near the spiking frequency of 40 Hz when a temporally-compressed version of the ground-truth spike history is used (Fig 3D). While the frequency is inferred with certainty, the modulus and amplitude in contrast have large uncertainty, lowering our confidence that this latent state represents an oscillatory modulation. It is likely that a wide range of latent model parameters are nearly interchangeable, leading to a widening of the posterior in the direction of the AR parameters. Stretching the spike history (Fig 3E) results in an inferred latent state similar to that of Fig 3B, so a mis-specified history does not always lead to spurious results.
We analyzed the second spike train 3 times with LOST: twice with an adjustable spike history term but with different history knot locations, and once with a fixed spike history term set to the ground truth, Fig 3G-3I. In Fig 3I, where knots were placed farther out than in Fig 3G, the amplitude also approaches 0. We conclude that if the post-spike inhibition is not well characterized, the inference of an oscillation will most likely fail. If the inhibition is well characterized but the rest of the history is not, we find either that an oscillation will still be inferred successfully, or that the posterior will be broad in some of the directions of the AR parameters, making it difficult to conclusively determine that the latent state represents an oscillation.
Fig 3. In B-E and G-I, we show the 3 representative sampled quantities, the frequency and modulus of the slowest root and the amplitude of the inferred latent state, on the left, and the ISI distribution and spike history term on the right. The ground truth, sampled mean and 5-95% intervals of the sampled spike history term are plotted in black, red and cyan, respectively. The LOST runs differ in how the spike history knots were chosen, or in whether a fixed spike history function was used. For the first spike train A, B uses the standard knot choices, C uses a constant history set to 0 and D a fixed spike history set to a time-compressed ground-truth history. For the second spike train, G uses the standard knot choice, H the ground truth and I the standard knots moved farther out in time. For the first spike train, misspecification of the history may lead to a false inference of a flat latent state, a latent state with high uncertainty in its parameters, or the correct latent state. For the second spike train, a misspecified history can lead to a latent state with significant amplitude but with uncertain parameters, making the inference inconclusive. https://doi.org/10.1371/journal.pcbi.1005596.g003

We now look at spike trains without oscillational modulation, but generated with various post-spike refractory profiles, to check that certain refractory profiles are not confounded with oscillations. Fig 4A-4D and 4E-4H show spike trains with average firing rates of around 34Hz and 60Hz, respectively. In all cases except Fig 4C and 4F, the latent state amplitude approaches 0, as expected. For Fig 4C and 4F, the amplitude does not approach 0 but fluctuates widely along with the other parameters. In these cases, the large uncertainty in the AR parameters suggests the results of LOST are ambiguous, and we refrain from drawing a conclusion. We find that a reasonable rule of thumb in assessing how much uncertainty is "too much" is that the frequency 2π/λ has a standard deviation of around 10% of its mean or less, and that the modulus r has a standard deviation under 0.005. We also commonly observe slow trends in the sampled values over a few thousand Gibbs iterations, consistent with the view that the parameters are nearly interchangeable over a fairly broad region of parameter space, over which the Gibbs sampler performs a near random walk. A latent state with a large amplitude does not necessarily signify the successful inference of an oscillation, and the uncertainty in the model parameters of the oscillation must also be taken into account in deciding whether to reject the hypothesis that there is an oscillation modulating the spiking. The amplitude of the latent state can inform us of how large the modulation is around the mean firing rate. A latent state amplitude of 0.15 equates to a peak-to-peak fluctuation of about 30% of the average firing rate (for low firing rates, < 100Hz), which is quite a small fluctuation considering that only a few spikes may be observable per oscillation cycle, so we use a latent state amplitude of 0.15 as a rule of thumb for when to consider the latent state as practically flat.
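To see where the 30% figure comes from, assume (as is common for point-process models of this kind, though the exact link in the LOST CIF may differ) that the latent state enters the firing rate through an exponential link, so that rate(t) = base_rate · exp(x_t). A latent excursion of ±0.15 then yields roughly a 30% peak-to-peak swing:

```python
import math

# With a log link, rate(t) = base_rate * exp(x_t). For a latent
# oscillation with peak excursion A, the rate swings between
# base_rate*exp(-A) and base_rate*exp(+A).
A = 0.15
peak_to_peak = math.exp(A) - math.exp(-A)   # fraction of base rate
# ~0.30, i.e. roughly a 30% peak-to-peak modulation of the mean rate
```

For a 30 Hz neuron modulated at 20 Hz, this means only about nine rate units of swing spread across many oscillation cycles, which is why such a modulation is hard to distinguish from a flat rate by eye.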
Through our examples, we found that the latent state can partially compensate for an incorrectly specified spike history, as long as the short-term inhibition is reasonably well characterized by the spike history, or the post-spike inhibition is not too pronounced or long. In the absence of oscillational modulation in the data, this causes the latent state to take on significant amplitude, but the AR parameters themselves also have large uncertainty, making it difficult to conclude that the latent state represents an oscillation. However, in one limiting case, even with a mischaracterized spike history, LOST infers a clear oscillation. A nearly periodic spike train in Fig 5A is shown analyzed with LOST using the adjustable spike history term, Fig 5B, and a fixed but modified ground truth spike history term, Fig 5C. When the misspecified history is used, the latent state takes large amplitudes, with the posterior distribution of AR parameters having little uncertainty. The frequency of the inferred oscillation is nearly the same as the spiking frequency, 62Hz. In the limit of nearly periodic spikes, it may be difficult to conclude what particular combination of latent state and spike history underlies the highly regular spiking.
A flat latent state may signify a lack of oscillatory modulation or a spike history term that poorly captures the post-spike inhibition, or it may signify that there is not enough data to infer an oscillation. Two spike trains with oscillatory modulation and different refractory profiles are shown in Fig 6A and 6E, with differing numbers of trials used in the LOST analysis, Fig 6B-6D and 6F-6H. When there is insufficient data to detect an oscillation, the latent state amplitude approaches 0, Fig 6B and 6F. In general, the uncertainty in the parameters of the oscillation decreases as more trials and data are used, Fig 6C, 6D, 6G and 6H, as expected.
Model comparison. We compare the LOST model to another point process model whose CIF is a generalized linear model (GLM) of past spiking history [12,18]. Briefly, the CIF for trial m is a linear function of the binned spiking history, and is easily fit to obtain the coefficients a and b. The a coefficients model the short-term refractory self-history and the b coefficients encapsulate the longer-term oscillatory self-history. This model does not take into account the dynamics of the oscillation; rather, the spiking activity up to S(O + 1) steps in the past uniquely determines the firing probability at the present, in contrast to our model. We fit the model several times with different initial conditions and values of S and O, using the python function statsmodels.api.GLM, compare their goodness of fit via the time-rescaling theorem [37], and find the best combination of S and O. We then calculate the CIF from that model. Interpreting the fluctuating CIF as an oscillation, we extract the instantaneous phases, and compare them to the phases inferred by the LOST model. The GLM model does not separate out the contributions of the short-term refractory self-history and the longer-term oscillatory self-history to the CIF, so we attempted to extract the oscillatory contribution by low-pass filtering the obtained CIF. We refer to this method of inference as GLM hereafter.

Fig 5. A limiting case where models are not identifiable. The spike train in A is nearly periodic, but generated without an oscillatory modulation. When using the standard knots, the latent state amplitude approaches 0, B, but when we use a fixed history that is very different from the ground truth, C, the sampled parameters appear very certain, and an oscillation at 62Hz, nearly the spiking frequency of 64Hz, is inferred. For cases where the spike train itself is nearly periodic, the spike train may be described equally well by various combinations of spike histories and an oscillation at the spiking frequency. https://doi.org/10.1371/journal.pcbi.1005596.g005
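The time-rescaling goodness-of-fit check used above can be sketched as follows. This is a minimal numpy-only illustration with a hypothetical binned CIF, not the authors' implementation; it ignores partial-bin effects and uses a hand-rolled KS distance rather than a library routine:

```python
import numpy as np

def rescaled_times(lam, spikes, dt):
    """Time-rescaling theorem check: integrate the CIF lam (spikes/s,
    one value per time bin) between successive spikes. If the model is
    correct, the rescaled intervals tau are i.i.d. Exponential(1), so
    u = 1 - exp(-tau) should be Uniform(0, 1)."""
    cum = np.concatenate([[0.0], np.cumsum(lam) * dt])  # integrated CIF
    spike_bins = np.flatnonzero(spikes)
    tau = np.diff(cum[spike_bins])
    return 1.0 - np.exp(-tau)

def ks_uniform(u):
    """Kolmogorov-Smirnov distance of sample u from Uniform(0, 1)."""
    u = np.sort(u)
    n = len(u)
    grid = np.arange(1, n + 1) / n
    return np.max(np.maximum(np.abs(grid - u), np.abs(u - (grid - 1 / n))))

# sanity check: Bernoulli spikes drawn from the model's own rate
rng = np.random.default_rng(0)
dt, rate = 0.001, 30.0
lam = np.full(200_000, rate)                 # 200 s at 30 spikes/s
spikes = rng.random(lam.size) < lam * dt
u = rescaled_times(lam, spikes, dt)
D = ks_uniform(u)                            # small for a correct model
```

Candidate (S, O) combinations would then be ranked by how close their rescaled intervals are to the Exponential(1) prediction, e.g. by this KS distance.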
We compare the quality of inference of different models by comparing the resultant length of their inference with the ground truth. The resultant length (RL) is calculated as L = |(1/N) Σ_n exp(i(ψ_n − ϕ_n))|, where the ψ_n (ϕ_n) are the ground truth (inferred) phases, and is the circular equivalent of the mean squared error (MSE) between the ground truth and inferred phases. We also use another simple measure of the strength of oscillatory modulation, the spike phase (SP) histogram and its circular statistic R, calculated as R exp(iΦ) = (1/N) Σ_n exp(i 2π φ_{t_n}), where t_n is the time of the nth spike. Here R characterizes non-uniformity in the distribution of the spike phases φ_{t_n}. While they are calculated in a similar fashion, the RL directly compares the inferred phase with the ground truth or LFP, while the SP R measures the non-uniformity of the probability of spiking with respect to the ground truth or LFP phase. We caution that a result L < R does not mean that the LOST model performed inference poorly: the SP histogram detects modulation, but knowing that spiking occurs at preferred phases does not mean we know how to assign instantaneous phases to the spikes in a spike train. The error on Rs and Ls is calculated by bootstrap sampling with replacement of trials.
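The two circular measures can be sketched directly from their definitions. This is a minimal illustration with phases in radians; the function and variable names are ours:

```python
import numpy as np

def resultant_length(psi, phi):
    """RL between ground-truth phases psi and inferred phases phi
    (radians): L = |mean(exp(i*(psi - phi)))|. L = 1 for perfect
    phase agreement, near 0 for unrelated phases."""
    return np.abs(np.mean(np.exp(1j * (psi - phi))))

def spike_phase_R(spike_phases):
    """Circular statistic R of the spike-phase histogram:
    R = |(1/N) * sum_n exp(i * phase(t_n))|, measuring how
    non-uniformly spikes fall on the oscillation cycle."""
    return np.abs(np.mean(np.exp(1j * spike_phases)))

rng = np.random.default_rng(1)
psi = rng.uniform(0, 2 * np.pi, 5000)
L_perfect = resultant_length(psi, psi)                      # exactly 1
L_noisy = resultant_length(psi, psi + rng.normal(0, 1.0, 5000))
```

With 1-radian phase noise, L_noisy falls to roughly exp(−1/2) ≈ 0.61, which illustrates how the RL penalizes phase-assignment error even when modulation (and hence R) is strong.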
Comparison to other models of oscillation. Comparison of the LOST model with the GLM model illustrates how point process noise and oscillatory irregularity pose challenges to inferring an oscillation from a spike train, and how explicitly modeling the dynamics of the oscillation, using a latent state with a point process observation model, improves inference in the face of these challenges. Fig 7A and 7B show the dependence of the RL on how much spike history is used in the GLM (bottom right), and the history weights b when 180, 480 or 720ms of history is used. The oscillation OCVs were 0.02 and 0.2 for Fig 7A and 7B, respectively. The temporal phase relationship is preserved for large time lags, as can be seen in the clear periodicity of the history weights b in Fig 7A, and the GLM model performs as well as the LOST model when long history dependence is used. Shorter history degrades the performance of the GLM model, because less of the spiking noise is averaged out when shorter history is used. However, Fig 7B shows that when the oscillation is irregular, the phase information quickly dissipates, as seen in the lack of periodicity in b at longer lags. Consequently, the spiking noise cannot be averaged out using long history; rather, spikes at large lags contribute only noise, as the oscillational irregularity washes out the phase relationship at longer lags. The LOST model suffers less because it models the statistics of the irregularity in the oscillation, the observable of the latent oscillation has a point-process observation model that takes into account spiking noise, and the inferred oscillation is conditioned on all the spikes in the trial, Fig 7C.

AR order. Huerta [24] describes a reversible jump MCMC procedure to choose an appropriate AR model structure. We did not implement this due to the complexity of implementation and the relatively high computational cost of the current algorithm.
Nonetheless, we investigate the effect of varying the AR order on phase inference with the LOST model. In Fig 7D, we used 4 sets of data with increasing firing rates, and compared the RLs of inference with different AR orders, along with the accompanying RL from inference using the GLM. The data sets themselves are generated with an oscillation of 10Hz using the LIF model, with OCVs of around 0.2, and have mean firing rates of 11Hz, 15Hz, 21Hz and 33Hz. 140, 130, 100, and 90 trials of 1.2 seconds duration were used in fitting the data. The model order comparison in Fig 7D shows that including more components usually resulted in a better inference of instantaneous phase, but qualitatively, even a few components can pick out the oscillatory structure present in the spike train. This suggests that while the correct model order is important for optimal inference, valuable information can still be extracted if a sub-optimal order is chosen. For the rest of the paper, we keep the model structure at 4C + 1R, unless otherwise stated, as it seems to provide a good compromise between good inference and reasonable computation times.

Presence of additional, slower fluctuation. Periods of prolonged silence, followed by prolonged periods of elevated firing rates, are often visibly apparent in spike trains [38]. The timescales of these readily visible features are much slower than those characterizing the faster neuronal oscillations thought to be relevant to cortical processing. In this example, we seek to demonstrate that the LOST model will give us useful inference of the faster oscillation in the presence of these slow modulations. We generated data in which the offset, instead of being constant throughout a trial, is a randomly generated square wave with a given duty cycle and state durations drawn from an exponential distribution with a minimum duration of 100 ms.
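A trial-varying offset of this kind can be generated, for example, as follows. This is a sketch: the high/low values and the way the duty cycle splits the mean state duration are our assumptions, not the authors' exact parameterization:

```python
import numpy as np

def square_wave_offset(T, dt, mean_dur, duty=0.5, lo=-1.0, hi=1.0, rng=None):
    """Per-trial slow offset: a random square wave whose state
    durations are exponential with a 100 ms floor. The duty cycle
    is imposed by splitting the mean duration between the two states
    (an assumed parameterization)."""
    rng = rng or np.random.default_rng()
    n = int(round(T / dt))
    offset = np.empty(n)
    t, state = 0, int(rng.integers(2))   # random initial state
    while t < n:
        mean = mean_dur * (duty if state else (1 - duty))
        dur = max(0.1, rng.exponential(mean))      # >= 100 ms per state
        k = min(n - t, int(dur / dt))
        offset[t:t + k] = hi if state else lo
        t += k
        state = 1 - state
    return offset

# one 1.2 s trial at 1 ms resolution, as in the simulations above
offset = square_wave_offset(T=1.2, dt=0.001, mean_dur=0.4,
                            rng=np.random.default_rng(2))
```

Because each trial draws an independent offset, averaging such offsets across trials flattens out, which is why the overall PSTHs in this example carry no trace of the slow modulation.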
Because the square wave is different in each trial, the overall PSTHs are flat. The spike train in Fig 8 has a mean firing rate of 22Hz and a 20Hz oscillatory modulation with OCV = 0.27; 200 trials were used, with a duty cycle of 50% for the slow square wave. The resultant length was L = 0.25 ± 0.01. To analyze the data, we begin by inferring the slow modulation using an AR(1) model as our latent model. We choose an AR(1) model because we do not expect much structure in the slow latent state, and also because the timescale of the square waves relative to the duration of each trial ensures that any potential slow oscillatory structure would only be partially sampled, so considering additional structure in the AR is unnecessary. Although an AR(1) model is not the correct model for these square waves, we are able to roughly infer the shape of the square waves, Fig 8 top.
We account for the slower fluctuation by modifying the CIF, Eq 3a, to include the inferred slow fluctuation as a known signal I_mn. Neural data often contain more structure than the analyst is aware of or suspects. If, on initial analysis with the standard LOST, the analyst finds a latent state with a much slower timescale than expected, chances are they have discovered a real but unknown feature of the data; they may then want to infer this slow latent state using an AR(1) model as described in this section, and include this signal as a known signal in a second run of LOST to discover the expected higher-frequency modulation.
Mixture of modulated and non-modulated trials. Here we show an example where additional, discrete latent structures besides the oscillation are present in the data. We motivate this extension to LOST by recent work showing that brain states during sensory processing and behavior undergo discrete state transitions [39,40], and because the authors have suspected (unpublished) that the motor cortical neurons analyzed in the Data Analysis section also exhibit discrete transitions from trial to trial. Fig 9 shows results from 3 simulated spike trains, where 0% (Fig 9A), 25% (Fig 9B) or 40% (Fig 9C) of the trials are weakly modulated, and the rest strongly modulated. The mean firing rate for all 3 spike trains is 36Hz, modulated with a 20Hz modulation with OCVs of 0.32. 80 trials were analyzed, each of duration 1.2 seconds.
The LOST model successfully infers the existence of 2 subsets of trials whose modulation strengths are very different in Fig 9B and 9C, and correctly infers that all trials are modulated with similar strengths in Fig 9A. To infer the modulation state, weak or strong, we compiled the posterior means of the indicators ⟨Z_m1⟩ for all trials m. Trials that are strongly modulated tend to have small values of ⟨Z_m1⟩. The mixture coefficient π provides a straightforward threshold on ⟨Z_m1⟩ for categorization into weak or strong trials. We choose the threshold to be Mπ₁*, where π₁* is the posterior mode of π₁. The trials whose ⟨Z_m1⟩ are larger than the threshold are deemed weakly modulated trials. As can be seen in Fig 9B and 9C, the LOST model categorizes a significant fraction, > 80%, of trials correctly, and also infers the mixture weights close to the true values. We note that in inferring the value of π₁, we did not take the mean as we do for most other parameters, due to its bounded nature and the bias this confers when π₁ is at or near 0, but rather take the mode of the posterior.
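The point about preferring the posterior mode over the mean for the bounded weight π₁ can be illustrated with a simple histogram-based mode estimate. This is a sketch using synthetic posterior draws, not the authors' sampler output:

```python
import numpy as np

def posterior_mode(samples, bins=20, lo=0.0, hi=1.0):
    """Histogram-based posterior mode for a bounded parameter such as
    the mixture weight pi_1. When pi_1 sits at or near a boundary, the
    posterior mean is biased away from it, while the mode is not."""
    counts, edges = np.histogram(samples, bins=bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[np.argmax(counts)]

# synthetic posterior piled up against 0, mimicking pi_1 ~= 0
rng = np.random.default_rng(3)
samples = rng.beta(1, 20, 10_000)
mode = posterior_mode(samples)    # near 0, unlike the inward-biased mean
```

With draws concentrated near 0, the mean sits visibly above the bulk of the mass, whereas the histogram mode stays at the boundary bin; the same logic motivates thresholding ⟨Z_m1⟩ with the mode π₁* rather than the mean.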

Data analysis
We analyze neurons from M1 of a rat performing a self-paced lever push-hold-pull task [13,41] that have been found to be significantly modulated to the theta rhythm in the LFP. These 2 neurons were recorded on separate electrodes, with "neuron 1" in Figs 10 and 11 recorded and spike sorted from a tetrode on a silicon probe, and "neuron 2" recorded using juxtacellular (cell-attached) recording, whose spike train did not need to be sorted. Both neurons were located in layer 1/2; "neuron 2" was identified morphologically as an interneuron, while "neuron 1" is likely to be an interneuron based on spike shape. We identified trials by selecting lever hold periods lasting more than 1 second followed by a large-amplitude pull, and analyzed the 1.2 second period encompassing the hold-pull period.
Slow fluctuations. Fig 10A shows that the spike trains appear to have long periods of low firing rate, interspersed with periods of higher firing within a trial, creating noticeable holes in the spike train. Applying the LOST model with an AR(1) latent state first picks up a latent state that follows these slow fluctuations in the firing rate, with the faster oscillatory modulation absent. Fig 10B and 10C, bottom, show the power spectrum of the fluctuation I we infer for each neuron; both are slow, in the 1-2Hz range. Interestingly, this slow fluctuation is weakly correlated between the neurons, p < 6 × 10⁻³, bootstrap test.
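A trial-wise bootstrap behind such a correlation test can be sketched as follows, with synthetic per-trial slow signals in place of the inferred fluctuations; this is our illustration, not the authors' code:

```python
import numpy as np

def mean_corr(a, b):
    """Mean within-trial Pearson correlation between two sets of
    per-trial signals a, b of shape (n_trials, n_bins)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return np.mean(num / den)

def bootstrap_corr_test(x, y, n_boot=2000, rng=None):
    """Bootstrap over trials: resample trial indices with replacement,
    recompute the mean within-trial correlation, and report the observed
    value and the fraction of replicates at or below zero."""
    rng = rng or np.random.default_rng()
    obs = mean_corr(x, y)
    n = len(x)
    boot = np.array([mean_corr(x[idx], y[idx])
                     for idx in (rng.integers(0, n, n)
                                 for _ in range(n_boot))])
    return obs, np.mean(boot <= 0.0)

# correlated toy slow fluctuations: shared drive plus private noise
rng = np.random.default_rng(4)
shared = rng.normal(size=(40, 200))
x = shared + rng.normal(size=(40, 200))
y = shared + rng.normal(size=(40, 200))
obs, p = bootstrap_corr_test(x, y, rng=rng)
```

Resampling whole trials (rather than individual time bins) respects the within-trial temporal dependence of the slow signals, which is why trials are the natural bootstrap unit here.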
Shared theta oscillations. Next, we account for the faster fluctuation by including the slow fluctuation I in the CIF as described in Eq 47. After incorporating this known signal, we are able to infer the phases for both neurons, Fig 11. The power spectral densities of the inferred latent oscillations are similar, with peaks around 7-8Hz, Fig 11A and 11B bottom. Igarashi et al [13] found significant modulation of these neurons to the LFP theta using SP histograms, but did not find direct evidence of simultaneous modulation of the neurons from the spike-spike cross-correlation function, Fig 11D. In contrast, the cross-correlation function of the independently inferred oscillations, shown in Fig 11C, clearly shows evidence of simultaneous modulation of the neurons by a shared oscillation.
Weakly and strongly modulated trials to LFP theta in a single neuron. In the previous section, we treated all trials of neuron 1 as equally modulated and inferred a theta modulation in its spike train. Fitting a simple model often results in finding an oscillation, as we did for neuron 1, but there may be further structure that can be discovered by using a more complicated parametric model. Here, we analyze neuron 1 using the mixture model, and find a significant difference in the strength of oscillatory modulation between trials. Following the procedure described for the mixture model, we find that about 85% of the trials (H, pink) have a much stronger modulation, Fig 12A right, than the remaining trials (L, blue), Fig 12C top.
We emphasize that this classification was made using only the spike trains. The PSTHs constructed separately from L trials and H trials show no significant difference, Fig 12B left, so the L and H trials do not reflect a difference in the temporal trial-locked firing rate, or in the baseline firing rate. Since the classification into L and H reflects a dichotomous change in the structure of oscillatory modulation, we expect to find a similar change in the relationship between the spike train and the LFP theta. It is possible that the L and H trials are indicative of the absence or presence of the global LFP theta, that is, that L trials show weak modulation because there is no modulating drive in those trials. The autocorrelation function of the high-pass filtered (above 1Hz) LFP, constructed separately for the L trials and H trials, showed similar oscillatory structure, Fig 12B center. Further, the amplitude of the theta-band LFP in L trials and H trials also shows no significant difference, Fig 12C, so the global theta seems to be a reliable and perpetual presence. However, in Fig 12D, when we construct 2 separate LFP theta SP histograms, SP_L and SP_H, we find that SP_L is significantly more uniform than SP_H, with R_L = 0.06 ± 0.01 and R_H = 0.16 ± 0.01, bootstrap test p < 7 × 10⁻⁴. Thus it appears that this neuron intermittently participates in the perpetual theta rhythmicity, resulting in distinct subsets of trials, some weakly modulated and others strongly modulated.

Discussion
We have defined the LOST model, together with accompanying posterior simulation technology, in order to detect the presence of oscillatory firing-rate modulation in a spike train, and to infer its phase of oscillation in the presence of the variety of non-stationarities in firing rate found in experimental data. Previous methods have assessed the oscillatory content of spike trains by comparing the spiking to a known oscillatory signal such as a band-passed LFP [42,43], by detecting oscillation directly from the spike train [12,18,[44][45][46], or by point process regression using the oscillatory signal as a covariate [47]. The LOST model not only detects oscillatory modulation from the spike train itself, but does so in the presence of both spiking noise and oscillatory irregularity, and also allows extensions for inferring additional latent structure, such as non-stationarities in the modulation strength across trials. In addition, LOST is able to separately account for modulatory signals in different frequency bands, which has proven to be vital in the analysis of real spike trains. Structural priors on the latent state dynamics, together with explicit consideration of spiking noise and oscillatory irregularity, allow LOST to uncover oscillatory structure even when the firing rate is low, the modulation is weak, or the oscillation is irregular, compared to methods that directly regress the spiking probability on the raw spiking history itself without an intermediary latent state [12,18,19]. LOST uses the intermediary latent state to address the irregularity characteristic of neural oscillations. Another approach to modeling the spectral features of time series using Gaussian processes has recently been developed by Wilson et al [48], where different classes of covariance kernels, the SE and SM, respectively model non-oscillatory and oscillatory structures, analogous to the real and imaginary roots of the characteristic polynomial.
It would be interesting to compare the two approaches in future work.
The increased sensitivity and flexibility in specifying latent structures should allow LOST to be used to investigate the role of oscillations in cognitive functions in the cortex. Investigators are increasingly interested in characterizing the change in neural responses to time-varying stimuli or behavior [49,50]. Theta and gamma oscillations in the hippocampus change their coupling structure and prevalence during learning and memory acquisition [51,52]. The inclusion of trial-specific structure, independent of the LFP, in the LOST model may also allow detection of increased recruitment of a given neuron into cell assemblies during periods when oscillations in the LFP are changing. Further, the variability in the timing and presence of oscillations in the LFP seen in many areas of the cortex and hippocampus [2,13,53] suggests that oscillatory modulation in single neurons may likewise exhibit finer structure on a per-trial basis. Use of the LOST model in the analysis of such systems may reveal richer dynamics of recruitment into cell assemblies, and a better understanding of the role of single neurons in cognition.