Modeling functional cell types in spike train data

A major goal of computational neuroscience is to build accurate models of the activity of neurons that can be used to interpret their function in circuits. Here, we explore using functional cell types to refine single-cell models by grouping them into functionally relevant classes. Formally, we define a hierarchical generative model for cell types, single-cell parameters, and neural responses, and then derive an expectation-maximization algorithm with variational inference that maximizes the likelihood of the neural recordings. We apply this “simultaneous” method to estimate cell types and fit single-cell models from simulated data, and find that it accurately recovers the ground truth parameters. We then apply our approach to in vitro neural recordings from neurons in mouse primary visual cortex, and find that it yields improved prediction of single-cell activity. We demonstrate that the discovered cell-type clusters are well separated and generalizable, and thus amenable to interpretation. We then compare discovered cluster memberships with locational, morphological, and transcriptomic data. Our findings reveal the potential to improve models of neural responses by explicitly allowing for shared functional properties across neurons.


Main concern
My only major issue is the model selection and how you end up with 'simultaneous case A' as the preferred model.Based on the best BIC values in Figs 4A,C and S3A,C the four models ('simultaneous case A', 'sequential case A', 'simultaneous case B', 'sequential case B') perform equally well (for their respective optimal K).Maybe one shouldn't just look at BIC, and in (the variance in) β stim and yields clusters with small Σ self .Clustering the full β yields different cluster assignments with bigger Σ self (but smaller √ Σ stim ), in particular for those clusters with large number of neurons.
It would be worthwhile to add a supplementary figure showing how well the cluster assignments of the 4 models agree.Plot the binary N × K matrices of the cluster assignments (or real soft assignments if you wish) using the same ordering of neurons obtained from the 'simultaneous case A' model.Fig 6 and S5 try to do that to some extend, but it is hard to judge the similarity given the ordering.Cluster #0 is apparently the same for all four panels.However, cluster #1 in 6A corresponds to #4 in 6B, #2 in S5A, and #7 in S5B...

Minor comments
page 2, Introduction: I was bit tripped up by the first sentence.In the spirit of George Box, while the model will be wrong, it should not only be tractable but useful.I'd add something like ...allow for tractable modeling, while still capturing relevant aspects of ... page 4, section 2: You did not "develop" a Bayesian information criterion, you use the standard one (e.g.wikipedia) but for unclear reasons multiply it by − 1 2 .page 5: Why is it ok to use your cross-validation scheme (i.e.there is no concern of data leakage) in spite of dealing with time-series data?page 6, Eq (5): The constraint k π k = 1 is missing.page 6, Algorithm 1: Wrong signs in front of the 2 -regularization terms.Has to be − not +, as in Eq (3), cause you are maximizing not minimizing.page 8, E-step: What was the reason to use the Laplace approximation with diagonal covariances?Wouldn't mean field variational inference, as e.g.Blei et al. do for the GMM in "Variational Inference: A review for Statisticians", yield a better approximation?page 10, Alg 2: Shouldn't the update for c −1 i,k be diag (∇ 2 log P ) −1 −1 , as in the supplementary material?Diagonalize the covariance, not the precision matrix?page 10, Algorithm 2: How do you initialize Ωk for the non-convex GMM problem? page 12, section 3.1: typo, 10 − 5/6 should be 10 GitHub: I applaud you for making the code to reproduce the analysis available.However, some scripts won't run through, e.g.N in data grabber.py is not defined yielding a NameError.Please add a license to the repo, otherwise (see https://choosealicense.com/no-permission/) only you as creator are permitted to use the software.
Figure 1: A sketched hypothetical case where clustering the data points using β self yields 12 clusters with small Σ self and clustering β yields 19 clusters with bigger Σ self .

page 6 ,
Algorithm 1: How do you initialize Ωk for the non-convex GMM problem? page 7, ∼middle: As already stated, please plot β stim as additional panels in Fig S3 to support that you can indeed "assume that β stim i and β 0 i are independent of the cell type k".page 7, before Eq (11): Strictly speaking, spiking activity of each neuron is independent conditioned on the stimulus.page 8, Fig 1B: Wrong symbols for spiking response (x should be y) and stimulus (s should be x).
According to panel C, the correct K for the sequential method is K = 19.How would panel D look without forced merging of clusters, i.e. for K = 19?page 16, section 3.2.2,2nd paragraph: "Many metadata labels belonged to certain clusters discovered by the sequential method..." ⇒ "Many metadata labels belonged to certain clusters discovered by the simultaneous method..." Some references are now available not only as preprints but as (hopefully improved) peer-reviewed articles.page 21: ref 1 has been published in Nature Neuroscience, ref 13 in Neural Computation.page 22: ref 18 has been published in NeurIPS (I know, not really a journal), ref 19 in Cell, and ref 26 in Nature Neuroscience.