Transcriptomic correlates of electrophysiological and morphological diversity within and across neuron types

In order to further our understanding of how gene expression contributes to key functional properties of neurons, we combined publicly accessible gene expression, electrophysiology, and morphology measurements to identify cross-cell type correlations between these data modalities. Building on our previous work using a similar approach, we distinguished between correlations which were “class-driven,” meaning those that could be explained by differences between excitatory and inhibitory cell classes, and those that reflected graded phenotypic differences within classes. Taking cell class identity into account increased the degree to which our results replicated in an independent dataset as well as their correspondence with known modes of ion channel function based on the literature. We also found a smaller set of genes whose relationships to electrophysiological or morphological properties appear to be specific to either excitatory or inhibitory cell types. Next, using data from Patch-seq experiments, allowing simultaneous single-cell characterization of gene expression and electrophysiology, we found that some of the gene-property correlations observed across cell types were further predictive of within-cell type heterogeneity. In summary, we have identified a number of relationships between gene expression, electrophysiology, and morphology that provide testable hypotheses for future studies. Author Summary The behavior of neurons is governed by their electrical properties, for example how readily they respond to a stimulus or at what rate they are able to send signals. Additionally, neurons come in different shapes and sizes, and their shape defines how they can form connections with specific partners and thus function within the complete circuit. We know that these properties are governed by genes, acting acutely or during development, but we do not know which specific genes underlie many of these properties. Understanding how gene expression changes the properties of neurons will help in advancing our overall understanding of how neurons, and ultimately brains, function. This can in turn help to identify potential treatments for brain-related diseases. In this work, we aimed to identify genes whose expression showed a relationship with the electrical properties and shape measurements of different types of neurons. While our analysis does not identify causal relationships, our findings provide testable predictions for future research.

driven," meaning those that could be explained by differences between excitatory and inhibitory cell classes, and those that reflected graded phenotypic differences within classes. Taking cell class identity into account increased the degree to which our results replicated in an independent dataset as well as their correspondence with known modes of ion channel function based on the literature. We also found a smaller set of genes whose relationships to electrophysiological or morphological properties appear to be specific to either excitatory or inhibitory cell types. Next, using data from Patch-seq experiments, allowing simultaneous single-cell characterization of gene expression and electrophysiology, we found that some of the gene-property correlations observed across cell types were further predictive of withincell type heterogeneity. In summary, we have identified a number of relationships between gene expression, electrophysiology, and morphology that provide testable hypotheses for future studies.

Author Summary
The behavior of neurons is governed by their electrical properties, for example how readily they respond to a stimulus or at what rate they are able to send signals. Additionally, neurons come in different shapes and sizes, and their shape defines how they can form connections with specific partners and thus function within the complete circuit. We know that these properties are governed by genes, acting acutely or during development, but we do not know which specific genes underlie many of these properties. Understanding how gene expression changes the properties of neurons will help in advancing our overall understanding of how neurons, and ultimately brains, function. This can in turn help to identify potential treatments for brain-related diseases. In this work, we aimed to identify genes whose expression showed a relationship with the electrical properties and shape measurements of different types of neurons. While our analysis does not identify causal relationships, our findings provide testable predictions for future research.

Introduction
Two prominent features that distinguish neurons from other cells are their electrical activity and their characteristic morphology. The specific pattern of electrophysiological activity displayed by a given neuron is a core property of its identity as one type of neuron or another. Similarly, different cell types often show striking differences in their size, branching complexity, and other morphological features.
Neuronal cell types defined according to their electrophysiological or morphological characteristics show substantial correspondence with one another as well as with those defined using classification schemes based on transcriptomic criteria (1). Electrophysiological characteristics of neurons, as well as their connectivity patterns, give rise to the computational properties of a given circuit (2,3). Additionally, modeling studies show that morphological changes in simulated neurons can critically change their signaling capabilities (4)(5)(6). Thus, understanding the origins of neuronal electrophysiology and morphology is an important step in understanding the mechanisms of brain function, both in the context of basic research and in the search for treatments for neuropsychiatric disorders.
A comprehensive understanding of the mechanisms that give rise to electrophysiological or morphological diversity must necessarily include a catalogue of the genes whose products contribute to these properties. Many genes have been shown experimentally to influence neuronal electrophysiology through a variety of mechanisms, including but not limited to ion channel activity, protein trafficking, and transcription factor activity (7)(8)(9). Processes such as axon guidance and the development of dendrite morphology are also known to be under genetic control (10). Despite this, our understanding of the relationship between gene expression and electrophysiological or morphological properties is quite limited.
In previous work (11), we combined publicly accessible electrophysiological and gene expression datasets in order to examine the relationship between gene expression and electrophysiological properties. By matching groups of cells inferred to be similar based on multiple information sources, such as the transgenic reporter line and the brain region cells were isolated from, we were able to combine separate datasets containing gene expression and electrophysiological data to generate lists of genes which were correlated with one of several electrophysiological properties (as outlined in Fig 1A).
The goal of this approach was to identify candidate genes that could be further studied using knockout or knockdown approaches in order to determine whether a causal relationship was present.
One caveat in our prior study is that the gene-electrophysiology correlations we identified may have been confounded by overall differences between broad cell classes. Across multiple datasets and cellular characterization methods, including gene expression (12)(13)(14)(15), and electrophysiology and morphology (1), clustering cellular phenotypes in an unbiased manner reveals the major taxonomic difference between neurons to be between projecting and non-projecting neurons (13), or in the case of those cell types present in the cortex or hippocampus, excitatory and inhibitory neurons (12,14,15). Thus, the commonly held view that a neuron's identity is first and foremost defined by its excitatory or inhibitory identity (16) is corroborated across multiple data sources and experimental modalities.
Therefore, we reasoned that the dataset we used previously was potentially susceptible to this confounding effect of cell class, since it contained a mixture of cells from different broad cell classes. In this work, we will use the term "cell type" to refer to narrowly-defined cell types, and "cell class" to refer to those which are broadly-defined (excitatory versus inhibitory or projecting versus nonprojecting). We refer to correlations between gene expression and electrophysiological or morphological properties that are explained by differences between cell classes as "class-driven," (e.g. Fig 1B) and to those that exist based on graded differences within broad cell classes as "non-class-driven" (e.g. Fig 1C).

Primary Dataset
The primary dataset we used combined groups of cells from mouse visual cortex characterized by the Allen Institute for Brain Science (AIBS; http://celltypes.brain-map.org/), where multiple Cre-driver lines were used to target cells for characterization. Standard electrophysiological protocols were used to characterize cells in vitro, with a subset of these cells further undergoing detailed morphological characterization (1). In addition, a separate group of cells were subjected to deep single-cell RNAsequencing to characterize cellular transcriptomes (14). Because the same Cre-lines were used to characterize cells along multiple modalities of neuronal function, we were able to summarize these data to the "cell type" level (reflecting Cre-line, cortical layer, and major neurotransmitter; shown in Table   S1) by pooling and combining cellular characterization data across different animals and data modalities. The definition of multiple cell types within one Cre-line based on cortical layer and major neurotransmitter is supported by cross-layer differences in gene expression (14) and in electrophysiological properties (Fig S1).
The final combined dataset is composed of 34 inhibitory GABAergic and 14 excitatory glutamatergic types (48 total) with electrophysiological data, and 30 inhibitory and 13 excitatory types (43 total) with morphological data. The increased size of this dataset is a considerable advance over our prior analysis (11), which employed an older version of the same dataset (only 12 cell types) (15). This was made possible in part because of more Cre-lines available for analysis and finer cortical layer dissections for the transcriptomic data. For each cell type thus defined, we computed the mean expression value for each gene represented in the RNA-seq dataset and the mean value of each of sixteen electrophysiological and six morphological properties (described in Table S2).

Analysis Approach
Our goal was to identify, for each electrophysiological or morphological property, genes that were correlated with the property (Fig 1A). However, overall differences between excitatory and inhibitory cell classes can make the interpretation of such relationships more complicated in several ways. For example, Fig 1B shows an example of a gene-property correlation that appears almost entirely classdriven, meaning that although no relationship appears within either cell class, the apparent relationship is entirely driven by differences between cell classes. In this case, inhibitory cell types show higher expression of the gene and a greater value of the property compared to excitatory cell types. In contrast, Fig 1C shows a non-class-driven relationship, meaning one that manifests in both cell classes, but which may be obscured by baseline differences when the cell classes are grouped. In this example, a correlation that appears within both classes independently is obscured by a higher value of the property in inhibitory compared to excitatory cell types. Although this obscuring effect is present in this particular example, it is not required for a relationship to be considered non-class-driven; we expected to see some relationships that were consistent both within each class as well as among all cell types.
In order to computationally account for these possibilities, we evaluated each combination of gene and property using a statistical model that assesses the predictive value of the gene on the property while controlling for the effects of cell class. We termed this model the class-conditional model. This model would be expected to identify a significant result when a non-class-driven relationship is present ( Fig   1C), but would not identify relationships that are class-driven ( Fig 1B). For comparison, we modeled the same gene-property pairs using a class-independent model, which assesses the predictive value of the gene on the property irrespective of cell class. This model is similar in principle to the correlational method used in our previous work (11) and would be expected to produce a significant result in cases showing class-driven relationships (such as Fig 1B) but might miss some instances of non-class-driven relationships (such as Fig 1C).
Another possible gene-property relationship is one where there is an interaction between gene and class, meaning that the gene-property relationship is different in excitatory and inhibitory cell types. An interaction could indicate either that excitatory and inhibitory cell types both show a correlation between the gene and property, but the slopes are in opposite directions (as in the example in Fig 1D), or that the gene is correlated with the property only in one cell class. To detect such situations, we introduced a third model, the interaction model, which tested whether the relationship between gene expression and the property in question was significantly different between excitatory and inhibitory cell types. In Accounting for cell class results in the identification of a distinct but overlapping set of genes We first set out to understand how accounting for cell class identity (excitatory or inhibitory) affects the interpretation of gene-property relationships. We modeled each relationship with or without including an indicator variable for cell class, using the class-conditional or class-independent models described above. For most properties, we found that the degree of overlap between the sets of genes identified in the two models (at a false discovery rate (FDR) < 0.1) was substantial but far from a complete intersection (Fig 2A, Venn diagrams, and Table S2). For example, for after-hyperpolarization (AHP) amplitude, we found ~6000 significantly-associated genes in the class-independent model and ~6500 in the class-conditional model; out of these, ~3700 genes were shared between models. Thus, accounting for cell class results in the identification of a substantially different set of candidate genes, which suggests that many of the genes identified in our previous work (11) might reflect class-driven geneproperty relationships.
We next asked how overall differences in morphological and electrophysiological properties between excitatory and inhibitory cells affect gene-property relationships. To this end, we used a linear model to estimate the effect of cell class on each property. For most properties, there was a significant (p < 0.05) effect of cell class. The features of action potential (AP) threshold, input resistance, sag, rheobase, branchiness, soma surface, and bifurcation angle are exceptions to this. The existence of a significant difference in most properties between excitatory and inhibitory cell types highlights the importance of taking cell class into account when attempting to relate these properties to gene expression. The properties without a significant difference are likely to be less susceptible to class-driven effects, but the class-independent model still might miss potentially interesting relationships due to differences in gene expression between classes, resulting in genes which are identified by the class-conditional model only.
We compared the strength and direction of the relationship in both the class-independent and classconditional models by directly comparing the slopes derived from each model for each gene-property relationship (where slope indicates the change in the property per 2-fold change in gene expression; shown for AHP amplitude in Fig 2B). While there is broad agreement between the class-independent and class-conditional models (r Spearman = 0.52), a substantial number of gene-property relationships are significant in one model but not the other (FDR < 0.1). In other words, these relationships are either class-driven (significant in the class-independent model only) or non-class-driven and obscured by class (significant in the class-conditional model only). For example, the relationship between the gene Gprasp1 and AHP amplitude illustrates an example of a class-driven relationship where the apparent relationship is entirely due to broad differences in excitatory and inhibitory classes ( Fig 2C). The gene Camk2g shows a non-class-driven relationship with the same property that is obscured in the classindependent model by higher AHP amplitude values in inhibitory cell types ( Fig 2D). However, many genes, such as Xxylt1, are identified using either model ( Fig 2E).

Divergent gene-property relationships in inhibitory versus excitatory cell classes
We next wondered whether some gene-property relationships might be potentially different within, or specific to, excitatory or inhibitory cell types. To test this, we incorporated an interaction term between gene expression and excitatory versus inhibitory cell class to assess whether the gene-property relationships (i.e. slopes) were different within each cell class. For nearly all properties, there were fewer significant genes in the interaction model compared to the class-conditional model ( Fig 3A, Venn diagrams, and Table S3). For example, out of the ~6500 genes significantly associated with AHP amplitude in the class-conditional model, ~2000 also show interactions, and there are an additional ~700 which show an interaction but are not significant in the class-conditional model. This could indicate that "true" interactions are comparatively rare, but this finding is also likely partly explained by differences in statistical power. In addition, these interactions do not appear to be merely the result of low or no gene expression within one cell class but not the other; we did not observe strong correlations for any property between the interaction model slope and the average difference in expression levels between inhibitory and excitatory cell types ( Fig S2).
For all properties, we found that the slopes of the gene-property relationships within excitatory cell types were poorly correlated with those within inhibitory cell types (example features maximum branch order and AHP amplitude shown in Fig 3B, C). By definition, the genes with significant interaction terms were those where the slopes calculated within excitatory and inhibitory classes were very different from each other (pink and purple points in Fig 3B, C). If the majority of gene-property relationships are shared between excitatory and inhibitory cell types, as suggested by the greater number of significant genes in the class-conditional model than in the interaction model for most properties, one might expect a positive correlation between slopes calculated in inhibitory and excitatory cell types. However, such a correlation may be lacking in this analysis because we would expect most genes to have no relationship to a given property and thus most slopes to be near zero.
The properties maximum branch order and sag are unusual in that they show few significant genes using the class-conditional model, but many (1914 and 1174, respectively) in the interaction model ( Fig 3A, Venn diagrams, and Table S3; slopes for maximum branch order plotted in Fig 3B). We hypothesize that this might be because these properties are under stronger (or otherwise more readily identified) genetic control in excitatory compared to inhibitory cell types (see Discussion). ( Fig 3D). In other words, the interaction model identified a potentially interesting relationship in the case of Nrxn3 which was missed by the class-conditional model. For Man1c1, the interaction model does not reveal a new relationship, but instead highlights the fact that this gene-property relationship, if real, is potentially more complicated than would be assumed based on the class-conditional model alone.
Man1c1 is an enzyme involved in the maturation of N-linked oligosaccharides (18), and is thus a plausible regulator of AHP amplitude, since N-linked glycosylation of voltage-gated potassium channels or their auxiliary subunits is known to regulate both surface trafficking and channel function (19,20).
The apparent class-specificity of this relationship could result from class-specific co-expression of certain potassium channels or other enzymes involved in glycan synthesis or maturation.

E. Example of a gene which is significant in both the class-conditional and interaction models.
Results from the class-conditional model are more likely to validate using independent methods We next asked how the gene-property relationships from the class-independent and class-conditional models, based on our analysis of the AIBS cortical cell types dataset, might generalize to other datasets.
We first compared the results reported here to those from our earlier NeuroElectro/NeuroExpresso (NE) literature-based dataset (11), after subsetting these data to include only non-projecting cell types (reflecting 19 cell types in total sampled throughout the brain, described in detail in the Methods). We chose to use non-projecting cell types in the NE dataset, as these were recently described by a mouse brain-wide transcriptomic survey as corresponding to a single broad cell class (13). To this end, we calculated Spearman correlations between genes and electrophysiological properties in the NE dataset.
Next, for gene-property relationships from both the class-independent and class-conditional models, we assessed their aggregate consistency with those from the NE dataset. Here, we defined "consistency" for a given model (i.e. class-independent or class-conditional) and property as the correlation between geneproperty slopes calculated from the AIBS dataset with the Spearman correlations for the same set of gene-property relationships in the NE dataset (illustrated in Fig 4B). In Fig 4A we show a comparison of the gene/electrophysiology correlations from the AIBS dataset with the model slopes (beta) from the NE dataset (11). We found that for seven out of the eleven electrophysiological properties shared between the datasets, both AIBS dataset-based statistical models were consistent with analogous gene-property relationships based on the NE dataset (r Spearman as high as 0.305 and 0.35 for class-independent and class-conditional, respectively). For six out of the eleven features, we found that the class-conditional model was considerably more consistent than the classindependent model with relationships in the NE dataset. For only two features, capacitance and membrane time constant (tau), was the class-independent model more consistent than the classconditional with the NE dataset. Fig

C-D. Example of a gene showing consistent results between the NE dataset and the AIBS dataset using the class-conditional model, but not the class-independent model. C shows the relationship within the AIBS dataset, and D shows the same gene and property in the NE dataset. Solid lines indicate a linear fit including only types belonging to one cell class, and dashed line indicates a linear fit including all cell types.
Assessing within-cell type correlations using Patch-seq datasets We next wondered whether these between-cell type gene-property relationships might be predictive of cell-to-cell heterogeneity within a given cell type. We reasoned that the recently developed Patch-seq methodology, allowing morphological, electrophysiological, and transcriptomic characterization from the same single cell, presents a unique opportunity to test this possibility (17). While these data at present are limited by relatively modest sample sizes and technical factors such as inefficient mRNA capture and potential off-target cellular mRNA contamination (21), we nonetheless sought to use these data to assess the nature of within-cell type gene-property relationships.
To this end, we performed an integrated analysis of 5 Patch-seq datasets, with each dataset characterizing transcriptomic and electrophysiological diversity of mouse forebrain inhibitory cells from the neocortex, hippocampus, and striatum (Table 1). Our analysis includes one novel dataset of 19 Pvalb-Cre positive interneurons recorded in region CA1 of the mouse hippocampus, reported here for the first time. Cells in this dataset (referred to as the Bengtsson Gonzales dataset), were characterized as described in (22).
To jointly analyze these Patch-seq datasets, we first mapped Patch-seq sampled cells to the cell type level, using a transcriptome-based classifier that assigns cells to cell types as defined by cellular dissociation-based single-cell RNAseq reference atlases from the cortex and striatum (14,22). Lamp5, etc. (referred to in Tasic et al., 2018 as "subclasses"). Next, for each cell type, we identified genes that are highly variable in their expression levels within cells of the same type. We reasoned that these highly-variable genes might be those most likely to drive or appear correlated with electrophysiological heterogeneity within each cell type. Lastly, we performed a joint analysis across Patch-seq datasets to assess the strength of gene-property relationships within cell types where the gene was highly variable. Here, we used a mixed-effects regression model, with gene expression as a fixed effect and dataset and cell type as random effects and with cells weighted by their estimated transcriptome quality (see Methods).
Despite the limitations of the Patch-seq data, we found a small number of genes whose expression levels were significantly associated with cell-to-cell electrophysiological heterogeneity within cell types (FDR < 0.1; Fig 5A). For example, we found that expression of Kcna1, which encodes the potassium channel in a separate single-cell RNA-seq study of CA1 interneurons, Fxyd6 was found to be more highly expressed cells known to spike more slowly (25).
In general, we found that when a gene-property relationship was statistically significant in both the Patch-seq and AIBS class-conditional analyses (FDR < 0.1), this relationship was usually in the same direction in both analyses (Fig 5A; 10 out of 13 gene-property relationships total). Results were similar in the class-independent model, except with a smaller set of gene/ephys relationships matching between both (7 out of 9 relationships were in a consistent direction). All of the genes which were consistent between the class-independent and Patch-seq analyses were also consistent in the class-conditional model. While our analyses of these Patch-seq datasets should be considered preliminary (pending the availability of larger and higher-quality datasets), we find the correspondence with our earlier analysis encouraging. Namely, this analysis suggests that some of the same genes that appear to drive large differences across cortical cell types might also be defining more subtle within-cell type heterogeneity. The expected relationship between voltage-gated potassium channels and AHP amplitude is apparent only after accounting for cell class We next asked whether we see a relationship between an electrophysiological feature and a category of genes which are known regulators of that feature. Voltage-gated potassium channels are known to be involved in producing the after-hyperpolarization following an action potential (30,31) (AHP amplitude; illustrated by the dashed arrow in Fig 6A). We thus hypothesized that for many of these genes, higher expression levels would be associated with larger AHP amplitudes (although not all voltage-gated potassium channels necessarily contribute directly to AHP amplitude). We further hypothesized that this relationship would be more apparent after accounting for cell class, in part because AHP amplitudes differ considerably between excitatory and inhibitory cell classes (Fig 6B-D). Indeed, our previous work found a spurious negative correlation between expression of the Kcnb1 gene and AHP amplitude which resulted from higher expression of Kcnb1 in excitatory cell types compared to others (11).
We evaluated model slopes between each of 29 voltage-gated potassium channel genes (32) and AHP amplitude in the AIBS dataset for each of the class-independent and class-conditional statistical models (examples shown in Fig 6B-D and summary in Fig 6E).
Examples of voltage-gated potassium channel genes associated with AHP amplitude include Kcnh3 ( Fig   6B) in a class-driven and Kcnh7 and Kcnc2 in a non-class-driven manner (Fig 6C, D). In total, the classindependent model identified 17 significant genes (at a stringent threshold of FDR < 0.01), with 8 of these genes having positive slopes and 9 negative. In contrast, there were 12 genes that were significantly associated with AHP amplitude in the class-conditional model at the same statistical threshold, and 11 of these genes had slopes in the positive direction. Thus the results obtained using the class-conditional model are consistent with our a priori hypothesis that expression levels of voltagegated potassium channel genes are more likely to show positive than negative relationships with AHP amplitude, whereas the results obtained using the class-independent approach do not appear to support this conclusion. Expression of Scn1b, a voltage-gated sodium channel subunit, shows a negative relationship with action potential half-width in the class-conditional model (FDR = 0.0008; Fig 7B), as well as a number of other properties. This relationship is obscured in the class-independent model due to overall longer halfwidths in excitatory cell types. Consistent with the idea that Scn1b might function to shorten AP halfwidths, layer 5 cortical pyramidal neurons from mice lacking the Scn1b gene show longer half-widths than controls, due to changes in protein stability of voltage-gated potassium channels (33).
Interestingly, the Lrrk2 gene, mutations in which contribute to Parkinson's disease (34), is positively correlated with neurite branchiness (number of branch points per μm) in the class-conditional model, but not the class-independent model (FDR = 0.046; Fig 7C). Lrrk2 has been shown by several studies to regulate neurite outgrowth and branching in cultures (35)(36)(37)(38).
Not only do the genes discussed here provide important validation for our method, but the existence of a smooth correlation between these genes and their associated properties is potentially interesting. The

Novel gene-property relationships
In addition to those discussed above, we identified many genes whose function in regulating neuronal electrophysiology or morphology is less well characterized. These present testable hypotheses for future study. In Table 2, we list some of the top significant genes from the class-conditional model for each property, chosen based on significance levels and/or previous studies into their cellular function (also shown in Fig 7D).
One notable feature from this analysis is that many of these genes, like Kcna1 and Scn1b discussed above, are significantly associated with several or many different properties. For example, maximum firing frequency, input-output curve slope, and average interspike interval show a similar pattern in the strength of their association with this set of genes. These features all measure similar aspects of neuronal function (broadly speaking, whether a neuron tends to fire rapidly or not), so it would be surprising if they did not show correlations with the same genes. Two more properties that closely share associated genes are AP half-width and AHP amplitude, which measure distinct aspects of the action potential waveform, but might share genetic underpinnings related to rapid channel opening and closing (40). The genes most strongly associated with various electrophysiological properties tend not to show significant associations with the morphological properties of branchiness and max branch order. However, some of the genes associated with these morphological properties do show some (generally weak) associations with some electrophysiological properties (for example Mgat5 and Ifitm10). Several of the genes for which we were unable to find conclusive loss-of-function studies in the current literature (Fig 7E-H

Discussion
In this work we presented a series of correlations between gene expression and electrophysiological or morphological properties, each representing a testable hypothesis for future studies. Our key insight here is to introduce cell class (i.e., excitatory and inhibitory cell type identity) as an indicator variable when modeling the relationship between genes and properties. This has the advantage of 1) avoiding the identification of class-driven correlations, 2) helping identify a subset of non-class-driven correlations that might have been obscured by overall differences between excitatory and inhibitory cell types, and 3) revealing instances where gene-property relationships might be different for excitatory versus inhibitory cell types.
Although the idea that non-class-driven correlations would have a higher chance of being biologically relevant compared to class-driven ones seems straightforward, we evaluated this prediction through a In this work, we have operationalized the concepts of class-driven and non-class-driven correlations as those which produce a significant result in the class-independent model only or in the class-conditional model, respectively. This is a simplification, since both effects can exist simultaneously to differing degrees (for example, Daam1 and AP half-width, Fig 7H)  to the class-conditional model. One possible explanation is that for both of these features, the absolute slopes in excitatory cells tend to be higher than those in inhibitory cells (shown in Fig 3B for maximum branch order), suggesting either that these features might be under stronger genetic control in excitatory types compared to inhibitory, or that the genes associated with them in excitatory cell types are more readily identified by our analysis. Since this dataset contains more inhibitory than excitatory types, an inhibitory-specific relationship may be identified in the class-conditional model by virtue of the number of cell types, but an excitatory-specific relationship would likely be "diluted" by the larger number of inhibitory cell types not showing the relationship. It is also possible that, in the case of maximum branch order, this effect is partially explained by methodological differences in the dataset, since inhibitory but not excitatory morphological reconstructions contain axons in addition to dendrites (1).

Novel putative gene/electrophysiology relationships
Our primary motivation for comparing gene expression to neuronal properties is to identify candidate genes that might influence those properties. While directly testing the functional relevance of specific gene-property predictions is beyond the scope of this work, we have highlighted below some of our potentially novel findings that might be of greatest interest for further follow up.
Rab33a expression is positively correlated in the AIBS dataset with AHP amplitude with a significant interaction (Fig 7C), and also shows significant positive correlations with input-output curve slope, maximum firing frequency, and rheobase, and significant negative correlations with AP half-width and average interstimulus interval (ISI). Rab33a is a small GTPase thought to be involved in regulation of vesicle trafficking, likely at stages prior to plasma membrane docking (41,42). One hypothesis for how Rab33a could regulate AHP amplitude and/or AP half-width is that Rab33a might facilitate the transport and/or insertion of vesicles containing voltage-gated potassium channels, or regulators thereof, into the axonal membrane, leading to narrower action potentials and larger AHPs. Our analysis of the AIBS data suggests that any effects of Rab33a expression on AHP amplitude would be present only in inhibitory cell types.
Med23 (also known as Crsp3), a subunit of the mediator complex which acts as a transcriptional coactivator for RNA polymerase II (43,44), shows a positive correlation with AHP amplitude (Fig 7D).
Although the complete set of roles played by Med23 are incompletely understood, it has been shown to modulate signaling by the BMP, Ras/ELK1, and RhoA/MAL pathways (45,46). Thus it has the potential to regulate a variety of genes, including potentially voltage-gated potassium channels or interacting expect that this proportion may be higher than that in our previous work (11), However, causality can only be determined for a given gene and property using direct experimental methods.
Additionally, as in our prior work (11), we have limited our analyses to models in which expression levels of a single gene predict downstream properties in an approximately linear fashion, and in which that gene is regulated primarily at the transcriptional level. Some instances of mechanisms involving interactions between multiple genes, or those involving a non-linear relationship between log-gene expression and an electrophysiological or morphological property, are likely to have been missed here.
In addition, for mechanisms through which electrophysiological or morphological properties are controlled at the translational or post-translational level, our analysis is unlikely to provide insight into the gene whose product directly controls the property. However, this analysis has the power to identify transcripts whose products are involved in the translation, modification, or trafficking of proteins which in turn regulate electrophysiology or morphology.
Furthermore, the generalizability of the gene-property relationships reported here might be limited by the fact that the AIBS dataset only reflects cells sampled from the adult mouse primary visual cortex.
Therefore, the relevance of our results to other brain regions depends on the assumption that many of the same genes regulate electrophysiological or morphological properties in different cell types. This assumption of generalizability across brain areas appears to be appropriate in the case of Kcna1 and maximum firing frequency (Fig 7A and (7)). Additionally, this assumption is supported by our comparisons with the NeuroExpresso/NeuroElectro dataset and Patch-seq datasets, both of which contain cells sampled from other brain regions. However, some relationships may not generalize across brain regions due to differences in expression of other genes or the presence of post-translational modifications which modify the consequences of expressing a given gene. Another potential confounding factor in our reliance on the AIBS datasets is the uneven balance in the count of inhibitory versus excitatory cell types. The practical consequence of this is that the results from the class-conditional model are likely biased towards explaining gene-property relationships within inhibitory cell types, and might be missing relationships that are specific to excitatory cell types. Even in the absence of a significant interaction term, gene-property relationships may have stronger evidence in one cell class than the other. An example of this is Lrrk2 and branchiness (Fig 7C), where despite very similar slopes between classes and no statistical evidence of an interaction, the correlation among excitatory cells is much tighter than that among inhibitory cells. For this reason, when prioritizing genes for future study, we strongly recommend making a plot of gene, property, and cell class before concluding that the overall result is likely to apply to both classes.

Future Directions
The primary goal of this project was to produce a list of genes which we can recommend for future study based on their correlations with electrophysiological and morphological properties in the AIBS dataset. We believe that some of the genes we identified are promising candidates for future study.
In order to facilitate the use of our results by others in prioritizing genes for investigation, we are providing a Jupyter Notebook file to facilitate exploration of the data (available at https://github.com/PavlidisLab/transcriptomic_correlates). We have endeavored to make this easy to use for researchers with little or no coding experience. We encourage those who are interested in a particular electrophysiological or morphological property, gene, or set of genes, to explore the data and to make their own judgements as to which genes are worth following through on experimentally and which measures should be prioritized for recording. Our recommendation is to use the gene list in conjunction

Electrophysiological and morphological measures
Electrophysiological data were downloaded from http://celltypes.brain-map.org/ and summarized as described previously (11) except for the features response frequency versus stimulus intensity (inputoutput) curve slope, average interstimulus interval (ISI), and sag, which we did not use previously as they were not represented in the NE dataset. All three of these new features were pre-computed in the downloaded dataset. In order to include only sag values which could be meaningfully compared, any cells having a value of "vm-for-sag" (the membrane voltage at which sag values were measured) not between -90 and -110 mV, or having a resting membrane potential lower than -80 mV, were excluded from analyses of sag, but were used for analyses of other electrophysiological features. The morphological features "average_bifurcation_angle_local", "max_branch_order", "soma_surface", "total_length", and "total_volume" were pre-computed in the dataset. We defined "branchiness" according to the pre-computed feature "number_branches" divided by "total_length" as a measure of how often a given cell produces branches per unit of neurite length. For the features input resistance, tau, capacitance, rheobase, maximum firing frequency, AHP amplitude, adaptation ratio, input-output curve slope, latency, branchiness, max branch order, total length, and total volume, values were log10transformed prior to use in order to mitigate underlying skew or non-normality in these data values.

Defining cell types
Cell types in the AIBS dataset were defined according to the Cre-line they were isolated from, whether they were excitatory or inhibitory, and in most cases either a single cortical layer or a range of layers.
Where multiple layer dissections containing a sufficient number of cells were present for a Cre-line in  Fig   S1).
After the two datasets were matched, the combined dataset contained 1359 cells belonging to 48 types with electrophysiological data, 369 cells belonging to 43 types with morphological data, and 4403 cells belonging to 50 types with RNA-seq data (Table S1). The remaining cells in the original datasets were those whose types could not be matched, either because the Cre-line or layer they were isolated from was not sampled in the other datasets, or because the number of cells belonging to that type was below our threshold for the number of cells per type required. Modeling the relationship between gene expression and electrophysiology/morphology Mean expression values for each gene and mean values for each electrophysiological or morphological property were calculated for each cell type as defined above. If more than two cell types showed zero expression of any given gene, those cell types were removed from analyses for that gene. We found this step to be necessary in initial analyses because differences in electrophysiology/morphology among these cell types could not be assessed in relation to differences in gene expression, potentially producing Non-projecting class-specific correlations in the NeuroElectro/NeuroExpresso dataset The NeuroElectro and NeuroExpresso datasets were described previously (11). In order to limit the dataset to only non-projecting cell types (13), we chose cells whose major type was annotated as anything other than "Pyramidal," "Glutamatergic," or "MSN". Cells of the types "Ctx Htr3a" and "Ctx Oxtr" were excluded due to their lower transcriptomic quality compared to others in the dataset (54).
After subsetting, 19 cell types remained. Average values were calculated for gene expression and electrophysiological properties across cells within a type, and Spearman correlations were calculated for each combination of gene and electrophysiological property.
In order to assess cross-dataset consistency, we calculated a Spearman correlation between the beta coefficients (slopes) resulting from the class-independent or class-conditional model in the AIBS dataset and the correlation values calculated in the NE dataset. If there was a significant positive correlation between the AIBS slope and the NE correlation value, we concluded that the results of the two analyses were consistent (although this does not imply that they were highly consistent). For those comparisons which were consistent, we considered one method to be "more consistent" than the other if the AIBS/NE correlation value was higher (with non-overlapping 95% confidence intervals) than that derived using the second method.
Bootstrapped confidence intervals and significance between models for correlations between the NE and AIBS datasets were calculated as follows: Starting with the list of paired correlation values and beta coefficients for a given electrophysiological feature and model (class-independent or class-conditional), in which each pair represented a single gene and each value in that pair was calculated using one of the two datasets, a new list of paired correlation values of the same length was calculated by resampling with replacement. A new Spearman correlation was then calculated based on the resampled list. The resampling procedure was repeated 100 times, and the upper and lower ends of the confidence intervals were calculated by finding the values at the 2.5th and 97.5th percentiles. Significance was determined by finding the difference between each pair of resampled correlations from the two models, and then again finding the values at the 2.5th and 97.5th percentiles. If this interval did not contain zero, the two consistency metrics were said to be significant at p < 0.05.
Hierarchical clustering in Fig 7D was performed using the seaborn.clustermap tool using the "average" (UPGMA) method and the euclidean metric (56,57).

Data Availability
Analysis code and processed data will be available at https://github.com/PavlidisLab/transcriptomic_correlates. Included there is a Jupyter notebook file with some recommended steps for filtering and visualizing results, which can be run directly from the user's web browser without any need for installation of software. We have made an effort to make this resource approachable for researchers with little or no coding experience. The Bengtsson Gonzales Patch-seq dataset will be made publicly available.
inhibitory cell types in the Tasic dataset (described in the section below), we calculated the Spearman correlation of each Patch-seq cell to every cluster in the dissociated cell dataset and assigned cells to the cluster that they were most correlated with (we compared all Patch-seq datasets except the striatum Muñoz dataset to the Tasic cortical dataset). For cortical and hippocampal cell types, to increase the number of cells defined per transcriptomic type, we made use of the 'subclass' mappings provided in the Tasic 2018 dataset, mapping neurons to the Pvalb, Sst, Vip, Lamp5, and Sncg major interneuron cell types.
To estimate transcriptome quality we used the "quality score" metric from our prior analysis, using the full set of "on" and "off" marker genes.
Identifying highly variable genes per cell type. We used the 'decomposeVar' function from the 'scran' R package (59) to identify highly variable genes in each subclass in the Tasic  where we used an anova to test for the significance of the beta associated with the gene expression term by comparison to an equivalent statistical model without the gene expression term. We used the quality score as a weight in the regression analysis, and normalized these across datasets. We used the 'lmer' function within the 'lme4' R package for fitting mixed-effects models. We performed this analysis on the top 250-most variable genes per cell type and for genes that were highly variable in at least one cell type across at least 2 (of the 5 total) Patch-seq datasets used here. In addition, we did not use Patch-seq cell types where gene expression was detected in fewer than 33% of cells and with fewer than 5 cells expressing the gene.
dataset) under the supervision of JH-L. CB and SJT wrote the original draft of the manuscript, and all authors contributed to review and editing.
The following are in separate files: