Categorical encoding of decision variables in orbitofrontal cortex

A fundamental and recurrent question in systems neuroscience is that of assessing what variables are encoded by a given population of neurons. Such assessments are often challenging because neurons in one brain area may encode multiple variables, and because neuronal representations might be categorical or non-categorical. These issues are particularly pertinent to the representation of decision variables in the orbitofrontal cortex (OFC)–an area implicated in economic choices. Here we present a new algorithm to assess whether a neuronal representation is categorical or non-categorical, and to identify the encoded variables if the representation is indeed categorical. The algorithm is based on two clustering procedures, one variable-independent and the other variable-based. The two partitions are then compared through adjusted mutual information. The present algorithm overcomes limitations of previous approaches and is widely applicable. We tested the algorithm on synthetic data and then used it to examine neuronal data recorded in the primate OFC during economic decisions. Confirming previous assessments, we found the neuronal representation in OFC to be categorical in nature. We also found that neurons in this area encode the value of individual offers, the binary choice outcome and the chosen value. In other words, during economic choice, neurons in the primate OFC encode decision variables in a categorical way.


Author summary
Mental functions such as sensory perception or decision making ultimately rely on the activity of neuronal populations in different brain regions. Much research in neuroscience is devoted to understanding how different groups of neurons support specific brain functions by representing behaviorally relevant variables. In this respect, one important question is whether neuronal populations represent discrete sets of variables (categorical encoding) or random combinations of variables (non-categorical encoding). Here we developed a new algorithm to assess this general issue. We then used the algorithm to examine neurons in the orbitofrontal cortex (OFC) recorded while non-human primates performed economic decisions. We found that the neuronal representation was categorical. Specifically, neurons in the OFC encoded the value of individual offers, the binary Introduction A recurrent question in systems neuroscience is that of understanding what variables are encoded by a given population of neurons. Addressing this issue is a prerequisite to understand what role neurons play in functions such as sensation or decision making. In a typical experiment, animal subjects perform some task, behavioral conditions vary along one or more dimensions, and the corresponding parameter(s) define variables potentially encoded by neurons in some brain area. In first approximation, if firing rates vary systematically with a variable, it can be said that neurons encode or represent that variable. Building on this concept, countless studies shed light on the neural substrates of sensory, associative and motor processes. Importantly, identifying the variables encoded by a given population can sometimes be challenging due to the trial-by-trial variability of neuronal firing rates combined with three other factors. First, different neurons, even in close proximity to one another, may encode different variables, and the number of variables encoded by a neuronal population is generally not known. This situation may arise in any brain area but is most typical for prefrontal regions. Second, different candidate variables potentially encoded by the neuronal population may be substantially correlated with one another. Third, the encoding of different variables may be categorical or non-categorical. In a categorical representation, neurons in a population encode a discrete set of variables. Conversely, neurons in a category-free representation encode a continuum of variables [1][2][3]. Of course, the encoding scheme adopted by any particular population is not known a priori. All these issues are particularly pertinent to the representation of decision variables in the orbitofrontal cortex (OFC)-an area implicated in economic (or value-based) decisions [4,5]. In recent years, numerous studies have shown compelling evidence for mixed selectivity and category-free encoding in lateral prefrontal regions [6][7][8][9][10], suggesting that these traits are the hallmark of neural systems supporting complex cognitive functions [3,11]. At the same time, several studies argued for categorical encoding of decision variables in the OFC. Concurrent results in this sense came from studies of economic decisions in non-human primates [12,13] and from studies of decision confidence in rodents [14]. In contrast with these observations, a recent study argued for non-categorical encoding of decision variables in the primate OFC [15] (more on this below). Importantly, the categorical nature of this representation is a key assumption underlying current neuro-computational models of economic decisions [16][17][18][19][20][21][22][23]. Given the importance of this matter, we set up to revisit the question of categorical versus category-free encoding in the OFC using a new and more powerful statistical approach. Our goal was to develop a set of procedures (or algorithm) with four objectives in mind. First, the algorithm should assess the categorical versus category-free nature of a neuronal representation without committing to any particular set of variables. Second, if the encoding was indeed categorical, the algorithm should facilitate a quantitative comparison of multiple candidate variables potentially represented by the neuronal population. Third, the algorithm should operate seamlessly in cases where different variables encoded in the neuronal population are correlated. Fourth, the algorithm should be amenable to general use, for any neuronal population and any behavioral task.
To achieve our stated objectives, we considered the high-dimensional space defined by all the behavioral conditions occurring in the task (referred to as "trial types"). We noted that each neuronal response corresponds to one point in this space. Furthermore, after normalization, each response corresponds to one point on the hyper-spherical surface of unitary radius. Cast in this terms, the problem of assessing whether a neuronal representation is categorical in nature maps onto a clustering problem defined on a high-dimensional hyper-spherical surface, which we resolve using a spherical k-means approach [24]. In our algorithm, the categorical or non-categorical nature of the representation is assessed before defining any behavioral variable. The spherical k-means returns a number of clusters and their locations in the space of possible responses (i.e., the hyper-spherical surface). Furthermore, any variable possibly encoded in the neuronal population (i.e., any quantity systematically varied across behavioral conditions) also corresponds to a point on the hyper-spherical surface. Casting a wide net, we can generate a large number of variables potentially encoded by the neuronal population and thus identify the subset of variables that minimizes the total distance from the clusters. Importantly, these procedures are completely general and do not depend on the specifics of the behavioral task, except for the definition of candidate variables potentially encoded by the neuronal population.
The Results are organized as follows. The first section describes the juice choice experiments conducted in monkeys, the neuronal data set collected in OFC, and previous analyses of these data. The second section introduces the new algorithm. The third section demonstrates how the criteria previously used to assess the categorical nature of the neuronal representation in OFC [12] can, in some cases, lead to erroneous conclusions. The fourth section describes the results obtained by testing the new algorithm on a set of synthetic data. In the following section, we describe the results obtained by analyzing the actual OFC data with the new procedures [25]. In a nutshell, the results corroborate previous findings [13]. In the Discussion, we compare the present algorithm to other approaches proposed in the literature. We also emphasize that procedures presented here provide a general and powerful method to analyze heterogeneous populations of neurons.

Data set and previous analysis
In the experiments, two rhesus monkeys performed an economic choice task [13,25]. In each session, the animal chose between two juices offered in variable amounts. The preferred and non-preferred juices were labeled juice A and juice B, respectively. A "trial type" was defined by two offers and a choice (e.g., [1A:3B, B]). The number of trial types varied from session to session (because we varied offer types and because of variability in choices), and each session typically included 5-20 trials per trial type. Our data set included 1008 neurons. Neuronal spiking activity was recorded and processed with standard techniques (see Methods). For the analysis of how firing rates depended on the task variables, we defined several time windows aligned with respect to different behavioral events. For each trial type and each time window, firing rates were averaged across trials. A "neuronal response" was defined as the activity of one cell in one time window as a function of the trial type.
Our previous analyses proceeded as follows [13,25]. First, each neuronal response was tested with an ANOVA (factor trial type). Responses that passed a statistical criterion (p<0.001) were considered task-related and analyzed further. Our data set included 2047 taskrelated responses. Second, we defined a large number of variables potentially encoded by this population. We performed a linear regression of each response on each variable, from which we obtained the regression slope and the R 2 . If the regression slope was significantly different from zero (p<0.05), the variable was said to "explain" the response. Third, two proceduresstepwise and best subset-were used to identify a small set of variables that best explained the neuronal population. In a first study [13], both procedures identified variables offer value A, offer value B, chosen value and chosen juice. This result was replicated several times, including in the data set examined here [25] (Fig 1). Finally, each neuronal response was assigned to the selected variable that provided the highest R 2 . Two additional analyses were conducted to address the issue of categorical versus non-categorical encoding. First, for each neuronal response it was assessed whether adding a second variable to the regression (through a bi-linear regression) would significantly improve the fit. This analysis found that this was the case for only a small fraction of responses [13]. A second analysis quantified for each neuronal response and for each pair of selected variables the difference in the corresponding R 2 (ΔR 2 ), and examined the distributions of ΔR 2 across the neuronal population. In general, these distributions presented a significant dip close to zero, indicating that variables were encoded in a categorical way [12].
The approach for data analysis summarized above has the advantage that it allows to examine a large number of variables in parallel without biasing the conclusions, and that it withstands situations in which candidate variables are highly correlated with one another [13]. At the same time, this approach presents two limitations. First, the analyses require to first define candidate variables, then identify the most explanatory ones, and finally assess whether the encoding is categorical. In contrast, it would be preferable to assess whether the encoding is categorical without committing to any particular variable or set of variables, and only later define variables that best capture each category of responses. Second, there are situations in which the argument for categorical encoding based on the distribution of ΔR 2 is not valid (more on this below). The algorithm presented in this study addresses these limitations.

Detection of categorical encoding using spherical clustering
The algorithm used to assess categorical encoding was applied to task-related responses (i.e., responses that passed the ANOVA criterion; see above). To detect categorical encoding, we devised an algorithm that combines a clustering procedure partitioning neural responses based only on their spatial configuration with one that starts from a particular set of variables. In essence, the idea is to select a set of variables that best represents the spatial configuration of neural responses in the high-dimensional space of trial types. Fig 2 illustrates the algorithm for a 3-dimensional space (i.e., 3 trial types). Each data point represents a neuronal response (i.e., the activity of one cell, in one time window, averaged across trials for each trial type). Neuronal responses are first centered and normalized. This transformation places neuronal responses on a spherical surface of unitary radius. This data set undergoes two separate procedures for spherical clustering. First, data are examined with spherical k-means procedure, which does not assume any particular variable and yields a partition of the neural activity points based solely on the configuration of points in the high-dimensional space of trial types. For any number of clusters, this procedure alone reveals the categorical or non-categorical nature of the neuronal representation. Second, we perform a variable-centroid clustering, which starts from a particular subset of variables (iteratively chosen from a large set of candidate variables; see Table 1). Notably, each variable corresponds to a point on the spherical surface. Thus the subset of variables defines a corresponding number of cluster centroids, and we assign each neural response to the closest centroid. Each of these two clustering procedures (spherical k-means and variable centroid clustering) returns a partition of the population of neural responses. Importantly, the number of clusters is not known a priori. Furthermore, for any such number, there are many possible subsets of variables. We thus want to identify the subset of variables that best describes the neuronal data. As a measure of similarity between the two partitions, we use the adjusted mutual information [26]. Thus, we repeat the spherical k-means and the variable centroid clustering procedures for various number clusters and subset of variables. The variables that best match the non-committed spherical k-means partition are identified as encoded by the neuronal population.

Limits of previous approaches
In previous work, the categorical nature of the neuronal representation in OFC was assessed through the analysis of the distribution of ΔR 2 [12]. As explained above in Section "Data set and previous analysis", ΔR 2 quantifies the difference between the R 2 values obtained from the linear regressions onto a pair of selected variables. Intuitively, clusters of neural responses around variables should be discernible as peaks in the distribution of ΔR 2 values across the neuronal population. A dip in this distribution was thus interpreted as evidence for categorical encoding whereas a distributions without a dip was interpreted as evidence against categorical encoding [12]. We will now demonstrate that this criterion can sometimes lead to erroneous conclusions. To do so, we construct two synthetic data sets and we show that the ΔR 2 analysis fails while the spherical k-means algorithm reveals the true nature of the data. Again, each In the experiments, monkeys chose between different juices offered in variable amounts. The two juices were labeled A and B, with A preferred. Offers were presented as visual stimuli on a computer monitor. Different juice types were associated with different colors, and the number of squares represented the juice quantity. After a randomly variable delay, the animal indicated its choice with an eye movement. (B) Example offer value A response. In this panel, the x-axis represents different offer types ranked by the ratio q B /q A , where q J is the quantity of juice J offered. For each offer type, a black dot indicates the percent of trials in which the animal chose juice B (y-axis on the right). The relative value of the two juices (ρ) was obtained from a logistic fit. For this session, we measured ρ = 1.9. Red symbols illustrate one neuronal response. Diamonds and circle refer to trials in which the animal chose juice A and juice B, respectively. Vertical error bars indicate SEM. The activity of this cell increased almost linearly with the quantity of juice A offered, and did not depend on the quantity of juice B offered.  neural response is a point on the hyper-spherical surface of a high-dimensional space defined by the trial types, and variables correspond to points on this surface.
To simulate different neuronal populations, we generated distributions of points on the hyper-spherical surface. Fig 3A illustrates the first example. Here data points form two clusters: a circular cluster close to the spherical pole and a banana-like cluster along the equator. Importantly, the distribution used to generate the banana cluster was uniform on a banana domain (no intrinsic dip). We now examine the situation in which the analyst identified the wrong variables, shown as large circles in Fig 3A. We assume that the analyst correctly identified the pole variable, but erroneously selected two variables located at the opposite tips of the banana cluster. As illustrated in Fig 3B, the distribution of ΔR 2 between the two banana variables has a significant dip around zero (Hartigan's dip test, p<0.001) suggesting that the two variables are categorically distinct. However, this suggestion is at odds with the ground truth. The dip in the distribution of ΔR 2 is due to the presence of the third cluster, because some of the data points in the banana are closer to the pole variable than to either of the banana variables. Hence, a dip in the distribution of ΔR 2 does not necessarily imply that the corresponding variables are encoded by categorically distinct groups of neurons. Importantly, the spherical k-means clustering correctly identifies the presence of two clusters (colors illustrate the k-means partitioning).
The second example makes the converse point, namely that a unimodal or uniform distribution of ΔR 2 does not necessarily imply a non-categorical representation. One obvious reason why this is the case is that absence of evidence is not evidence of absence; here we illustrate a subtler issue. We consider the same clusters defined above. In this case, we assume that the analyst correctly identified two variables, one in the pole cluster and one in the banana cluster. However, we assume that the variable in the banana cluster is off center (Fig 3C). As illustrated in Fig 3D, the resulting ΔR 2 histogram does not present a dip (Hartigan's dip test, p = 0.313), even though the two clusters are categorically separated. Importantly, the spherical k-means clustering correctly identifies the two clusters.
In conclusion, a dip in the distribution of ΔR 2 is neither sufficient nor necessary to assess the categorical nature of a neuronal representation. In general, such assessment requires the examination of the spatial distribution of data points in a high-dimensional space, using an approach such as the spherical k-means clustering.

Analysis of synthetic data
We considered several clustering procedures, and wanted to validate our algorithm to assess the categorical versus non-categorical nature of a neuronal representation on data for which  In this case, we assume that the analyst correctly concluded that there are two variables, but might have defined these variables such that one is on the north pole (gray) and the other is on the east end of the banana cluster (black). Inspection of the ΔR 2 histogram does not reveal any dip. The reason is that data points on the west end of the banana cluster are equally far from the two variables. https://doi.org/10.1371/journal.pcbi.1006667.g003 we could control the ground truth. Thus, we generated synthetic populations of neural responses with and without specific categorical structure, and applied clustering algorithms to these synthetic data.
For the real data, the experiments included 9 or 10 trial types, resulting in 9-or 10-dimensional neuronal responses, represented as points on the unitary hyper-sphere in 9 or 10 dimensions. (see section Data set and previous analysis). To generate synthetic neuronal responses with categorical nature, we randomly generated 9-dimensional points on the hyper-spherical surface clustered in the vicinity of selected variables (see Methods). We then analyzed these synthetic data sets with a wide range of clustering algorithms, including centroid-based clustering methods (mini-batch k-means, spherical k-means), hierarchical clustering methods (Ward, agglomerative clustering, Birch), and a graph-based clustering method (spectral clustering) [27][28][29][30][31][32]. To estimate the performance of these algorithms, we used silhouette plots, which are a common method to assess the goodness of clustering partitions [33]. For each data point X (here X is a normalized neuronal response), the silhouette value quantifies the mean distance between X and other data points in the same cluster, and compares it to the mean distance between X and data points in the nearest other cluster. The greater the silhouette value, the better the clustering. A negative silhouette value indicates that X was assigned to the wrong cluster, since X is closer to the nearest other cluster. Fig 4 shows the silhouette plots obtained for the various clustering algorithms. We found that the hierarchical clustering methods (Ward, Agglomerative, Birch) produced the greatest number of negative silhouette values. Spectral clustering produced slightly less negative silhouette values than Ward as the best hierarchical clustering method. The centroid-based methods had no (spherical k-means) or very few (mini-batch k-means) negative silhouette values and many large silhouette values, suggesting that these methods found the most consistent clustering partitions. The silhouette analysis further suggests that the spherical k-means clustering is best suited for categorical data lying on a hyper-sphere. We also compared the silhouette plots on real data recorded from OFC. We found that spherical k-means had the smallest number of negative silhouette values, confirming the results from synthetic data. Of note, the superior performance of spherical k-means might be due to the fact that this algorithm explicitly considers the hyper-spherical structure of the data. Hence, we used spherical k-means for clustering in the remainder of this study.
We next compared the spherical k-means silhouette plots for categorical synthetic data with those for non-categorical data (Fig 5). To simulate neural responses without specific categorical structure, we generated points uniformly on the hyper-spherical surface. We then varied the number of clusters between 2 and 7. We did not expect to find negative silhouette values for these data, because negative values indicate data point assignments to wrong clusters. Such mis-assignments cannot occur without any cluster structure in the data. Indeed, we did not find any negative silhouette values, neither for categorical data (Fig 5A-5F) nor for non-categorical data (Fig 5G-5L). However, while for categorical data the silhouette values in each cluster were dominated by large values yielding convex plots, the silhouette values for non-categorical data were dominated by small positive values yielding concave plots. Such concavity clearly indicates lack of cluster structure and allow to discriminate between categorical data and non-categorical data [33].
While the silhouette analysis provides a simple way to evaluate the assignments of data points to clusters, it does not immediately associate particular variables with clusters. To establish this relation, we devised a comparative clustering method. In addition to spherical kmeans, we performed a centroid-based clustering where the centroids were defined by a particular set of variables. We refer to this procedure as "variable-centroid clustering". We assigned each data point to the nearest centroid on the sphere (see Methods). We then compared the clusters obtained from spherical k-means to the clusters obtained from variable-centroid clustering, and quantified their similarity for different sets of variables.
Quantifying the similarity of two clustering partitions is non-trivial because similarity should be invariant for cluster relabeling. Many measures of similarity have been proposed [26,34,35]. Here we tested three measures of similarity based on mutual information, which are founded on information theory and naturally satisfy our desiderata. Specifically, we tested mutual information (MI), normalized mutual information (NMI) and adjusted mutual information (AMI). MI quantifies the information one clustering partition provides about another clustering partition; NMI normalizes MI yielding values between 0 and 1; AMI additionally Categorical encoding of decision variables in orbitofrontal cortex corrects for the agreement expected by chance. We compared the performance of these candidate measures of similarity using our synthetic categorical and non-categorical data sets. Fig 6 shows the results obtained for each measure as a function of the number of clusters specified in the spherical k-means algorithm and the number of variables defined in the variable-centroid clustering. For each number n = 1, 2, . . . of variables, we tested all of the possible sets of n variables, and we identified the set providing the maximum similarity. We used exhaustive search for this purpose (see Methods).
For both categorical and non-categorical data, MI tended to increase with the number of clusters and variables (Fig 6A and 6D). This was expected since additional clusters and variables can convey more information about each other. Importantly, MI increased to~0.5 bits even for non-categorical data, highlighting the necessity for normalization. The additional normalization in NMI yielded clear peaks for categorical data and mostly flat values for noncategorical data. This made it easy to discriminate between categorical and non-categorical data based on NMI. Additionally, the peaks indicated corresponding numbers of clusters and variables where n variables correspond to 2n clusters. This was because the reflection of data points on the hyper-sphere (see Methods) produced twice the number of clusters. This reflection also facilitated the separation of the data points into two clusters for both one and two variables. For this reason, the very strong peaks for two clusters should be ignored. The results obtained for AMI (Fig 6C and 6F) were very similar to those for NMI. The peaks for corresponding cluster numbers and variable numbers were slightly sharper for AMI. For this reason, we selected AMI as our similarity measure for the analysis of real neural data recorded from OFC.
In conclusion, the analysis of synthetic data with known ground truth showed that a combination of spherical k-means clustering and variable-centroid clustering compared with AMI provided the most powerful approach to assess the categorical nature of neuronal representations and to identify the encoded variables.

Analysis of neuronal data
We analyzed neuronal activity recorded from OFC during experiments in which monkeys chose between different juice types (see section Data set and previous analysis). In total, we analyzed 9 neuronal pools, each including 139-536 neuronal responses (see Methods), where the ranges of relative juice offer values were similar within each pool (S1 Fig). Applying to each pool the same comparative clustering procedure with spherical k-means and AMI used for synthetic data, we obtained silhouette plots and a similarity profile for the neuronal data.
We varied the number of clusters between 2 and 8 and found clusters with convex silhouette plots indicating categorical data (shown for the post-juice time window in Fig 7A-7G). Moreover, the almost complete absence of negative silhouette values indicated that the spherical k-means found consistent partitions for different number of clusters. The normalized neuronal data contains 9-10 dimensions (corresponding to trial types) which are hard to visualize. In Fig 8 we illustrate the 2-dimensional projections of a data set consisting of 9-dimensional responses for the post-juice time window. Four clusters are color-coded. Even though the clusters in this representation are partly overlapping, there is a clearly discernible structure. For a qualitative assessment of the results, we examined the response prototypes defined by the centers of individual clusters. In general, the response prototypes obtained for n = 3, 4, 5 closely resembled the neuronal responses illustrated in previous studies [13,25]. One example is illustrated in Fig 9. In other words, the clusters obtained from the spherical k-means qualitatively validated previous conclusions.
For a quantitative assessment, we used AMI. Comparing the k-means clusters and the variable-centroid clusters, we found similarity peaks for particular combinations of cluster and variable numbers (Fig 7H). These peaks resembled those obtained for synthetic data, providing further evidence for the categorical structure of the neural data. To analyze in more detail the clusters and variables yielding maximum AMI we performed a Jackknife analysis (see Methods). This procedure allowed us to estimate the variation of AMI values for a given number of clusters (Fig 7I). Excluding the peaks for 2 clusters, we obtained the highest AMI values for 4 clusters and 2 variables. The AMI for this combination of variables and clusters was significantly greater than the second largest AMI (Wilcoxon rank sum test, p<0.001). We show the corresponding tuning curves in S13 To assess the robustness of these results, we performed three control analyses: First, within each pool and each trial type, we randomly shuffled neural responses. Permuting neural responses for a given trial type destroys response patterns across trial types while preserving the distributions of responses within trial types. We expected that this would abolish the categorical representation in the data. S2 Fig shows that this was indeed the case. Of note, silhouette plots are concave (S2A-S2G Fig), resembling those that we obtained for synthetic data without a categorical representation (Fig 5A-5N). Moreover, the AMI is low (S2H Fig, S2I  Fig, S3 Fig "Shuffled data"), in line with the values obtained for synthetic data without categories (Fig 6D-6F, S3 Fig "Uniform data") and unlike the data before shuffling (S3 Fig "Original  data").
Second, we analyzed neural responses from the post-offer, late-delay and pre-juice time windows as well. S4 Fig, S5 Fig and S6 Fig illustrate the cluster analyses for the post-offer, latedelay and pre-juice responses, respectively. For each time window, convex silhouette plots (S4A-S4G Fig, S5A-S5G Fig, S6A-S6G Fig) and the magnitude of AMI values (S4H Fig, S4I  Fig, S5H Fig, S5I Fig, S6H Fig, S6I Fig) clearly confirm the categorical nature of the representation, as did the 2-dimensional projected cluster visualizations (S7 Fig, S8 Fig, S9 Fig). For the post-offer window, a lower number of clusters yielded the highest AMI. For consistency, we show the same number of variables for the post-offer, late-delay and pre-juice time windows (S10 Fig, S11 Fig, S12 Fig, respectively). For all time windows, the response prototypes resembled those obtained from the post-juice data set (Fig 9, S10 Fig, S11 Fig, S12 Fig). When plotting fewer variables, we generally obtained subsets of the corresponding plots with greater number of variables (c.f. S12 Fig, Fig 9), indicating that a similar number of variables is present in the data even though expressed to varying degrees.
Third, we applied the PAIRS analysis developed by Raposo and colleagues [3] to test for the presence of neural clusters (see Methods). Confirming our results, the PAIRS analysis indicated clear categories (PAIRS index 0.67, two-sided p-value from Monte Carlo simulations < 0.001). (Of note, the number of cells included in our analysis was much larger than that in the Raposo study.) Table 2 summarizes the results of our analyses. For 4 clusters and 2 variables, the algorithm selected variables chosen value A and chosen value B for all neuronal pools. For 6 clusters and 3 variables, variables offer value A, offer value B and chosen juice were selected for all pools. For 8 clusters and 4 variables, the algorithm selected variables offer value A, offer value B, chosen value and chosen juice. Note that these are the same variables identified in previous studies [13,25]. For 10 clusters and 5 variables, the algorithm selected these same variables plus the variable value ratio (= other/chosen value). The substantial consistency in the variables identified with increasing numbers of clusters indicates that the results are very robust.

Discussion
We presented a new algorithm to assess whether a neuronal representation is categorical or category-free, and to identify the encoded variables if the representation is indeed categorical. The method involves two steps. First, we cluster the data without committing to any particular variable. Second, we match clusters with a set of candidate variables. Quantifying similarity between the clusters of the two steps makes it possible to identify the variables most consistent with the neuronal data. This new method overcomes limitations of previous approaches, and is widely applicable. In this study, we tested the algorithm on synthetic data and on neuronal data recorded in the primate OFC during economic decisions. With respect to the latter, the most notable result is that we found the neuronal representation in OFC to be categorical in nature. This result confirms previous assessments of this same data set [13,25], and the results obtained by other research groups [14]. We suggest that the categorical nature of the neuronal representation sets apart OFC from other prefrontal regions, where task-relevant variables are encoded in category-free representations [6][7][8][9][10]. Importantly, we confirmed our result through the PAIRS analysis previously used by Raposo et al to demonstrate non-categorical encoding in the rodent posterior parietal cortex [3]. This result highlights qualitative differences between brain regions.
In addition, our algorithm identified a set of variables encoded in OFC. The variables most reliably detected-offer value A, offer value B, chosen value and chosen juice-coincide with those identified in previous studies [12,13]. One difference concerns the number of variables. Previous work identified 4 variables imposing a criterion on the marginal explanatory power (i.e., each additional variable should explain �5% responses) [13,25,36]. In contrast, the AMI Categorical encoding of decision variables in orbitofrontal cortex criterion establishes the optimal number of variables as 2. Several elements may explain this finding. The AMI procedure penalizes the addition of further variables and thus tends to provide a conservatively small number of clusters. Concurrently, the variables encoded in OFC are substantially correlated in the experiments [13]. Geometrically, this means that the centers of different clusters are close to each other on the hyper-sphere, and not distributed randomly as might implicitly be assumed. Exacerbating this issue, in our data, neuronal responses encoding the chosen value have some additional jitter, because the relative value of two juices varied to some extent from session to session. This fact effectively broadened the corresponding cluster on the hyper-sphere.

Comparison with other approaches
In previous work, we assessed the categorical nature of the representation in OFC based on linear regressions and the analysis of the resulting R 2 [12,13]. As discussed above, that approach has some limitations, addressed by the algorithm presented here. Another approach, was proposed by Raposo and colleagues [3]. The PAIRS analysis confirmed the categorical nature of the representation in OFC, but it did not identify specific variables encoded by the population. Another clustering-based method to assess categorical encoding was recently proposed by Hirokawa et al [14]. Their data set was recorded from the rat OFC and included 42 conditions. Applying principal component analysis as a pre-processing step, they first reduced this data set to 21 dimensions. Using spectral clustering, they identified 9 clusters (the number of clusters was determined based on bootstrap stability). While there are clear similarities between their approach and ours, there are also notable differences. Both approaches are founded on clustering of pre-processed neuronal activity. Hirokawa and colleagues applied spectral clustering, while we applied spherical k-means. On simulated data, we compared silhouette plots of several clustering procedures and we found that spherical k-means performed best. Most importantly, our approach associates easily interpretable variables with the identified clusters by making use of two comparative clustering steps-spherical k-means and variable-centroid clustering. Spherical k-means operates without prior assumptions on particular variables while variable-centroid clustering can be thought of as a cluster representation of a set of variables. By selecting the set of variables most similar to the assumption-free clusters, we obtain unbiased representations of neuronal categories.
Interestingly, both our results and the results of Hirokawa et al [14] differ from those of a recent study by Blanchard et al, who concluded that the neuronal representation in OFC is category-free [15]. This apparent discrepancy highlights the advantage of assessing the categorical versus non-categorical nature of a neuronal representation without committing to any particular set of variables. Blanchard et al examined data from an experiment in which monkeys chose between two gambles. Apart from the stakes, which varied from trial to trial, the two gambles differed qualitatively-one gamble was "informative", meaning that the outcome would be revealed to the animal shortly after the choice; the other was "uninformative", meaning that the animal would learn the outcome only at the end of the trial. Informative and uninformative gambles were associated with different colors, and informativeness consistently affected choices [37]. In the analysis, the authors regressed each neuronal response recorded in OFC separately on the stakes and on the informativeness. Since the distribution of regression coefficients was not condensed along these two axes, they concluded that the representation in OFC is non-categorical [15].
A limitation of Blanchard's approach is that the neural representation may actually be categorical, but the frame of reference and/or the encoded variables may not be those tested in the analysis. Specifically, neurons in the Blanchard study might have represented the identities and values of the offers in a color-based reference frame. Under these conditions, different groups of cells would encode the value of the informative or non-informative offers, with positive or negative sign. Such representation is categorical, but an analysis based on separate regressions on stakes and informativeness would fail to reveal its categorical nature. Similarly, an analysis of variables defined in an order-based reference frame would fail to reveal the categorical nature of the representation. To visualize this point, consider the clustering problem defined in the present study. Choosing two variables is equivalent to choosing a particular plane and to projecting all the data set from the hyper-sphere on that plane. Unless the vectors that identify the encoded variables lie on the plane, separate clusters will overlap and appear non-separable. Assessing the categorical or non-categorical nature of the representation without committing to a particular set of variables overcomes this weakness.

Categorical representation and mixed selectivity
We presented a general tool to assess whether a neuronal representation is categorical or noncategorical. Importantly, this issue is distinct from whether the encoding is pure or mixed [3]. Pure versus mixed selectivity is a property of individual cells. Consider an experiment in which conditions vary on two dimensions (e.g., visual stimuli that vary for the orientation and contrast). The activity of any given neuron could vary as a function of only one dimension (pure selectivity), as a function of a linear combination of the two dimensions (linear mixed selectivity) or as a non-linear combination of the two dimensions (non-linear mixed selectivity). In recent years, several studies have discussed the advantages of non-linear mixed selectivity [6,[9][10][11]. In contrast, categorical or non-categorical encoding is a property of the neuronal population [1][2][3]. Consider again an experiment in which conditions vary on two dimensions, referred to as variable1 and variable2. Imagine that neurons present mixed selectivity. In principle, neurons could all encode the same linear combination of the two parameters (a1 variable + a2 variable2, with a1/a2 fixed for the whole population). If so, the representation would be categorical. Alternatively, different neurons could encode different linear combinations of the parameters a1 variable + a2 variable2, with a1/a2 varying across the population. If so, the representation would be non-categorical. Non-categorical representations have been found in the rat posterior parietal cortex [3] and in lateral prefrontal cortex [7,8,38].
Non-categorical encoding implies mixed selectivity, but the converse is not true. This fact is well illustrated by the encoding of economic decision variables in OFC. By definition, subjective values integrate all the dimensions relevant to choice, including physical traits of the goods (commodity, quantity, probability, time delay, etc.) and properties internal to the subject (motivation, risk attitude, patience, etc.) [39]. For example, the subjective value of a quantity q of apple juice (A) received at time t with probability p is roughly equal (under simplifying assumptions!) to V A (q, p, t) � ρ A q p α e -t/τ , where ρ A captures the subjective desirability of the apple, α captures the risk attitude, and τ captures the patience. Clearly, the value is a non-linear combination of the dimensions varied by the experimenters (q, p and t). As a consequence, any value-encoding neuron will present non-linear mixed selectivity, as indeed observed in many studies [13,[40][41][42][43][44]. This circumstance, however, has no implications on the categorical nature of the representation. Consider for example our task, in which monkeys choose between juice A and juice B. Variables possibly encoded by the population include offer value A and offer value B. In principle, individual neurons could encode any linear combination a A offer value A + a B offer value B. The categorical or non-categorical nature of the representation is a property of the joint distribution for the coefficients [a A a B ] across the population. For example, if each neuron encodes only one of the two variables, the distribution for [a A a B ] has two peaks at [0 1] and [1 0] and is close to zero elsewhere. Similarly, there might be two groups of neurons encoding the value sum and the value difference. In this case, the distribution for [a A a B ] has two peaks centered on [1 1]/2 1/2 and [1-1]/2 1/2 and is close to zero elsewhere. In both these scenarios, the representation is categorical in nature. Conversely, coefficients [a A a B ] could be uniformly distributed on a broad domain, and the representation would be non-categorical. Our results demonstrate that the representation of decision variables in OFC is indeed categorical.

Experimental design and data set
The experimental procedures for data collection and preliminary data analyses have been described before [25]. Briefly, two monkeys participated in the study. All experimental procedures conformed to the NIH Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee (IACUC) at Washington University in St Louis (protocol #20140031). Throughout the study, the animal health was overseen by a veterinary staff. Before training, a head restraining device and a recording chamber were implanted under general anesthesia (Isoflurane). Steps taken to increase the animal welfare included pair housing, cage enrichment, and usage of exclusively positive reinforcers.
In each session, a monkey chose between two juices (labeled A and B, with A preferred to B) offered in variable amounts. Each trial started with the animal fixating the center of a computer monitor. After 0.5 s, two sets of colored squares representing the two offers appeared on the two sides of the fixation point. For each offer, the color represented the juice type and the number of squares represented the juice amount. The animal maintained central fixation for a randomly variable delay (1-2 s), after which the fixation point was extinguished and two saccade targets appeared by the offers (go signal). The animal indicated its choice with a saccade and maintained peripheral fixation for 0.75 s before juice delivery.
In this experiment, the same neuron was recorded during two subsequent blocks of trials. Juices offered in the two blocks could be the same or different [25]. For the purpose of the present analysis, we considered data in each trial block independently. Thus each neuron appears in the analysis twice and the term "session" refers to a block of trials. In each session, offered quantities varied from trial to trial. An "offer type" was defined by two offers (e.g., [1A:3B]). Different offer types were pseudo-randomly interleaved. Their frequency varied, but each offer type was typically presented at least 20 times in each session. A "trial type" was defined by an offer and a choice (e.g., [1A:3B,A]).
In each session, choices were analyzed with a logistic regression: where q A and q B were the quantities of juices A and B offered to the animal. The relative value of the juices was inferred from the flex of the sigmoid and defined as ρ = exp(−a 0 /a 1 ). Neuronal data were recorded from central OFC using standard techniques [25]. The analysis of firing rates was based on four primary time windows: post-offer (0.5 s after the offer), late delay (0.5-1 s after the offer), pre-juice (0.5 s before juice delivery), and post-juice (0.5 s after juice delivery). For each trial type and each time window, firing rates were averaged across trials. A "neuronal response" was defined as the activity of one cell in a window as a function of the trial type. Task-related responses were identified with an ANOVA (factor trial type, p<0.001).
In preliminary work, we submitted the present data set to standard analyses for variable selection. In these analyses, we defined a large number of variables (Table 1), regressed each response on each variable, and used methods for variable selection to identify a subset of variables that best explained the population (see Results and [13]). These procedures replicated previous results, as neuronal responses were found to encode variables offer value A, offer value B, chosen value and chosen juice [13].

Neuronal pools
The hyper-spherical clustering procedures introduced in this study require that different neuronal responses be defined on the same trial types (i.e., in the same space). Importantly, the offer types presented to the animal in our experiments could vary from session to session, although the same few sets of offer types were used repeatedly in many sessions. As a result, the entire data set could be divided in six groups of neuronal responses defined on the same trial types.
The variables included in the analysis are defined in Table 1. Of note, some variables (e.g., chosen value) were defined based on the relative value of the juices, which depends on the animal choices and thus varies somewhat from session to session. Ideally, the analyses described in this study would be conducted on pools of neuronal responses recorded in the same session, such that variables would be defined equally for all the responses. In contrast, our neurons were recorded in different sessions. Hence, we grouped responses in pools of similar relative values. For each group of neuronal responses recorded with the same trial types we examined the distribution of relative values. For five of the six groups, the distribution was bimodal. Hence, we split each of them in two and we removed outliers based on the inter-quartile range (IQR). In conclusion, our data set included 9 pools of neuronal responses recorded with the same trial types and similar relative values. (The remaining variability in relative values was effectively a noise factor that, if anything, made it more difficult to show categorical encoding.) Neuronal pools included 139-536 responses, and each pool was analyzed separately. When combining similarity values obtained for different pools, we weighted the similarities according to the number of neurons in the pool.

Spherical representation of neuronal responses and variables
We represented neuronal responses as points in a high-dimensional space where each axis corresponds to a trial type. Raw neuronal responses were centered (by subtracting the mean firing rate across trial types) and normalized (imposing a unitary vector length). As a result, the neuronal population was constrained to the hyper-spherical surface of unitary radius. Similarly, for each variable we calculated a vector with elements given by the variable value in each trial type. We then centered and normalized the vector. Hence, each variable was represented as a point on the unitary hyper-spherical surface.
Previous work indicated that neuronal responses can encode a variable with positive or negative slope [13]. Hence, the sign of the normalized vector is ambiguous. For this reason, before conducting the clustering procedures, we mirrored each data point on the hyper-spherical surface. Resulting cluster centers were most of the time, but not always, symmetric when adding the mirror points. In principle, non-symmetrical clusters may be understood considering even and odd numbers of clusters. Symmetry implies an even number of true clusters, which is not necessarily the case. For instance, consider the case of the 3D sphere. If the raw data present only one cluster along the equator, adding mirror points will not generate a second cluster. If we run the algorithm imposing two clusters, the algorithm will place two cluster centers somewhere on the equator, but not necessarily on opposite ends. Now consider a situation where the raw data present a cluster along half of the equator and another cluster at one pole. Adding mirror points will result in one cluster along the equator and one cluster at each pole (3 clusters total). These examples demonstrate that mirroring does not necessarily induce an even number of clusters or symmetric cluster centers.

Variable selection procedure
We selected variables by evaluating cluster similarity of partitions induced by a set of variables and of partitions obtained from spherical k-means clustering. The general algorithm for selecting the most informative set of variables works as follows: • For given number of clusters, partition cells using spherical k-means clustering yielding partition U (see below) • For given number of variables n, select variables by: : Compute new centroids corresponding to the partitioning computed for fU ðtþ1Þ ðtÞ i x T c i is greater than the tolerance 1e-4 (default value), increment t by 1 and go to step 2. Otherwise, stop.
For given variables, we partitioned cells using proximity clustering: for each cell, we calculated the cosine distance to each variable and assigned the cell to the variable with the smallest distance. Variables therefore became centroids of the clusters.
Note that a partition U here refers to a set of sets {U 1 ,. . .,U R } where each element U i of the partition is a set of rate vectors. E denotes expectation of the mutual information over random partitions subject to having a fixed number of clusters and points in each cluster, H denotes entropy: and I(U,V) denotes mutual information [45] between U and V: Mutual information was used because of its several advantages as metric for computing statistical associations between neural variables or between neural and behavioral variables, namely its ability to capture all forms of associations between such variables, including both linear and non-linear ones at all orders [46]. In the above equation for I(U,V), n ij denotes the number of objects that are common to clusters U i and V j , that is n ij = #(U i \V j ) and a i ¼ n ij . Subtraction of the expectation values in the numerator and denominator adjusts the measure for chance and effectively corrects the positive bias of the measure. These terms can be calculated analytically [26]. In particular, we used the Python implementation sklearn.metrics.adjusted_mutual_info_score of the Scikit-learn package to calculate the AMI.
We checked all possible variable combinations (stopping at 5 variables) and collapsed the variables offer value A and offer value B to offer value A|B as well as chosen value A and chosen value B to chosen value A|B by pruning variable combinations that contained one but not the other of the collapsed variables. We then selected the variables and clusters with the greatest adjusted mutual information similarity.

Jackknife estimates of standard error
We estimated standard errors of adjusted mutual information values by apply the Jackknife procedure over pools [47]. For a given number of clusters and a given jackknife subsample, we took the maximum AMI over the different numbers of variables. This yielded a (#clusters x #subsamples) matrix. We then collapsed the subsample dimension in two ways: 1) For a given number of clusters, we averaged over subsamples to get the mean AMI. 2) For a given number of clusters, we used the jackknife equation for standard deviation [47] to get an estimate of the standard error:

Generation of synthetic data
We generated two synthetic data sets to test our variable selection procedure: one with categories and the other without categories.
To generate the data set with categories, we selected four variables: total value, offer value A, offer value B and chosen juice. We represented each of these variables in the trial type space on the hyper-sphere as a 9-dimensional vector with unit length (see Section "Representation of cells and variables"). Then, for each of these variables, we generated 100 synthetic cell responses by adding independent Gaussian noise to each of the vector elements (zero-mean, standard deviation 0.25). Using this procedure, we obtained point clouds around each variable consisting of 100 points each. Finally, we moved the points to the unit hyper-sphere by normalizing the vector length of each point. Each point then represented the centered and normalized firing rates of a synthetic cell.
To generate a data set without categories, we drew 400 samples uniformly on the 9-dimensional unit hyper-sphere. To do so, we generated a 9-dimensional vector with independent standard normal distributed elements for each sample and then normalized the vector to unit length.

PAIRS analysis
To test whether another method for detecting the presence of neural clusters would also indicate categories, we applied the PAIRS analysis [3] as follows. For the PAIRS analysis, an input matrix is required having size (number of cells)-by-(number of trial types times number of time points per trial). For a given trial type we did not have a neural response for each cell. Therefore, we selected the nine most common trial types leaving 2380 cells that had responses for all of these trial types. We filled the matrix by calculating the 10 ms peri-stimulus time histogram for each of these cells and trial types in the time window from -500 ms before offer onset to 1,000 ms after offer onset. We then performed principal component analysis to reduce this matrix to a (number of cells)-by-8 matrix. For each cell, we then found the k nearestneighbors and calculated the k-angle, that is the mean angle it made with each of these neighbors. The median of the angles over cells then yieldedŷ data . We then generated 10,000 matrices of size (number of cells)-by-8 filled with Gaussian random variables and calculated the k-angle for each of them. The median over the (number of cells)-times-10,000 angles then yielded y random . We then calculated the PAIRS index as PAIRS ¼ŷ random Àŷ datâ y random

:
We varied k between 2 and 39 and found stable PAIR indices between 0.67 and 0.70. In this range,ŷ random varied between 0.22 and 0.69.