Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization

In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.


Introduction
Our understanding of a sensory modality is marked, in part, by our ability to explain its characteristic perceptual qualities [1,2]. To take the familiar example of vision, we know that the experience of color depends on the wavelength of light, and we have principled ways of referring to distances between percepts such as 'red', 'yellow' and 'blue' [2,3]. In olfaction, by contrast, we lack a complete understanding of how odor perceptual space is organized. Indeed, it is still unclear whether olfaction even has fundamental perceptual axes that correspond to basic stimulus features.
Early efforts to systematically characterize odor space focused on identifying small numbers of perceptual primaries, which, when taken as a set, were hypothesized to span the full range of possible olfactory experiences [4][5][6]. Parallel work applied multidimensional scaling to odor discrimination data to derive a two-dimensional representation of odor space [7,8], and recent studies using dimensionality reduction techniques such as Principal Components Analysis (PCA) on odor profiling data have affirmed these low-dimensional models of human olfactory perception [9][10][11]. A consistent finding of these latter studies is that odor percepts smoothly occupy a low-dimensional manifold whose principal axis corresponds to hedonic valence, or 'pleasantness'. Indeed, the primacy of pleasantness in olfactory experience may be reflected in the receptor topography of the olfactory epithelium [12] as well as in early central brain representations [13].
Here, we were interested in explicitly retaining additional degrees of freedom to describe olfactory percepts. Motivated by studies suggesting the existence of discrete perceptual clusters in olfaction [14,15], we asked whether odor space is amenable to a description in terms of sparse perceptual dimensions that apply categorically. To do so, we applied non-negative matrix factorization (NMF) [16][17][18][19] to the odor profile database compiled by Dravnieks [20] and analyzed in a number of recent studies [9][10][11]. NMF and PCA are similar in that both methods attempt to capture the potentially low-dimensional structure of a data set; they differ, however, in the conditions that drive dimensionality reduction. Whereas basis vectors obtained from PCA are chosen to maximize variance, those obtained from NMF are constrained to be non-negative. This constraint has proven especially useful in the analysis of documents and other semantic data, where data are intrinsically non-negative [19,21], a condition that is met by the Dravnieks database.
Applying NMF, we derive a 10-dimensional representation of odor perceptual space, with each dimension characterized by only a handful of positive-valued semantic descriptors. Odor profiles tended to be categorically defined by their membership in a single one of these dimensions, which readily allowed co-clustering of odor features and odors. While the analysis of larger odor profile databases will be needed to generalize these results, the techniques described herein provide a conceptual and quantitative framework for investigating the potential mapping between chemicals and their corresponding odor percepts.

Non-Negative Matrix Factorization (NMF)
Non-negative matrix factorization (NMF) is a technique for deriving low-rank approximations of the form [16][17][18]

$$A \approx WH,$$

where A is an $m \times n$ matrix with non-negative entries, and W and H are non-negative matrices of sizes $m \times s$ and $s \times n$ respectively, with $s < \min(m,n)$. The matrices W and H represent feature vectors and their weightings. NMF has been widely used for its ability to extract perceptually meaningful features from high-dimensional datasets, features that are highly relevant to recognition and classification tasks in several different application domains.
To derive W and H we used the alternating least squares algorithm originally proposed by Paatero [17]. Because the optimization problem is convex in W alone or in H alone, but not in both jointly, the algorithm iterates over the following steps:
1. Assume W is known and solve the least squares problem for H using $W^{T} W H = W^{T} A$.
2. Set negative elements of H to 0.
3. Assume H is known and solve the least squares problem for W using $H H^{T} W^{T} = H A^{T}$.
4. Set negative elements of W to 0.
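The four steps above can be sketched in Python with NumPy. This is a minimal illustration of projected alternating least squares, assuming a random uniform initialization and a fixed iteration count; it is not the Matlab nnmf.m implementation used in the study, and the function name `nmf_als` is hypothetical.

```python
import numpy as np


def nmf_als(A, s, n_iter=1000, seed=0):
    """Projected alternating least squares for A ~ W @ H (minimal sketch).

    A : (m, n) non-negative data matrix (descriptors x odors)
    s : subspace dimension, s < min(m, n)
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, s))  # random non-negative initialization
    H = np.zeros((s, n))
    for _ in range(n_iter):
        # Step 1: W fixed, least squares for H (a solution of W^T W H = W^T A)
        H = np.linalg.lstsq(W, A, rcond=None)[0]
        # Step 2: set negative elements of H to 0
        H = np.maximum(H, 0.0)
        # Step 3: H fixed, least squares for W (a solution of H H^T W^T = H A^T)
        W = np.linalg.lstsq(H.T, A.T, rcond=None)[0].T
        # Step 4: set negative elements of W to 0
        W = np.maximum(W, 0.0)
    return W, H
```

For example, `W, H = nmf_als(A, 10)` would factor a descriptors-by-odors matrix into 10 non-negative basis vectors. The projection steps make this a heuristic: unlike the unconstrained subproblems, it is not guaranteed to decrease the error monotonically.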
We used the standard implementation of the non-negative matrix factorization algorithm (nnmf.m) in Matlab (Mathworks, Inc.). Given the size of the odor profile matrix (146 × 144), the speed of convergence was not an issue. As a stopping criterion, we chose a value of 1000 for the maximum number of iterations. Given the iterative nature of the algorithm and the small size of the dataset, we expect the algorithm to reach a global minimum for small s and a fixed point for large s.
Note that a minimum attained by the matrices W and H is also attained by pairs such as $WD$ and $D^{-1}H$ for any non-negative matrix D whose inverse $D^{-1}$ is also non-negative (that is, a positive diagonal scaling combined with a permutation). Thus, scaling and permutation can cause uniqueness problems, and hence the optimization algorithm typically enforces either row or column normalization in each iteration of the procedure outlined above.
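The scaling and permutation ambiguity can be checked numerically. The NumPy sketch below (helper names `normalize_columns` and `monomial` are hypothetical) builds a positive scaling combined with a permutation, confirms that $WD$ and $D^{-1}H$ reproduce $WH$, and applies one common convention, unit-norm columns of W, that removes the scaling part of the ambiguity. It makes no assumption about which normalization nnmf.m applies internally.

```python
import numpy as np


def normalize_columns(W, H):
    """Rescale columns of W to unit L2 norm and rows of H inversely; W @ H is unchanged."""
    norms = np.linalg.norm(W, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero basis vectors as they are
    return W / norms, H * norms[:, None]


def monomial(d, perm):
    """A non-negative matrix with a non-negative inverse: diag(d) with its columns permuted."""
    return np.diag(d)[:, perm]
```

Normalizing columns fixes the scale of each basis vector, but a permutation of the columns (and the matching rows of H) still yields an equally good factorization.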

Cross-validation procedure with training and testing sets
The choice of sub-space dimension s is problem dependent. Our strategy was to iterate over the sub-space dimension from s = 1 to 50, each time dividing the data matrix A into random, equal-sized training and testing halves. We kept track of the residual error in the form of the squared Frobenius norm $\|A - WH\|_F^2$ for both training and testing sets. For each choice of s we repeated this division 250 times, with a stopping criterion of 1000 iterations, to report statistics on the residual errors. In addition, once an optimal sub-space dimension is chosen, we report the most stable version of the basis matrix, obtained by computing the KL divergence between every pair of the 250 instances of W from the training set and picking the W with the lowest mean KL divergence.
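A minimal sketch of the split-half residual and of the stable-basis selection, in Python with NumPy and SciPy. The text does not specify how A is divided into halves; this sketch assumes the odors (columns) are split at random, W and H are learned on the training odors, and the held-out odors are then fitted with W held fixed using non-negative least squares (`scipy.optimize.nnls`). For the stability step it assumes each W is normalized to sum to one so that the KL divergence is defined, and it does not align columns across runs. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def nmf_als(A, s, n_iter=200, seed=0):
    """Compact projected alternating least squares, A ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], s))
    for _ in range(n_iter):
        H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0.0)
        W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0.0)
    return W, H


def split_half_residuals(A, s, n_splits=250, seed=0):
    """Squared Frobenius residuals for random half splits of the odors (columns)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    train_err, test_err, bases = [], [], []
    for rep in range(n_splits):
        idx = rng.permutation(n)
        tr, te = idx[: n // 2], idx[n // 2:]
        W, H = nmf_als(A[:, tr], s, seed=rep)
        train_err.append(np.linalg.norm(A[:, tr] - W @ H) ** 2)
        # Held-out odors: keep W fixed and fit non-negative weights odor by odor
        H_te = np.column_stack([nnls(W, A[:, j])[0] for j in te])
        test_err.append(np.linalg.norm(A[:, te] - W @ H_te) ** 2)
        bases.append(W)
    return np.array(train_err), np.array(test_err), bases


def most_stable_basis(bases, eps=1e-12):
    """Index of the W with the lowest mean KL divergence to all other W's (needs >= 2)."""
    # Treat each W as a probability distribution over its entries.
    # Columns are not matched across runs; the text does not describe an alignment step.
    P = [(W + eps) / (W + eps).sum() for W in bases]
    k = len(P)
    mean_kl = np.array([
        np.mean([np.sum(P[i] * np.log(P[i] / P[j])) for j in range(k) if j != i])
        for i in range(k)
    ])
    return int(np.argmin(mean_kl)), mean_kl
```

Because KL divergence is asymmetric, the mean is taken over $D(P_i \| P_j)$ with i fixed; a symmetrized divergence would be an equally defensible choice.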

Scrambling odor profiles
We applied NMF to scrambled perceptual data, that is, the elements of A were scrambled (randomly reorganized) before analysis with NMF. Three different scrambling procedures were implemented. The first was odorant shuffling, in which the column values of A are randomly permuted within each row. The second was descriptor shuffling, in which the row values of A are randomly permuted within each column. Finally, we scrambled the elements of the entire matrix, that is, an indiscriminate shuffling of both descriptor and odorant entries.
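The three scrambling procedures map directly onto array permutations. A minimal NumPy sketch (function names hypothetical), with A stored as descriptors × odorants as in the rest of the paper:

```python
import numpy as np


def shuffle_odorants(A, rng):
    """Odorant shuffling: permute the entries within each row (descriptor) independently."""
    return np.array([rng.permutation(row) for row in A])


def shuffle_descriptors(A, rng):
    """Descriptor shuffling: permute the entries within each column (odorant) independently."""
    return np.array([rng.permutation(col) for col in A.T]).T


def shuffle_all(A, rng):
    """Full shuffle: permute all entries of A together, destroying all correlations."""
    return rng.permutation(A.ravel()).reshape(A.shape)
```

Each procedure preserves a different set of marginal statistics: row value distributions, column value distributions, or only the overall value distribution, respectively.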

Consensus matrix
We tested the stability of the NMF results on the original and scrambled versions of the perceptual data using the consensus clustering algorithm proposed in [22,23]. Because NMF is an iterative optimization algorithm, it may not converge to the same solution each time it is run (with random initial conditions). For a sub-space of dimension s, the NMF algorithm groups descriptors and odorants into s different clusters. If the clustering into s classes is strong, we expect the assignment of descriptors or odorants to their respective clusters to change only slightly from one run to another. We quantified this with a consensus matrix. For illustration, we will work with cluster assignments made to the descriptors. In particular, each descriptor i is assigned to a meta-descriptor s', where W(i,s') is the highest among all the values of W(i,k) with $1 \le k \le s$.
We first initialized a zero-valued connectivity matrix $\tilde{C}$ of size $m \times m$. For each run of NMF, we updated its entries as $\tilde{C}_{ij} \leftarrow \tilde{C}_{ij} + 1$ if descriptors i and j belong to the same cluster, leaving them unchanged if they belong to different clusters. Averaging the connectivity matrix over all runs of NMF gives the consensus matrix C, where the maximum value of 1 indicates that descriptors i and j are always assigned to the same cluster. We ran NMF 250 times to ensure stability of the consensus matrix. If the clustering is stable, we expect the values in C to be close to either 0 or 1. To see the cluster boundaries, we can use the off-diagonal elements of C as a measure of similarity among descriptors, and invoke an agglomerative clustering method, in which one starts by assigning each descriptor to its own cluster and then recursively merges the two most similar clusters until a stopping criterion is fulfilled. The output of the agglomerative clustering method can be used to reorder the rows and columns of C and make the cluster boundaries explicit.
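The argmax assignment and the consensus average can be written compactly. A minimal NumPy sketch (function names hypothetical) that takes a list of W matrices from repeated NMF runs:

```python
import numpy as np


def connectivity(W):
    """Connectivity for one run: 1 where two descriptors share their argmax meta-descriptor."""
    labels = np.argmax(W, axis=1)  # meta-descriptor s' for each descriptor i
    return (labels[:, None] == labels[None, :]).astype(float)


def consensus(W_runs):
    """Consensus matrix C: connectivity averaged over NMF runs."""
    C_tilde = np.zeros((W_runs[0].shape[0],) * 2)
    for W in W_runs:
        C_tilde += connectivity(W)  # +1 for each pair assigned to the same cluster
    return C_tilde / len(W_runs)
```

A useful property of this construction is that it only compares labels within a run, so the arbitrary ordering of basis vectors across runs (the permutation ambiguity of NMF) does not affect C.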

Cophenetic correlation coefficient
We then evaluated the stability of the clustering induced by a given sub-space dimension s. While visual inspection of the reordered C can provide qualitative insights into the stability of cluster boundaries, we sought a quantitative measure using the cophenetic correlation coefficient approach suggested in [23]. Note that there are two distance matrices to work with. The first distance matrix is induced by the consensus matrix generated by the s-dimensional NMF decomposition; in particular, the distance between two descriptors is taken to be $1 - C_{ij}$. The second distance matrix is induced by an agglomerative clustering method, such as average linkage hierarchical clustering (HC). In particular, the off-diagonal elements of the first distance matrix can be used as distance values to generate a hierarchical clustering of the data (in Matlab, invoke linkage.m with the average linkage option). HC imposes a tree structure on the data, even if the data do not have tree-like dependencies, and is also sensitive to the distance metric in use. HC generates a dendrogram, and the height $h_{ij}$ of the tree at which two elements are merged provides the elements of the second distance matrix. The cophenetic correlation coefficient $r_s$ is defined as the Pearson correlation between the two distance matrices. If the consensus matrix is perfect, with elements being either 0 or 1, then $r_s = 1$. When consensus matrix elements lie between 0 and 1, then typically $r_s < 1$.
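The same computation is available in SciPy. The sketch below (function name hypothetical) uses `scipy.cluster.hierarchy.linkage` with average linkage in place of Matlab's linkage.m, and `cophenet` to correlate the dendrogram heights with the consensus-derived distances $1 - C_{ij}$:

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def cophenetic_rho(C):
    """Cophenetic correlation r_s for a consensus matrix C (average-linkage HC on 1 - C)."""
    D = 1.0 - C
    np.fill_diagonal(D, 0.0)
    d = squareform(D, checks=False)     # condensed vector of off-diagonal distances
    Z = linkage(d, method="average")    # dendrogram from the consensus distances
    rho, _ = cophenet(Z, d)             # Pearson correlation with merge heights h_ij
    return rho
```

For a perfect block consensus matrix the dendrogram reproduces the distances exactly and the result is 1; fractional consensus values generally lower it.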
We plot $r_s$ vs s for increasing values of s. The results of such analyses are in some cases helpful for choosing an optimal subspace size. If a given clustering (say, for a subspace size of $s-1$) is highly reliable across repeated factorizations (that is, the same sets of descriptors and the same sets of odors tend to co-cluster), and hence $r_{s-1}$ is very high, then one is motivated to retain at least $s-1$ dimensions. If increasing this subspace size (to $s$, $s+1$, etc.) leads to systematically less reliable clustering, $r_{s-1} > (r_s, r_{s+1}, \ldots)$, one is motivated to retain the more conservative estimate of dimensionality ($s-1$). That said, we note that cophenetic correlation analyses can often provide better grounds for excluding certain choices of subspace size that lead to unreliable clustering, rather than privileging a specific number as 'the' dimensionality of the data. Note that we seek solutions where $s > 1$, because for $s = 1$ the correlation coefficient is trivially $r_1 = 1$. We also performed a similar consensus clustering and cophenetic coefficient analysis in the odorant space using the entries of H.

Odor space visualization
We used a variant of the stochastic neighbor embedding method [24,25] to visualize the high-dimensional odor space organized by NMF. In particular, we first generated the consensus matrices for clustering descriptors and odorants, and used them separately as similarity matrices in the stochastic neighbor embedding algorithm. We used the code from http://homepage.tudelft.nl/19j49/t-SNE.html and ran it with default parameters.

Dimensionality of odor space
We analyzed the published data set of Dravnieks [20], which catalogs perceptual characteristics of 144 monomolecular odors. Each odor in this data set is represented as a 146-dimensional vector (an odor profile), with each dimension corresponding to the rated applicability of a given semantic label, such as 'sweet', 'floral', or 'heavy'. Because these are strictly non-negative quantities (i.e., a given semantic label applies to some degree, or not at all), we reasoned this property could be meaningfully exploited when reducing the dimensionality of profiling data. Thus, we applied NMF to the profiling data in an effort to obtain a perceptual basis set corresponding to 'parts' or 'features', as has been observed in the analysis of images [18] and text [18,21].
NMF seeks a low-rank approximation of a matrix A (146 descriptors × 144 odors in the present case) as the product WH, where the s columns of W are non-negative basis vectors (146-D vectors of odor descriptors in the present case), and the columns of H are the new s-dimensional representations of the original odors (144 columns in the present case) (Fig. 1A). Figure 1B shows the root-mean-squared (RMS) residual (see Methods) between A and its approximation WH for subspaces ranging from 1 to 50 (100 equal divisions of A into training and testing subsets for each choice of subspace). The residual attained a minimum for a subspace choice of 25 and increased for larger subspaces. In addition, the error bars on the training and testing residuals widened beyond subspace 25. Increasing the number of iterations used for training the NMF model only marginally reduced the size of the error bars. We speculate that the energy landscape becomes increasingly rugged, with many more local minima that can potentially trap the learning of NMF model parameters. In particular, NMF employs a non-linear optimization method, and hence it is possible that each time the method is run, it finds a different local minimum that is far from a global minimum. Hence, the error bars on the residuals are large and continue to increase with increasing subspace dimensionality s, because of the ruggedness of the landscape and the limited size of the odor profile data used for training the model.
Notably, for subspaces 1-25, a regime in which training error decreases continuously, the testing error decreases, attains a minimum, and then begins to increase. Thus, while a 25-dimensional representation of the original perceptual data is evidently the most accurate achievable with NMF, it is not necessarily the most parsimonious. Inspecting low-order basis vectors, we observed that the descriptors with the largest amplitudes were consistent across repetitions of the factorization, and corresponded to broadly applicable labels such as 'fragrant' and 'sickening' (see Figure 2 for examples). By contrast, higher-order basis vectors (> 10) had peak-value descriptors that were highly specific ('anise', 'cinnamon', etc.), and somewhat variable between NMF repetitions.
To more quantitatively motivate the choice of subspace size, we applied two techniques commonly used in problems of NMF model selection [23,26]. First, we plotted reconstruction error (that is, the fraction of unexplained variance) vs subspace size for 250 different repetitions of NMF (Fig. 1C), and compared this to the reconstruction error obtained with PCA performed on the original data (PCA_orig) as well as on scrambled data (PCA_scram) (Fig. 1D) [26]. The slope of PCA_scram is small and relatively constant for increasing subspace sizes (Fig. 1D), and provides a means for estimating the point after which a given model is explaining noise rather than correlations in the data. To visualize this cutoff point, Figure 1D plots the change in variance for each added dimension (differences between successive points in Figure 1C). The reconstruction error rates of both PCA_orig and NMF intersect with PCA_scram at subspace size 10 (Fig. 1D), indicating that there is no gain in retaining dimensions > 10 for either dimensionality reduction method. This is consistent with a recently published estimate of the intrinsic dimensionality of this same dataset using PCA [11]. For a further comparison of NMF with PCA, we show cumulative variance plots of PCA and several runs of NMF in Fig. S1.
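The PCA-on-scrambled-data criterion can be sketched with NumPy. This is a minimal illustration under stated assumptions: PCA is computed by SVD after centering each descriptor across odors, "scrambled" means a full shuffle of all entries of A, and the cutoff is taken as the first added dimension whose gain in explained variance does not exceed the gain on scrambled data. Only the PCA_orig curve is compared here; an NMF curve would be compared against PCA_scram in the same way. Function names are hypothetical.

```python
import numpy as np


def pca_unexplained(A, max_dim):
    """Fraction of variance left unexplained by the first k components, k = 0..max_dim."""
    X = A - A.mean(axis=1, keepdims=True)  # center each descriptor across odors
    sv2 = np.linalg.svd(X, compute_uv=False) ** 2  # variance captured per component
    explained = np.concatenate([[0.0], np.cumsum(sv2)])[: max_dim + 1]
    return 1.0 - explained / sv2.sum()


def pca_scram_cutoff(A, max_dim, rng):
    """Dimensions to keep before the per-dimension gain drops to the scrambled-data level."""
    scrambled = rng.permutation(A.ravel()).reshape(A.shape)
    gain_orig = -np.diff(pca_unexplained(A, max_dim))  # gain_orig[k]: going from k to k+1 dims
    gain_scram = -np.diff(pca_unexplained(scrambled, max_dim))
    at_noise = np.nonzero(gain_orig <= gain_scram)[0]
    return int(at_noise[0]) if at_noise.size else max_dim
```

The differences computed by `np.diff` correspond to the per-interval changes plotted in Figure 1D, where each point n is the change between n and n+1 dimensions.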
As a second means for quantifying the intrinsic dimensionality of the Dravnieks data set, we calculated the cophenetic correlation coefficient [23] for several choices of subspace size. Briefly, this method exploits the stochasticity inherent in NMF to determine how reproducible the derived basis set and odor weights are across repetitions of the factorization. Cophenetic correlations ≈ 1 indicate highly reproducible basis sets (see Methods for further explanation). We note that cophenetic correlation analyses can often provide better grounds for excluding certain choices of subspace size that lead to unreliable clustering, rather than privileging a specific number as 'the' dimensionality of the data.
The results of our cophenetic correlation analysis are shown in supplementary Fig. S2. Two features are readily apparent. First, there are some notably poor choices of subspace size (such as s = 4 or s = 5). We speculate that the sharp drop at these values occurs because, at these subspace choices, the classification scheme has lost the advantage of being simple and dichotomous, but does not yet support enough categories for accurate and reliable classification. Second, unlike with the reconstruction error criterion (above), there is no monotonically decreasing relationship between cophenetic correlation and dimension size that provides an obvious stopping criterion. Our interpretation of this is that there are many good, reduced-dimensionality representations of the Dravnieks data that exhibit sparse structure.
Given that analysis of reconstruction error (Fig. 1D) argues for 10 dimensions as a cutoff point, and cophenetic correlation analysis suggests there are many well-motivated choices of subspace size ≥ 6 (Fig. S2), we settled on a subspace size of 10 for all further analyses. Visualizations of NMF reconstruction quality for different choices of subspace size are provided in Figure S3, which shows that most of the global and local structure of the original data is explained with 10 NMF basis vectors. We wish to note, however, that in general there is no single exact criterion for NMF model selection. There are multiple justifiable choices of subspace size, each of which may lead to different insights about the data, or be useful for different goals.

Sparseness of basis vectors
An immediate consequence of the non-negativity constraint is sparseness of the basis vectors. As seen in Figure 2, the basis vectors consist of a handful of large values, with the remaining values near or equal to zero. Intuitively, a given basis vector indicates a subset of descriptors that are related and particularly informative (Fig. 2A), while the set of all basis vectors (Fig. 2B) defines a library of such aggregate descriptors that span the space. Figure 2C shows the first four basis vectors, which have been normalized and ranked in decreasing order to highlight their sparseness. The six most heavily weighted descriptors for each basis vector are shown to the right. Together, these vectors define 4 descriptor axes that can be roughly labeled as 'fragrant', 'woody', 'fruity', and 'sickening'.

Figure 1. (B) Plot of residual error between the perceptual data, A, and different NMF-derived approximations, WH. For each choice of subspace, data were divided into random training and testing halves, and the residual error between A and WH was computed. One hundred such divisions into training and testing were used to compute the standard errors shown (shaded areas). (C) Reconstruction error (fraction of unexplained variance) for PCA and NMF vs. number of dimensions. The change in reconstruction error for the first interval is indicated by asterisks (*), and corresponds to the first point in the next panel. (D) Change in reconstruction error for PCA and NMF, compared to the change in reconstruction error for PCA performed on a scrambled matrix (PCA_scram). PCA_scram is used to estimate the cutoff number of dimensions beyond which a given dimensionality reduction method is explaining only noise in a dataset. Note that each point, n, is actually the difference in reconstruction error between dimensions n and n+1 (by way of illustration, points with an asterisk in this panel denote corresponding intervals in panel C). doi:10.1371/journal.pone.0073289.g001
We note that these labels are for purposes of concision only, as each axis is actually a meta-descriptor consisting of a linear combination of more elementary descriptors. A list of rank-ordered descriptors for all 10 dimensions is shown in Table 1.
To ensure that the sparse basis vectors we obtained were not an artifact of the NMF procedure, but rather depended on correlations in the data, we repeated the calculation of W for three shuffled versions of the profiling data (Fig. 3). In the 'full shuffle' condition, all elements of the data matrix A were randomly permuted, eliminating all correlations. In the 'descriptors-shuffled' condition, the elements of each column of A were randomly permuted, while in the final 'odorants-shuffled' condition, the elements of each row of A were randomly permuted. In agreement with the idea that the sparseness obtained by NMF is data dependent, sparseness was drastically reduced in the basis sets obtained from all sets of shuffled data (compare Fig. 3C with Fig. 2B).
In histograms of basis vectors obtained from the full-shuffled and descriptors-shuffled data (Fig. 3A), it was evident that both basis sets contained fewer zero-valued elements than the unshuffled basis set. Interestingly, the long-tail behavior of the histogram was preserved (even enhanced) in the odorants-shuffled condition (Fig. 3B). While this does indicate that a small number of basis vector elements had very large values in the odorant-shuffled cases, this was notably at the expense of peak behavior at zero (Fig. 3A, green). Moreover, basis vectors derived from a given odorant-shuffled matrix were highly inconsistent across repetitions of the factorization, which we assessed by computing consensus matrices (see Methods) documenting the stability of clusters across different iterations of NMF (Figure 4). In brief, we found that only the original data had clusters that were consistent across iterations.
While these first several NMF dimensions (Fig. 2 and Table 1) define a perceptual descriptor space reminiscent of that observed previously with PCA, we note that variance is distributed somewhat differently in the NMF vs PCA basis sets. In essence, we have traded degrees of freedom for increased interpretability of individual perceptual dimensions. Interestingly, despite the fact that NMF imposes no formal orthogonality constraint on basis vectors, the perceptual basis set discovered by NMF was still near-orthogonal (Fig. 5); that is, most pairwise comparisons among the basis vectors in W subtend an angle close to $\pi/2$ (median angle = 72.9 degrees).
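The pairwise angles between basis vectors follow from the normalized columns of W. A minimal NumPy sketch (function name hypothetical):

```python
import numpy as np


def pairwise_angles_deg(W):
    """Angles (degrees) between every pair of columns of W, upper triangle only."""
    U = W / np.linalg.norm(W, axis=0)           # unit-length basis vectors
    cos = np.clip(U.T @ U, -1.0, 1.0)           # guard against rounding outside [-1, 1]
    iu = np.triu_indices(W.shape[1], k=1)       # each unordered pair once, no self-pairs
    return np.degrees(np.arccos(cos[iu]))
```

A summary such as `np.median(pairwise_angles_deg(W))` gives the median pairwise angle. For non-negative vectors the angle can never exceed 90 degrees, so exact orthogonality requires the vectors to have non-overlapping support.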

Distribution of odors in the new perceptual descriptor space
We next asked how the 144 individual odor profiles (that is, columns of H) are distributed in the new 10-dimensional perceptual descriptor space spanned by W. One possibility, for example, is that many of the descriptor space dimensions are redundant, resulting in odors being confined to a thin, low-dimensional slice of the full space. At the other extreme, odors may densely occupy descriptor space, indicating that the dimensions contain non-redundant features, with all dimensions necessary to fully characterize odors.
To investigate these and other possibilities, we first examined the structure of H, the matrix of odor weights obtained from NMF (recall that each column of H corresponds to an odor, and defines a point in the 10-dimensional descriptor space spanned by W; Fig. 1A). We took the Euclidean norm of each column of H, and then sorted all columns into 10 groups defined by their largest coordinate in descriptor space. More explicitly, the 144 columns of H were scanned left to right until one was found with its largest coordinate in dimension 1. This was then assigned as the first column of the reordered matrix. The remaining set of columns was similarly scanned until all columns with a largest first coordinate had been found. This procedure was then iterated on the remaining dimensions 2-10. Note that this is just a cosmetic reordering of columns that preserves row orderings: no new structure has been added, and no existing structure has been destroyed.
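The left-to-right scanning procedure described above is equivalent to a stable sort of the columns of H by the index of their largest coordinate. A NumPy sketch (function name hypothetical); because dividing a column by its positive Euclidean norm does not change which coordinate is largest, normalization does not affect the ordering and is omitted here:

```python
import numpy as np


def reorder_by_peak(H):
    """Group odors (columns of H) by their dominant dimension, preserving scan order."""
    peak = np.argmax(H, axis=0)              # dominant dimension of each odor
    order = np.argsort(peak, kind="stable")  # dimension 1 first, left-to-right within a group
    return H[:, order], order
```

Only columns are permuted; rows (dimensions) keep their order, which is why a block-diagonal pattern emerges when most odors load on a single dimension.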
Intriguingly, this procedure revealed a prominent block diagonal structure to the full matrix H (Fig. 6A) indicating that: 1) a given odor tends to be characterized by a single prominent dimension, and 2) all 10 dimensions are occupied. Furthermore, this suggests that a given odor percept may be considered an instance of one of several fundamental qualities (see discussion).
These two properties can alternatively be visualized when odors (columns of H) are plotted as points in the 10-dimensional perceptual space spanned by the basis set W. Because this perceptual space is high-dimensional and difficult to represent geometrically, we show a representative three-dimensional subspace of W. We note that this is not a projection of the data, but rather a selective visualization of a subspace. Figure 6B shows all 144 odors in the space spanned by perceptual dimensions 1-3. Most odors are clustered diffusely near the origin (gray points in Fig. 6B), since their peak coordinates do not reside in this particular 3-D subspace. By contrast, when odors are separated into groups defined by peak coordinate (as in Fig. 6A), it is evident that a given odor tends to be best defined by a single perceptual dimension. The black, red, and blue points in Figure 6B, for example, are those with largest coordinates in the first, second, and third dimensions, respectively. While there was notable structural homology among the odors in a given diagonal block of H (Fig. 6C), we did not quantify this further in the present work. Figure S4 shows additional representations of odorants distributed in descriptor space, and further highlights the categorical nature of the perceptual space derived from NMF. As a final means of investigating whether odorants are smoothly or discretely arranged in descriptor space, we constructed two-dimensional embeddings of the matrices W and H using the stochastic neighbor embedding (SNE) algorithm. Briefly, this technique provides a planar representation of all pairwise distances between odors in the original high-dimensional space, such that relative neighbor relations are preserved (e.g., odors that are close together in the original space are also close together in the embedding).
Applying SNE to the descriptor space (W), we obtained 8 discrete and non-overlapping clusters of the 146 descriptors (Figure 7). Similarly, applying SNE to the space of odorants (H), we obtained 10 discrete and non-overlapping clusters of the 144 odors (Figure 8). In sum, the perceptual descriptor space derived from NMF is not smoothly occupied.

Bi-clustering of descriptors and odors
The perceptual space, W, discovered by NMF can be considered a set of 10 meta-descriptors, each of which is a linear combination of more elementary descriptors. While these dimensions are compact and categorical in that a given odor tends to have a single prominent coordinate (Figs. 6 and S4), this may also obscure interesting details about the organization of the descriptor space. For example, within a dimension there may be correlations between specific descriptors and specific odors.

Figure 3. (B) Tail behavior of histograms, same procedure and conditions as in (A); note the difference in scaling of axes between (A) and (B). (C) Waterfall plots of basis sets obtained when NMF was applied to shuffled data, for various shuffling conditions. Note the comparative lack of sparseness relative to the basis set shown in Fig. 3A. Reproducibility of basis vectors across iterations of NMF for shuffled data sets was eliminated, or severely compromised, as shown in Fig. 4.
To explore this potential fine-scale structure, wherein subsets of odorants show distinct correlations among subsets of descriptors, we sought submatrices of WH (the NMF approximation to the original data matrix A) with large values in both the descriptor and odorant dimensions (Fig. 9). Briefly, we did this by performing 10 reorderings (one for each perceptual dimension) of the rows and columns of WH via the process illustrated in Figure 9A. Rank-ordering the first column of W, for example, aggregates the peak-valued descriptors for the first perceptual dimension, $W_1$. Similarly, rank-ordering the first row of H aggregates those odorants with the largest weights on $W_1$. Applying these row and column reorderings simultaneously to the matrix WH gives a matrix whose largest values are in the upper-left corner.
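The per-dimension reordering can be sketched in NumPy as follows (function name hypothetical). It returns both the reordered full approximation WH and the rank-one contribution of dimension k, the outer product of the rank-ordered kth column of W and kth row of H, which is the construction given in the Figure 9 legend; the averaging filter used there for display is omitted.

```python
import numpy as np


def bicluster(W, H, k):
    """Reorder descriptors and odors by their weight on dimension k (largest first)."""
    rows = np.argsort(-W[:, k], kind="stable")    # descriptors ranked by W[:, k]
    cols = np.argsort(-H[k, :], kind="stable")    # odors ranked by H[k, :]
    WH_sorted = (W @ H)[np.ix_(rows, cols)]       # full approximation, rows/columns reordered
    rank_one = np.outer(W[rows, k], H[k, cols])   # contribution of dimension k alone
    return WH_sorted, rank_one, rows, cols
```

Because both factors of the rank-one term are non-negative and sorted in decreasing order, its largest value is guaranteed to sit in the upper-left corner; the reordered full WH is only expected to show this pattern when dimension k dominates those entries.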
The clear upper-left organization of these submatrices illustrates that there are sets of odors to which distinct odor descriptors apply. Members of all clusters, as defined by their peak coordinate in the new 10-dimensional descriptor space, are given in Table 2.

Discussion
We have applied non-negative matrix factorization (NMF) to odor profiling data to derive a 10-dimensional descriptor space for human odor percepts. For the data set investigated, individual odor profiles are well-classified by their proximity to a single one of these dimensions, with all 10 dimensions being approximately equally expressed across the set of odors. This is consistent with the notion that olfactory space is high-dimensional [27], and not smoothly occupied [14,28]. More speculatively, the observation that odors tend to be confined to a single best dimension of the NMF basis (Figure 6, and Figure S4 in Supporting Information) suggests that a given olfactory percept can be described as an 'instance' of one of several fundamental qualities. Whether these proposed qualities are innate or the product of learning is, naturally, an important question, but one that is beyond the scope of this study. In addition, we note two important caveats of the present work. First, the fundamental odor qualities we propose are necessarily provisional, given the limitations of the Dravnieks data set in size and odorant diversity. Second, constraining perceptual judgments to a fixed and possibly limited lexicon (i.e., the 146 descriptors) may obscure the true complexity of odor space.
The perceptual dimensions obtained from NMF identify descriptors that are salient in several previous analyses of odor space [9][10][11][13,27], and that are commonly applied in ratings of odor quality. Moreover, these dimensions are consistent with a broad ecological perspective on olfactory function [29,30], which emphasizes the importance of chemosensation in coordinating approach, withdrawal, and the procurement of safe food. For example, we observe, as others have, dimensions corresponding to relative pleasantness ('fragrant' ($W_1$), 'sickening' ($W_4$)). In addition, most of the remaining dimensions identified appear to correspond to cues of potential palatability or non-palatability: 'fruity, non-citrus' ($W_2$), 'woody, resinous' ($W_3$), 'chemical' ($W_5$), 'sweet' ($W_7$), and 'lemon' ($W_{10}$). We hasten to note that the labels applied above are only an aid to intuition, as each perceptual basis is really a meta-descriptor consisting of a linear combination of more elementary descriptors. Moreover, it is possible that such linear combinations obscure interesting details about the exact positions of these more elementary descriptors. For a thorough treatment of this issue, one should consult Zarzo et al. [9,31].
While several of these same principal qualities have been identified before, NMF yields a notably different representation of the space in which they reside. Specifically, NMF leads to a description of odor space defined by dimensions that apply categorically. By contrast, odors in PCA space are more diffusely distributed across dimensions. Moreover, odors in PCA space (as well as in spaces derived from multidimensional scaling and factor analysis) tend to be smoothly distributed in subspaces that span multiple axes, although hierarchical applications of PCA have identified several quality-specific clusters [9]. Naturally, these differences in the representation of odor space are a consequence of the different constraints applied when obtaining a basis from PCA vs. NMF. Whereas PCA basis vectors are chosen to be orthogonal and allow any linear combination of variables, NMF basis vectors are constrained to be non-negative, allowing only positive (additive) combinations of variables. It is worth noting, however, that the NMF basis set is still approximately orthogonal (the mean pairwise angle between different basis vectors is 72.9 degrees; Fig. 5). Moreover, NMF captures structure in the data beyond simple first-order statistics, as applying NMF to scrambled versions of A fails to produce sparse and perceptually meaningful basis vectors (Fig. 3).

Figure 9. Co-clustering of descriptors and odors. (A) Overview of the method used for defining a bicluster (see text for definition). A column k of W (descriptors) and the corresponding kth row of H (odors) are rank-ordered. The indices derived from the rank-ordering are used to re-order the rows and columns of WH (accomplished by computing the outer product between the rank-ordered kth column of W and the rank-ordered kth row of H), producing a submatrix with high correlation among both odors and descriptors. By the nature of the sorting procedure, these matrices (biclusters) have their largest values in the upper-left corner. For purposes of visualization, biclusters were convolved with an averaging filter. (B) The 10 biclusters defined by NMF on odor perceptual data. doi:10.1371/journal.pone.0073289.g009
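Two quantities mentioned here can be sketched directly from W and H: the bicluster of Figure 9 (the outer product of the rank-ordered kth column of W and kth row of H) and the mean pairwise angle between basis vectors. The helper names and the random toy factors are assumptions for illustration. The averaging filter used for visualization is omitted. Because NMF columns are non-negative, their pairwise angles cannot exceed 90 degrees.

```python
import numpy as np


def bicluster(W, H, k):
    """Rank-one component k of WH with rows and columns re-ordered.

    Column k of W (descriptors) and row k of H (odors) are each sorted in
    descending order; their outer product puts the largest values in the
    upper-left corner.
    """
    d_order = np.argsort(W[:, k])[::-1]
    o_order = np.argsort(H[k, :])[::-1]
    block = np.outer(W[d_order, k], H[k, o_order])
    return block, d_order, o_order


def mean_pairwise_angle(W):
    """Mean angle (degrees) between all distinct pairs of columns of W."""
    U = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit-length columns
    cos = np.clip(U.T @ U, -1.0, 1.0)                 # guard arccos domain
    i, j = np.triu_indices(W.shape[1], k=1)           # each pair once, i < j
    return float(np.degrees(np.arccos(cos[i, j])).mean())


# Toy factors (random, not fitted to any odor data).
rng = np.random.default_rng(0)
W = rng.random((8, 3))
H = rng.random((3, 12))
block, d_order, o_order = bicluster(W, H, k=0)
print(block.shape)                     # (8, 12)
print(mean_pairwise_angle(np.eye(3)))  # close to 90 for orthogonal columns
```

A value near 90 degrees indicates near-orthogonal basis vectors. Identical columns would give 0 degrees.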
Intuitively, the non-negativity constraint produces NMF basis vectors defined by subsets of descriptors that are weighted and co-applied in particularly informative combinations, defining dimensions that range from the absence to the presence of a positive quantity. This contrasts with basis vectors and dimensions derived from other techniques, which extend from one quality to that quality's presumed opposite. Such dimensions have intuitive interpretations in some cases, for example, the experimentally supported 'pleasantness' dimension corresponding to principal component 1 (PC1), which ranges from 'fragrant' to 'sickening'. Interestingly, constraining the NMF subspace to two dimensions shows that most odors fall homogeneously along a continuum reminiscent of the first principal component (Figure S5 in Supporting Information). However, second- and higher-order PCs become progressively more difficult to interpret, ranging from 'woody, resinous' to 'minty, peppermint' (PC2) and from 'floral' to 'spicy' (PC3). Whether odor percepts are more accurately represented as residing in dimensions that span oppositely valenced qualities, or in dimensions that each represent only a single quality, will depend on whether there is systematic opponency in peripheral or central odor representations.
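The bipolar character of PCA axes can be seen in a toy example. After each descriptor is mean-centered across odors, the leading singular vector assigns opposite signs to descriptors that are applied to different odors, so the axis runs from one quality to its "opposite". An NMF basis for the same matrix can contain only non-negative weights. The helper `first_pc_loadings` and the four-descriptor matrix are hypothetical, and the overall sign of a principal component is arbitrary.

```python
import numpy as np


def first_pc_loadings(A):
    """Descriptor loadings on the first principal component of A.

    Rows of A are descriptors and columns are odors; each descriptor is
    mean-centered across odors before the SVD. The overall sign of a PC is
    arbitrary, so only relative signs are meaningful.
    """
    X = A - A.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0]


# Toy profiles; rows: fragrant, floral, sickening, putrid (hypothetical).
# Three 'pleasant' odors followed by three 'unpleasant' ones.
A = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
])
pc1 = first_pc_loadings(A)
print(np.sign(pc1))  # 'fragrant'/'floral' and 'sickening'/'putrid' get opposite signs
```

A two-component NMF of the same matrix would instead represent the two descriptor pairs as separate non-negative basis vectors, each ranging from absence to presence of one quality.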
It may be possible to observe physiological properties of odor representations indicative of one kind of representation vs. another. If the underlying perceptual dimensions of odor space are categorical, one would expect relatively similar neural representations for odors occupying the same putative perceptual dimension. Similarly, one would expect abrupt, state-like transitions in neural representations of slowly morphing binary mixtures whose component odors nominally 'belong' to different perceptual dimensions. Consistent with these criteria, a recent study has shown discrete transitions in the ensemble activity of the zebrafish olfactory bulb during such odor morphs ([14]; but see [32]). Our study has some limitations that should be noted. Chief among these is the small size of the odor profiling data set relative to the much larger set of possible odors, which may limit the generality of our findings. Future studies will need to extend the NMF framework to larger sets of odors than the 144 investigated here, so that a more complete and representative sample of odor space is obtained. Another limitation pertains to the 'subjective' nature of odor profiling data. While profiles are quantitative in the sense that they are stable and reliable across raters [33], it is clearly important to corroborate profiling-derived estimates of the intrinsic dimensionality of odor space, as well as proposals for how this space is structured, with psychophysical tests of discriminability [34]. It would be interesting, for example, to test whether the approximately orthogonal axes we observe are recapitulated in data derived from tests of pairwise discriminability. Finally, our analysis cannot distinguish between perceptual and cognitive influences on the organization of human odor space.
One possibility is that the coarse division of odor space into quality-specific axes reflects the existence of fixed points or attractors [14,28] that guide odor processing dynamics; similarly, there may exist a set of especially stable, prototypical glomerular maps that serve a related functional role. Another possibility is that early olfactory processing resolves odor quality only to a degree sufficient to rank relative pleasantness, with further parsing of this percept into discrete categories occurring through mechanisms involving learning and context.
In summary, we have shown that olfactory perceptual space can be spanned by a set of near-orthogonal axes that each represent a single, positive-valued odor quality. Odors cluster predominantly along these axes, motivating the interpretation that odor space is organized by a relatively large number of independent qualities that apply categorically. Independently of whether our description of odor space identifies innate or 'natural' axes determined by receptor specificities, it provides a compact description of salient, near-orthogonal odor qualities, as well as a principled means for identifying and rating odor quality. Finally, our study has identified perceptual clusters that may help elucidate a structure-percept mapping.

Figure 2. (A) Image of the original data (left) and NMF-derived approximations WH for subspaces of 5 (center) and 10 (right) dimensions. Same range and color scale for all images. Because the data matrix contains many small and zero-valued entries among sparse, large-valued entries, the color scale has been gamma-transformed (γ = 1.8) for better visualization and comparison. Arrowheads indicate columns shown in more detail in panel B. (B) Detailed representation of columns 70-74 of the original data matrix A (black traces) and NMF approximations to those columns by WH for a 10-dimensional subspace (red traces).

Supporting Information