The authors have declared that no competing interests exist.

Conceived and designed the experiments: JBC AR CSC. Performed the experiments: JBC AR CSC. Analyzed the data: JBC AR CSC. Wrote the paper: JBC AR CSC.

In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.

Our understanding of a sensory modality is marked, in part, by our ability to explain its characteristic perceptual qualities

Early efforts to systematically characterize odor space focused on identifying small numbers of perceptual primaries, which, when taken as a set, were hypothesized to span the full range of possible olfactory experiences

Here, we were interested in explicitly retaining additional degrees of freedom to describe olfactory percepts. Motivated by studies suggesting the existence of discrete perceptual clusters in olfaction

Applying NMF, we derive a 10-dimensional representation of odor perceptual space, with each dimension characterized by only a handful of positive-valued semantic descriptors. Odor profiles tended to be categorically defined by their membership in a single one of these dimensions, which readily allowed co-clustering of odor features and odors. While the analysis of larger odor profile databases will be needed to generalize these results, the techniques described herein provide a conceptual and quantitative framework for investigating the potential mapping between chemicals and their corresponding odor percepts.

Non-negative matrix factorization (NMF) is a technique proposed for deriving low-rank approximations of the kind

To derive

assume

set negative elements of

assume

set negative elements of

We used the standard implementation of the non-negative matrix factorization algorithm (nnmf.m) in Matlab (Mathworks, Inc.). Given the size of the odor profile matrix (
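For readers without Matlab, the idea of factorizing a non-negative matrix into two non-negative factors can be illustrated with a minimal sketch. This is not the nnmf.m routine used here: it swaps in the well-known multiplicative update rule, uses plain Python lists, and runs on an invented toy matrix. The names `nmf`, `matmul`, `transpose`, and `frobenius_error`, the symbol `H` for the second factor, and the reading of rows as descriptors and columns as odors are all assumptions made for illustration.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def frobenius_error(A, W, H):
    """Frobenius norm of the residual A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate A (m x n, non-negative) by W (m x k) times H (k x n).

    Uses multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(rh, rp, rq)]
             for rh, rp, rq in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(rw, rp, rq)]
             for rw, rp, rq in zip(W, num, den)]
    return W, H


# Invented toy data: two groups of descriptors applied to two groups of odors.
toy = [[5, 4, 0, 0],
       [4, 5, 0, 0],
       [0, 0, 3, 4],
       [0, 0, 4, 3]]
W_toy, H_toy = nmf(toy, k=2, n_iter=500)
print(round(frobenius_error(toy, W_toy, H_toy), 3))
```

Because every update multiplies a non-negative entry by a non-negative ratio, the factors never become negative, which is the property that yields sparse, additive basis vectors. Results depend on the random initialization, so several restarts are usually compared in practice.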

Note that a minimum solution obtained by matrices

The choice of sub-space dimension

We applied NMF to scrambled perceptual data, in which the elements of A were randomly reorganized before analysis with NMF. Three different scrambling procedures were implemented. The first was odorant shuffling, in which the column values of A were randomly permuted within each row. The second was descriptor shuffling, in which the row values of A were randomly permuted within each column. Finally, we scrambled the elements of the entire matrix, that is, we indiscriminately shuffled both descriptor and odorant entries.
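The three scrambling procedures can be sketched as follows. This is an illustrative reading of the description above rather than the code used in the study: it assumes rows of A are descriptors and columns are odorants, stores A as a list of rows, and the function names are hypothetical.

```python
import random


def shuffle_odorants(A, rng):
    """Odorant shuffling: within each row, permute values across columns."""
    return [rng.sample(row, len(row)) for row in A]


def shuffle_descriptors(A, rng):
    """Descriptor shuffling: within each column, permute values across rows."""
    cols = [rng.sample(list(col), len(col)) for col in zip(*A)]
    return [list(row) for row in zip(*cols)]


def shuffle_all(A, rng):
    """Full shuffling: permute every entry of the matrix, keeping its shape."""
    n_cols = len(A[0])
    flat = [x for row in A for x in row]
    rng.shuffle(flat)
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


# Invented toy matrix (3 descriptors x 4 odorants) for demonstration only.
A_toy = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
rng = random.Random(0)
print(shuffle_odorants(A_toy, rng))
print(shuffle_descriptors(A_toy, rng))
print(shuffle_all(A_toy, rng))
```

Each procedure preserves a different marginal: odorant shuffling keeps the set of values in every row, descriptor shuffling keeps the set of values in every column, and full shuffling keeps only the overall distribution of values.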

We tested the stability of the NMF results on the original and scrambled versions of the perceptual data using a consensus clustering algorithm proposed in

We first initiated a zero-valued connectivity matrix

We then evaluated the stability of the clustering induced by a given sub-space dimension

We plot
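A minimal sketch of the consensus step is given here, assuming the common construction in which each NMF run assigns every item to one cluster, a connectivity matrix records which pairs share a cluster, and the consensus is the average connectivity over runs. The function names and the toy labels are hypothetical, and how the averaged matrix is summarized into a single stability score is not shown.

```python
def connectivity(labels):
    """1 where items i and j share a cluster label in one run, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


def consensus_matrix(label_runs):
    """Average the connectivity matrices from several clustering runs."""
    n = len(label_runs[0])
    C = [[0.0] * n for _ in range(n)]  # zero-valued starting matrix
    for labels in label_runs:
        conn = connectivity(labels)
        for i in range(n):
            for j in range(n):
                C[i][j] += conn[i][j]
    n_runs = len(label_runs)
    return [[c / n_runs for c in row] for row in C]


# Invented cluster labels for 4 items over 3 runs.
runs = [[0, 0, 1, 1],
        [1, 1, 0, 0],  # same partition as the first run, labels swapped
        [0, 0, 0, 1]]
for row in consensus_matrix(runs):
    print([round(c, 2) for c in row])
```

Entries near 0 or 1 indicate pairs that are consistently separated or consistently grouped across runs, while intermediate values indicate unstable assignments. Because only co-membership is counted, the arbitrary numbering of clusters in each run does not matter.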

We use a variant of stochastic neighbor embedding method

We analyzed the published data set of Dravnieks

NMF seeks a low-rank approximation of a matrix

Notably, for subspaces 1–25 – a regime in which training error decreases continuously – the testing error decreases, attains a minimum, and then begins to increase. Thus, while a

Plot of normalized odor descriptor amplitude vs. odor descriptor number for the basis vector

To more quantitatively motivate the choice of subspace size, we applied two techniques commonly used in problems of NMF model selection

As a second means for quantifying the intrinsic dimensionality of the Dravnieks data set, we calculated the cophenetic correlation coefficient

The results of our cophenetic correlation analysis are shown in supplementary

Given that analysis of reconstruction error (

An immediate consequence of the non-negativity constraint is sparseness of the basis vectors. As seen in

| W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FRAGRANT | WOODY, RESINOUS | FRUITY, OTHER THAN CITRUS | SICKENING | CHEMICAL | MINTY, PEPPERMINT | SWEET | POPCORN | SICKENING | LEMON |
| FLORAL | MUSTY, EARTHY, MOLDY | SWEET | PUTRID, FOUL, DECAYED | ETHERISH, ANAESTHETIC | COOL, COOLING | VANILLA | BURNT, SMOKY | GARLIC, ONION | FRUITY, CITRUS |
| PERFUMERY | CEDARWOOD | FRAGRANT | RANCID | MEDICINAL | AROMATIC | FRAGRANT | PEANUT BUTTER | HEAVY | FRAGRANT |
| SWEET | HERBAL, GREEN, CUT GRASS | AROMATIC | SWEATY | DISINFECTANT, CARBOLIC | ANISE (LICORICE) | AROMATIC | NUTTY (WALNUT ETC) | BURNT, SMOKY | ORANGE |
| ROSE | FRAGRANT | LIGHT | SOUR, VINEGAR | SHARP, PUNGENT, ACID | FRAGRANT | CHOCOLATE | OILY, FATTY | SULFIDIC | LIGHT |
| AROMATIC | AROMATIC | PINEAPPLE | SHARP, PUNGENT, ACID | GASOLINE, SOLVENT | MEDICINAL | MALTY | ALMOND | SHARP, PUNGENT, ACID | SWEET |
| LIGHT | LIGHT | CHERRY (BERRY) | FECAL (LIKE MANURE) | PAINT | SPICY | ALMOND | HEAVY | HOUSEHOLD GAS | COOL, COOLING |
| COLOGNE | HEAVY | STRAWBERRY | SOUR MILK | CLEANING FLUID | SWEET | CARAMEL | WARM | PUTRID, FOUL, DECAYED | AROMATIC |
| HERBAL, GREEN, CUT GRASS | SPICY | PERFUMERY | MUSTY, EARTHY, MOLDY | ALCOHOLIC | EUCALIPTUS | LIGHT | MUSTY, EARTHY, MOLDY | SEWER | HERBAL, GREEN, CUT GRASS |
| VIOLETS | BURNT, SMOKY | BANANA | HEAVY | TURPENTINE (PINE OIL) | CAMPHOR | WARM | WOODY, RESINOUS | BURNT RUBBER | SHARP, PUNGENT, ACID |

To ensure that the sparse basis vectors we obtained were not an artifact of the NMF procedure, but rather depended on correlations in the data, we repeated the calculation of W for three shuffled versions of the profiling data (

In histograms of basis vectors obtained from the full-shuffled and descriptor-shuffled data (

While these first several NMF dimensions (

We next asked how the 144 individual odor profiles (that is, columns of

To investigate these and other possibilities, we first examined the structure of

Intriguingly, this procedure revealed a prominent block diagonal structure to the full matrix

These two properties can be alternatively visualized when odors (columns of

As a final means for investigating whether odorants are smoothly vs. discretely arranged in descriptor space, we constructed two-dimensional embeddings for the matrices

Results of stochastic neighbor embedding (see text) applied to the similarity matrix for

Results of stochastic neighbor embedding (see text) applied to the similarity matrix for

The perceptual space,

To explore this potential fine-scale structure wherein subsets of odorants show distinct correlations among subsets of descriptors, we sought submatrices of

The clear upper-left organization of these submatrices illustrates that there are sets of odors to which distinct odor descriptors apply. Members of all clusters, as defined by their peak coordinate in the new 10-dimensional descriptor space, are given in

| Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
| --- | --- | --- | --- | --- |
| 1. Isoamylphenylacetate; 2. Aurantiol; 3. 6,7-dihydro-1,1,2,3,3-pentamethyl-4-(5H)indanone; 4. Indol-hydroxycitronellal; 5. beta-ionone (low concentration); 6. beta-ionone (high concentration); 7. N'-[(E)-3-(5-methoxy-2,3-dihydro-1,4-benzodioxin-7-yl) prop-2-enoyl]-2,3-dihydro-1,4-benzodioxine-3-carbohydrazide; 8. hydroxyisohexyl 3-cyclohexene carboxaldehyde; 9. 2-methoxynaphthalene; 10. Diethoxymethane; 11. Galaxolide; 12. ethylenebrassylate; 13. Phenylethyl Alcohol (low concentration); 14. Phenylethyl Alcohol (high concentration) | 15. Cedrene epoxide; 16. bornyl acetate; 17. 8-sec-Butylquinoline; 18. 2,4,6-trimethylcyclohex-3-ene-1-carbaldehyde; 19. decalin; 20. dibutylamine; 21. Synthetic amber; 22. 1,1-Dimethoxy-2-phenylpropane; 23. Methyl isonicotinate; 24. Nootkatone; 25. 1-octen-3-ol; 26. isophorone (low concentration); 27. isophorone (high concentration); 28. Isopropyl quinolone; 29. Argeol; 30. Gamma-undecalactone; 31. 10-undecenoic acid | 32. ethylmethylphenylglycidate (low concentration); 33. ethylmethylphenylglycidate (high concentration); 34. allylcaproate; 35. isoamyl acetate; 36. n-amyl butyrate; 37. Dmbc butyrate; 38. ethyl butyrate; 39. ethyl propionate; 40. Fructone; 41. methylanthranilate; 42. Pentylvalerate | 43. Butyric Acid; 44. hexanoic acid; 45. indole; 46. methylthiolbutyrate; 47. n-pentanoic acid; 48. 4-pentenoic acid; 49. ; 50. phenylacetic acid; 51. Propyl butyrate; 52. Skatole (3-Methyl-1H-indole); 53. Isovalerylaldehyde; 54. isovaleric acid | 55. Acetophenone; 56. Anisole; 57. 1-Butanol; 58. 4-cresol; 59. p-Tolylisobutyrate; 60. 4-methyl anisole; 61. cyclohexanol; 62. 2,5-dimethylpyrazine; 63. methyl hexyl ether; 64. 1-hexanol; 65. 3-hexanol; 66. iodoform; 67. methyl furan-3-carboxylate; 68. 4-methylquinoline; 69. phenylacetylene; 70. alpha-terpineol; 71. 6-methyl-1,2,3,4-tetrahydroquinoline; 72. Thymol; 73. Toluene; 74. 3-Methyl-1H-indole |

| Cluster 6 | Cluster 7 | Cluster 8 | Cluster 9 | Cluster 10 |
| --- | --- | --- | --- | --- |
| 1. Anethole; 2. 8-sec-Butylquinoline; 3. carvone; 4. caryophyllene; 5. 4-cresyl acetate; 6. eucalyptol; 7. Eugenol; 8. Menthol; 9. methyl salicylate; 10. Safrole | 11. Abhexon; 12. Gamma-nonalactone; 13. Benzaldehyde; 14. 3,4-dihydrocoumarin; 15. 3-Propylidene phthalide; 16. cinnamic aldehyde; 17. coumarin; 18. cyclotene; 19. Furaldehyde; 20. 2-hexenal; 21. 2-methylbenzaldehyde; 22. gamma-valerolactone | 23. Vanillin; 24. 2-acetylpyridine; 25. 2,4-decadienal; 26. Pyrazine; 27. methyl hexyl ether; 28. 2,5-dimethylpyrrole; 29. Ethylpyrazine; 30. Ethylpyrazine; 31. Heptanal; 32. n-hexanal; 33. 1-Octanol; 34. 2-methyl-5,7-dihydrothieno[3,4-d]pyrimidine | 35. Zingherone; 36. dibutyl sulfide; 37. Chlorothymol; 38. 2-Mercaptopropanone; 39. 1,2-cyclohexanedione; 40. diethyl sulfide; 41. dimethyltrisulfide; 42. furfurylmercaptan; 43. Guaiacol; 44. Hexylamine; 45. Hexylamine; 46. AC1L18DS | 47. polythiophene; 48. Adoxal; 49. Amyl cinnamic aldehyde diethyl acetal; 50. Citral; 51. Geranonitrile; 52. Cuminaldehyde; 53. 4-Methyl-2-(1-phenylethyl)-1,3-dioxolan; 54. 2-Methyl-4-phenylbutan-2-ol; 55. phenyl ether; 56. Floralozone; 57. Heptanol; 58. hexylcinnamic aldehyde; 59. hydroxycitronellal; 60. linalool; 61. limonene; 62. Melonal; 63. Myrac aldehyde; 64. n-Nonyl acetate |
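Assigning each odor to a cluster by its peak coordinate can be sketched in a few lines. The sketch assumes a weight matrix, called `H` here for illustration, with one row per NMF dimension and one column per odor. The function name, the placeholder odor names, and the weights are invented and do not come from the data set.

```python
def assign_clusters(H, names):
    """Group odor names by the dimension on which each odor's weight peaks.

    H is a list of k rows (dimensions), each with one weight per odor.
    Ties go to the lowest-numbered dimension.
    """
    k = len(H)
    clusters = {d: [] for d in range(k)}
    for j, name in enumerate(names):
        column = [H[d][j] for d in range(k)]
        peak = max(range(k), key=lambda d: column[d])
        clusters[peak].append(name)
    return clusters


# Invented weights for three placeholder odors on two dimensions.
H_toy = [[0.90, 0.10, 0.20],
         [0.05, 0.80, 0.70]]
print(assign_clusters(H_toy, ["odor_a", "odor_b", "odor_c"]))
```

A hard assignment of this kind is most meaningful when each odor loads strongly on a single dimension. For odors with comparable weights on several dimensions, the peak alone discards information.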

We have applied non-negative matrix factorization (NMF) to odor profiling data to derive a 10-dimensional descriptor space for human odor percepts. For the data set investigated, individual odor profiles are well-classified by their proximity to a single one of these dimensions, with all 10 dimensions being approximately equally expressed across the set of odors. This is consistent with the notion that olfactory space is high-dimensional

The perceptual dimensions obtained from NMF identify descriptors that are salient in several previous analyses of odor space

While several of these same principal qualities have been identified before, NMF describes a notably different representation of the space in which they reside. Specifically, NMF leads to a description of odor space defined by dimensions that apply categorically. By contrast, odors in PCA space are more diffusely distributed across dimensions. Moreover, odors in PCA space (as well as spaces derived from multidimensional scaling and factor analysis) tend to be smoothly distributed in subspaces that span multiple axes, though hierarchical applications of PCA have identified several quality-specific clusters

Intuitively, the non-negativity constraint produces NMF basis vectors defined by subsets of descriptors that are weighted and co-applied in particularly informative combinations, defining dimensions that range from absence to presence of a positive quantity. This contrasts with basis vectors and dimensions derived from other techniques, which extend from one quality to that quality's presumed opposite. Such dimensions have intuitive interpretations in some cases, for example, the experimentally supported ‘pleasantness’ dimension corresponding to principal component 1 (PC1), which ranges from ‘fragrant’ to ‘sickening’. Interestingly, constraining the NMF subspace to 2 dimensions shows that most odors fall homogeneously along a continuum reminiscent of the first principal component (

It may be possible to observe physiological properties of odor representations indicative of one kind of representation vs. another. If the underlying perceptual dimensions of odor space are categorical, one would expect relative similarity between odor representations for odors occupying the same putative perceptual dimension. Similarly, one would expect abrupt, state-like transitions in neural representations of slowly morphing binary mixture stimuli whose component odors nominally ‘belong’ to different perceptual dimensions. Consistent with these criteria, a recent study has shown discrete transitions in the ensemble activity of the zebrafish olfactory bulb during such odor morphs (

Our study has some limitations that should be noted. Chief among these is the small size of the odor profiling data set used relative to the much larger set of possible odors, which may limit the generality of our findings. In future studies, it will be necessary to extend the NMF framework to larger sets of odors than the 144 investigated presently, such that a more complete and representative sample from odor space is obtained. Another limitation pertains to the ‘subjective’ nature of odor profiling data. While profiles are quantitative in the sense that they are stable and reliable across raters

In summary, we have shown that olfactory perceptual space can be spanned by a set of near-orthogonal axes that each represent a single, positive-valued odor quality. Odors cluster predominantly along these axes, motivating the interpretation that odor space is organized by a relatively large number of independent qualities that apply categorically. Independently of whether our description of odor space identifies innate or ‘natural’ axes determined by receptor specificities, it provides a compact description of salient, near-orthogonal odor qualities, as well as a principled means for identifying and rating odor quality. Finally, our study has identified perceptual clusters that may help elucidate a structure-percept mapping.

We thank Dr. Alexei Koulakov for kindly providing an electronic copy of the Dravnieks odor database, and Dr. Nathan Urban for initial help on the project. We thank Drs. Rick Gerkin and Krishnan Padmanabhan for helpful feedback on an earlier version of the manuscript.