The role of fragrance and self-esteem in perception of body odors and impressions of others

Human sweat odor serves as a social communication signal for a person’s traits and emotional states. This study explored whether body odors can also communicate information about one’s self-esteem, and the role of applied fragrance in this relationship. Female participants were asked to rate the self-esteem and attractiveness of different male contestants of a dating show, while being exposed to male participants’ body odors differing in self-esteem. High self-esteem sweat was rated more pleasant and less intense than low self-esteem sweat. However, there was no difference in perceived self-esteem and attractiveness of male contestants in videos, hence explicit differences in body odor did not transfer to judgments of related person characteristics. When the body odor was fragranced using a fragranced body spray, male contestants were rated as having higher self-esteem and being more attractive. The finding that body odors from male participants differing in self-esteem are rated differently and can be discriminated suggests self-esteem has distinct perceivable olfactory features, but the remaining findings imply that only fragrance affects the psychological impression someone makes. These findings are discussed in the context of the role of body odor and fragrance in human perception and social communication.

Sweat pads were pooled together in white opaque plastic containers with a volume of 250ml and a 50mm diameter opening, from four different senders having the same confidence status and participating in the same fragrance condition. To maximize relevant variance, and to minimize between subject/error variance, fragrance vs. no fragrance condition stimuli were composed from the same donors, e.g., if sweat from high self-esteem donors 1, 4, 8 and 12 in the no fragrance condition was used to compose the high self-esteem, no fragrance stimulus, the sweat from these same donors in the fragrance condition was then also used to make the high self-esteem, fragrance stimulus (as in De Groot et al., 2015;2018). A Latin square design to compose the four different types of stimuli (high self-esteem, no fragrance; high self-esteem, fragrance; low self-esteem, no fragrance; low self-esteem, fragrance) ensured full counterbalancing of which senders contributed to which perceiver stimuli.

Control measures.
Several control measures were taken on the sweat donation days. Room temperature and humidity were recorded on two occasions during the donation day. Pads were weighed before and after the sweat session. Temperature and humidity were analysed to check for deviations, and were found not to differ between conditions or between groups. Pad weights were also analysed, but no significant differences between fragrance conditions or between groups were found (p > .05). Body spray can weight was also measured, but again no differences between the groups were found (p > .05).

Design & statistical analyses
The research followed a 2 by 2 mixed design. Since the included participants scored either high or low on self-esteem, the between participant factor was Group (low vs. high self-esteem sweat donors).
Participants took part in two conditions, one in which fragrance was applied and one without fragrance application. The use of Fragrance (yes vs. no) was the within participant factor.
All dependent variables were subjected to mixed ANOVAs, with Group (low vs. high self-esteem sweat donors) as between participant factor, and Fragrance (no vs. yes) as within participant factor.
Outliers were replaced using a Median Absolute Deviation (MAD) threshold method (Leys et al., 2013). The usual assumptions for mixed ANOVAs were checked. Since ANOVA is robust against slight deviations from normality, only very severe deviations from normality were resolved using appropriate transformations. Pairwise comparisons to follow up ANOVA tests were Bonferroni corrected.
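The MAD threshold method can be illustrated with a minimal sketch. The scale constant 1.4826 and the threshold of 2.5 follow the general recommendation in Leys et al. (2013). The threshold actually used here and the replacement rule are not stated in the text, so replacing outliers with the nearest bound is an assumption made purely for illustration, and the function names and ratings are hypothetical.

```python
from statistics import median

def mad_bounds(values, threshold=2.5, scale=1.4826):
    """Return (lower, upper) bounds: median +/- threshold * scaled MAD."""
    med = median(values)
    # Scaled median absolute deviation around the median
    mad = scale * median([abs(v - med) for v in values])
    return med - threshold * mad, med + threshold * mad

def replace_outliers(values, threshold=2.5):
    """Replace values outside the MAD bounds with the nearest bound.

    Assumption: the source does not state what outliers were replaced with.
    """
    lower, upper = mad_bounds(values, threshold)
    return [min(max(v, lower), upper) for v in values]

# Hypothetical ratings on a 0-100 scale, with one extreme value
ratings = [48, 50, 52, 51, 49, 47, 95]
cleaned = replace_outliers(ratings)
```

In this sketch only the extreme value is pulled back to the upper bound. All other ratings are left unchanged.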
The same picture emerged when looking at the surname initial liking. There was no effect of Group compared to when smelling sweat from high self-esteem men (M = 47.6, SE = 1.7), p = .016.
This effect was not present for women who did not use hormonal contraceptives, p = .404.
None of the other interaction effects were significant.
For attractiveness ratings, there was a significant main effect of Fragrance, F(1, 60) = 7.949, p = .007, ηp² = .117. There was no effect of Donor, F(1, 60) = .181, p = .672, ηp² = .003, and
For the ratings of pleasantness, there was a main effect for Donor, F(1, 60) = 13.232, p < .001, ηp² = .181, indicating that sweat from men with high self-esteem was rated as being more pleasant, independent of fragrance application. There was also a strong main effect for
Finally, for the discrimination task results, there were a total of 49 test instances for women not using hormonal contraceptives. Thirty-two out of these 49 test instances were correct (65%), significantly deviating from 50%, t(48) = 2.23, p = .031. For women using hormonal contraceptives, there were 62 test instances, of which 36 were correct (58%), which was not significantly above chance, t(61) = 1.28, p = .207. This may suggest that hormonal contraceptives decrease a woman's ability to discriminate sweat odors from men differing on self-esteem, although the difference is very small and should be interpreted with caution.
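The chance-level comparison for the discrimination task can be sketched as a one-sample t-test on binary correct/incorrect outcomes against 50%. This is an assumption about how the test was computed, although it reproduces the reported t values. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(n_correct, n_total, chance=0.5):
    """One-sample t-test of binary outcomes (1 = correct) against chance.

    Returns the t statistic and its degrees of freedom (n - 1).
    """
    outcomes = [1] * n_correct + [0] * (n_total - n_correct)
    standard_error = stdev(outcomes) / sqrt(n_total)
    t = (mean(outcomes) - chance) / standard_error
    return t, n_total - 1

# Counts taken from the text: 32 of 49 and 36 of 62 correct
t_no_hc, df_no_hc = one_sample_t(32, 49)
t_hc, df_hc = one_sample_t(36, 62)
```

A binomial test would be a common alternative for this kind of data. The t-test form is used here only because the text reports t statistics.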
Overall, there were effects for the use of hormonal contraceptives. These were mainly present on the discrimination task and on ratings of attractiveness and confidence, i.e., contraceptive use had an effect on how men were judged depending on what type of odor stimulus was perceived. The effects are, however, not straightforward to interpret, and sometimes contradictory.

S5: methods and results of the pilot test to select video stimuli
The following describes the methods and procedure of the pilot test to select videos of men to be used in the main experiment reported in Croijmans, Beetsma, Gortemaker, Aarts & Smeets. The main objective was to select a set of 16 YouTube videos portraying men in a dating context, with low to average self-esteem relative to the entire set, as rated by women. The following questions were answered with this pilot test:
- What set of 16 videos portrays men who are rated as having low to medium self-esteem, relative to the entire set of 30 videos?
- For appearance ratings of men, is there a difference between videos with sound or without sound?

Participants
A total of 39 participants (all women) completed the survey. Participants were recruited via Amazon Mechanical Turk (MTurk), and were paid $6.67 (€6) for their participation.
Participants were on average 39.8 years old (SD = 11.4). For the purpose of this pilot-test, participants were recruited from a general population, leading to a somewhat older sample than what was used in the actual experiment.
All participants were native speakers of English, except for two. Of these, one was a native speaker of Russian who spoke English on a daily basis, and one was a native speaker of Telugu. All participants reported being fluent in English.
Twenty participants reported being married, eight were single, seven were in a relationship, three were widowed and one reported being engaged. Participants answered a question about their sexual orientation: two participants reported being bisexual, whereas the remaining 37 participants reported being heterosexual. Participants reported their highest finished or current education.
Fifteen reported 'bachelor's degree in college (4 years)'; eight reported 'associate degree in college (2 years)'; seven reported 'high school graduate'; five reported 'Master's degree'; three reported 'some college'; and one reported to have or currently be enrolled in a doctorate degree.
Since previous work showed that own self-esteem can influence how others are rated on their self-esteem (e.g. Brown, 1986), participants completed the Rosenberg Self-Esteem Scale (RSES; Rosenberg, 1965) to measure their self-esteem. The average score was 22.9 (SD = 5.5), which is comparable to the average score found on the RSES in large scale normative US data samples

Instruments & procedure:
Participants first gave their consent using a standardized information form. Then they answered the demographic questions (birth year, highest degree obtained, sexual orientation, relationship status, native language and spoken language).

Participants watched 30 sections of scenes from different versions of the television program First Dates. Scene sections were on average 29.6 seconds (SD = 13.4s) long. This length was selected since videos were required to be short enough to play four different videos for each condition in the main experiment, within the maximum length of each condition of 5 minutes (since it is expected that the cue effect would dissipate in around 5 minutes). Videos were always played without video controls, with set start and end times, in a frame with 480p resolution, with a width of 560 pixels and height of 315 pixels.
Participants then rated the male person in the video on 8 characteristics that were theoretically highly related to the construct of self-esteem. Participants were instructed they would watch short scenes of a dating show, and that they should pay particular attention to the male. Questions were phrased as follows: "Based on your impression, how do the following characteristics apply to the male person in the video? He seems:". Characteristics were: self-confident; nervous; attractive; kind; outgoing; high on self-esteem; dominant; reliable. These characteristics were answered on a 100 point slider scale, ranging from 'not at all' to 'very much'. Participants then answered the question 'I would go on a date with him' on a 100 point slider scale ranging from 'completely disagree' to 'completely agree'. Participants then answered a question on whether they had seen this particular section of the program before.
Videos were presented in a random order. See Figure S5.1 for a screenshot of the question formatting. After watching the videos, participants answered how entertaining they found the task, rated 3 statements about the task and were given the opportunity to give any further comments about the task using an open question form.
Participant attentiveness and reliability were checked in two ways. First, participants answered 4 questions at random locations during the survey about specific aspects of the male characters (e.g. what was the color of the shirt that the guy was wearing?), to check for their attentiveness. There was little evidence that participants were inattentive to the videos (i.e., all answered at least 2 questions correctly).
Second, a reliability analysis was undertaken (see table S5.2), to check answer consistency between the participants, for each of the 9 questions. McDonald's omega was excellent (all >.9) for all questions, and the 'omega if item (i.e., participant) dropped' values were all minimal (all <.05), indicating that none of the participants' answer patterns markedly differed from the rest.

Data analysis
Means on all 9 ratings were compared between the sound and muted conditions to see whether sound made a difference, using a mixed ANOVA with rating (9 levels) as within participant factor, and sound condition (2 levels) as between participant factor. Assumptions were checked, and violations were corrected for. Pairwise follow-up tests were used in case of significant interactions. If there was a difference on the ratings of self-confidence, high on self-esteem and outgoing, the condition with the lowest ratings was to be selected. If there was no difference, muted videos were to be selected, following Roberts et al. (2009). The aim was to find videos of men scoring average on these traits, since this was expected to leave room for improvement using fragrance, or for influence by presenting sweat from men with high self-esteem.
Correlation analysis was performed to see whether the ratings correlated, and could possibly be aggregated. In addition, all ratings were correlated with participant self-esteem (measured using the Rosenberg Self-Esteem Scale) to see potential influences of rater self-esteem on the ratings of others (Brown, 1986). After this, videos were ranked on three variables (self-confidence, high on self-esteem and outgoing), and 16 videos on the lower end of these ratings were selected for the receiver task.
Note that there were 30 videos, and 39 participants. Four ratings were missing (3 in the muted, and 1 in the sound condition), amounting to DF=1166.

Results
A significant main effect of rating was not interpreted since this is out of the focus of the current research, whereas the main effect of sound was not significant, F (1, 1164) = .06, p = .810, suggesting that generally, there was no difference between the videos rated with sound or without sound.
Additionally, the mixed ANOVA showed a significant interaction between rating and sound condition, F (8, 9312) = 11.81, p = .001. Bonferroni corrected pairwise comparisons are presented in table S5.2. Importantly, for the ratings of self-confident, self-esteem and outgoing, sound did not matter.
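The reported degrees of freedom can be checked with simple arithmetic. The sketch below assumes that each video-by-rater observation was treated as a separate unit in the ANOVA, which is what the reported values suggest. The variable names are hypothetical.

```python
# Counts taken from the text
n_videos, n_raters, n_missing = 30, 39, 4
n_rating_levels, n_sound_levels = 9, 2

# Number of available video-by-rater observations
n_observations = n_videos * n_raters - n_missing

# Error df for the between factor (sound): observations minus number of groups
df_error_sound = n_observations - n_sound_levels

# Interaction df and its error df for the within factor (rating)
df_interaction = (n_rating_levels - 1) * (n_sound_levels - 1)
df_error_interaction = (n_rating_levels - 1) * df_error_sound

print(n_observations, df_error_sound, df_interaction, df_error_interaction)
```

Under this assumption the values match the reported F(1, 1164) and F(8, 9312).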
Next, the correlation analysis was done (see Table S5.3). Self-confidence and self-esteem correlated highly (r = .805 > .8), indicating considerable overlap in these constructs, as expected. Self-confidence and extraversion (outgoing) also correlated highly (r = .669), yet less than .8. In turn, extraversion and self-esteem also correlated highly (r = .737) but also less than .8. This means these constructs each give additional information, which argues against using a single one of them to select the videos, and merits the use of an aggregate score of these three factors.
Correlations of all ratings with participants' own RSES scores showed no strong correlations, although it seems own self-esteem negatively influences judgments of whether someone is rated friendly (kind) (r = -.114, p < .001) and whether someone is considered datable (r = -.130, p < .001).
Since these constructs were found to be closely related, ratings on self-confidence, extraversion (outgoing) and self-esteem were summed and averaged over participants in both conditions (muted and sound, since sound was not found to be of influence), and videos were rank-ordered based on the averaged ratings (see Table S5.4). Men in videos were overall rated to have average to high self-esteem, and therefore, the 16 videos scoring around the midpoint of the 0-100 point scale (i.e., around 50, ranging 43-64) were selected. This was done since ratings on these videos are expected to be influenced most by contextual factors such as a chemical cue in sweat or perfume, as they leave most room for improvement by these factors.
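The aggregation and selection step can be sketched as follows. The data, function names and exact selection rule are hypothetical. In particular, picking the videos closest to the scale midpoint is an assumption about how "around the midpoint" was put into practice.

```python
from statistics import mean

def aggregate_scores(ratings):
    """Average the three ratings per rater, then average over raters.

    ratings: {video_id: [(self_confident, self_esteem, outgoing), ...]}
    with one tuple per rater (hypothetical layout).
    """
    return {
        video: mean(mean(rater) for rater in raters)
        for video, raters in ratings.items()
    }

def select_near_midpoint(scores, k, midpoint=50):
    """Pick the k videos whose aggregate score is closest to the midpoint."""
    closest = sorted(scores, key=lambda v: abs(scores[v] - midpoint))[:k]
    # Return the selection rank-ordered by aggregate score
    return sorted(closest, key=lambda v: scores[v])

# Hypothetical ratings for four videos by two raters each
ratings = {
    "A": [(80, 85, 90), (70, 75, 80)],
    "B": [(50, 55, 45), (40, 50, 60)],
    "C": [(30, 20, 25), (20, 30, 25)],
    "D": [(60, 65, 55), (55, 60, 65)],
}
scores = aggregate_scores(ratings)
selected = select_near_midpoint(scores, k=2)
```

With these hypothetical data, videos B and D are selected because their aggregate scores lie closest to 50.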
Two videos were from the same First Dates episode, starring the same man. To have no  . This latter rating may be taken as an indication that the muted videos contain somewhat more ambiguity, and that participants focused on other contextual cues to base their rating on, which is both desired for a task in which the odor is supposed to give context. The final open question suggested that almost all women found the task interesting and entertaining to do, and further commented to find the show entertaining.

Conclusions & recommendations for main study
The results show there is enough variation in the ratings of the men in the video material for it to be usable. Since the recommended 16 videos vary around the scale midpoint and thus leave room for improvement, it is recommended to use these 16 videos in the receiver task.
There was no difference in ratings of self-confidence, self-esteem, extraversion, and, also of interest, attractiveness, between the sound and muted condition. Muted videos offer the most ambiguous context, leaving room for the influence of smell. In addition, previous work (Dalton et al., 2013; Roberts et al., 2009) also used muted videos in similar tasks. Therefore, it is recommended to use the muted videos.
It is also recommended to use two videos for training, to make participants familiar with the task and the different questions. To implicitly anchor participants, it is recommended to use the two unique videos that score the highest and lowest on this ranking, i.e., video section 19 and 17.

Table S5.4: ranked ratings per video section. Videos printed in bold are scenes from the same YouTube episode. Video sections marked grey are selected for the receiver task.
Columns: Rank; Video section; YouTube video; Mean rating muted, n = 20 (SD); Mean rating with sound, n = 19 (SD)