Spontaneous Voice Gender Imitation Abilities in Adult Speakers

Background: The frequency components of the human voice play a major role in signalling the gender of the speaker. A voice imitation study was conducted to investigate individuals' ability to make behavioural adjustments to fundamental frequency (F0) and formants (Fi) in order to manipulate their expression of voice gender.
Methodology/Principal Findings: Thirty-two native British-English adult speakers were asked to read out loud different types of text (words, sentence, passage) using their normal voice and then while sounding as ‘masculine’ and ‘feminine’ as possible. Overall, the results show that both men and women raised their F0 and Fi when feminising their voice, and lowered their F0 and Fi when masculinising their voice.
Conclusions/Significance: These observations suggest that adult speakers are capable of spontaneous glottal and vocal tract length adjustments to express masculinity and femininity in their voice. These results point to a “gender code”, where speakers make a conventionalised use of the existing sex dimorphism to vary the expression of their gender and gender-related attributes.


Introduction
The human voice is highly sexually dimorphic. Alongside other properties that distinguish male from female voices, such as intonation [1], duration [2,3] and speech rate [4,5], the main cues to speaker gender are fundamental frequency (F0, or its perceptual correlate ''pitch'') and formant frequencies (Fi, mainly responsible for the perception of ''timbre''), which together account for 98.8% of the perceived voice dimorphism [6].
These differences stem from the testosterone-driven enlargement of the larynx and the increase in the length of the vocal tract that accompany male puberty [7]. During this time, the male larynx outgrows the female larynx by 40% [7], increasing vocal fold length by 60% on average (reaching 16 mm in adult males, and 10 mm in adult females [8]). As F0 is based on the rate of vocal fold vibration, which in turn is inversely proportional to the square root of the vocal fold tissue length, men's F0 (about 120 Hz) becomes on average 80 Hz lower than women's (about 200 Hz) [7], giving male speakers their characteristically lower-pitched voice. Between-sex differences in formant frequencies are related to differential body growth, with adult men being 7% taller than women on average [9], and to the male-specific second descent of the larynx, which together contribute to men's vocal tract being on average 18 cm, compared to women's 15 cm [10]. Because formant frequencies are negatively correlated with the length of the vocal tract [11], male speakers produce lower Fi values and therefore a formant spacing (DF) that is about 15%-20% narrower than in female speakers [12,13], which results in male voices having a more ''baritone'' timbre [14].
Variation in gender expression, however, cannot be entirely determined by these hormonal and size-related sex differences in the vocal apparatus. For example, acoustic analyses [15][16][17][18][19] of pre-pubertal children's voices consistently show that boys speak with lower formants than girls, while perceptual studies [18] show that voice gender can be identified in children as young as 4 years old, despite the fact that the anatomy of the vocal apparatus does not significantly differ between the two sexes until puberty [14,20]. These observations suggest that children acquire (consciously or unconsciously) gender-specific articulatory behaviours during development, and that speakers develop a knowledge of how a ''male'' or a ''female'' should sound, with male voices being low-pitched and ''deeper'', and female voices being high-pitched and ''lighter''. These differences in formant frequencies also suggest a possible role for lip protrusion (or spreading) and larynx lowering (or raising) in vocal tract length adjustments during speech, as possible articulatory gestures used by speakers in order to masculinise or feminise their voices. Thus, on top of the static, biohormonally determined differences, our voice contains dynamic and behaviourally controlled acoustic cues (in particular F0 and formants) for the expression of gender and gender-related attributes. However, the nature and the extent of their role have not yet been systematically investigated.

Hypotheses
The current study explores the ability of adult speakers to alter the femininity and masculinity of their voices during an imitation experiment, as well as the extent to which they are aware of the nature of the underlying articulatory gestures that they use to make these alterations. We predict that both male and female speakers will lower their mean F0, reduce its variation, and lower their Fi, thus narrowing DF, when trying to sound as ''masculine'' as possible, whilst they will increase their mean F0 and its variation, as well as raise Fi, thus widening DF, to sound as ''feminine'' as possible. In addition, we hypothesise that speakers will round their lips in order to lengthen their vocal tract when masculinising their voice, and spread their lips to shorten their tract when feminising their voice. We also investigate male and female speakers' awareness of the contribution of F0, formant shifts and related articulatory gestures (lip/laryngeal movements) to the vocal exaggeration of masculinity and femininity.

Subjects
Participants were 15 female and 17 male undergraduate students from the University of Sussex (UK), between 18 and 45 years of age (M = 22.56, SD = 6.4) with no self-reported history of speech, language, or hearing disorders. All were native speakers of British English. Informed written consent was obtained for all participants before study entry.

Procedure
Voice data were collected from individual speakers in a sound-attenuated booth at the University of Sussex. Participants were seated in a comfortable chair wearing a hat fixed to the chair in order to limit head movement, and were audio recorded with a high-fidelity microphone (AKG Perception 220).
Each participant was asked to read three different types of written stimuli out loud, first using their normal speaking voice (neutral condition), then sounding as 'feminine' as possible (feminine condition) and then as 'masculine' as possible (masculine condition), in alternate order. The material included a list of vowels embedded in a CVC context (vowel task), one short sentence that included many of the vowel sounds present in the vowel task (sentence task), and a 168-word passage comprising several sentences (passage task [21]). The order of presentation of the CVC words was randomized across participants to avoid serial order effects. Participants were allowed to progress at their own pace, choosing to continue to the next word only when ready. The word and sentence sequences were shown on a computer monitor, using a script written in PsyScope X Build 57. The text extract was shown in Microsoft Word 2007.
Participants' height and weight were measured prior to collecting the speech sample (Table 1). Height measurements were recorded to the nearest 0.1 cm, using a freestanding Seca Leicester stadiometer. Participants took their shoes off and stood with their shoulders flush to the stick and their heads level and oriented forward. Body weight was measured to the nearest 0.1 kg using a PS250 veterinary floor scale. Means, standard deviations and range values for participants' body size measurements are reported in Table 1.
After completion of the vocal task, the experimenter went over a questionnaire with participants about the strategies they used to masculinise and feminise their voices, and recorded their responses on paper. The questionnaire began with a series of open questions, followed by multiple-choice questions on several vocal and articulatory gestures.

Visual Measurements
For each participant, we measured lip spreading (LS), the horizontal distance between the two mouth corners, and openness (LO), the vertical distance between the centres of the upper and lower lips. In order to take these measurements, the horizontal mouth corners and the upper and lower centre lips were marked using a black makeup pencil (horizontal lines for the upper and lower lips, vertical lines for the mouth corners). The lip ratio for each participant was also calculated as the ratio between their lip spreading and openness. Video recordings of the participants were taken using a Sony HDR-TG3E handycam. The visual measurements were taken from stills captured using Apple iMovie version 8.0.6 of the vowel task in the neutral condition just after the participant had uttered the first consonant. Markers were then used to extract the horizontal (lip spreading) and vertical (lip openness) mouth distances using the line drawing function in Adobe Illustrator CS5.

Acoustic Measurements
The stimuli consisted of nine monophthong British vowels in /CVC/ sequences (had /æ/, head /e/, hud /ʌ/, heed /iː/, hid /ɪ/, heard /ɜː/, hod /ɒ/, hood /ʊ/, who'd /uː/), the sentence ''where were you a year ago?'' and an extract from the ''Rainbow Passage'' [21]. A custom script was written in PRAAT v.5.0.3 [22] to process the collected audio samples. The script assigned a random identifier to each sample in order to ensure blind analysis. It then allowed the experimenter to set the analysis parameters and to visually compare the fundamental and formant frequencies against a narrowband spectrogram. The analysis parameters were adjusted when the computed values departed from the visually estimated fundamental and formant frequencies.
Fundamental Frequency. For the F0 analysis, the script used the PRAAT autocorrelation algorithm ''to Pitch (ac)'', which estimates the F0 contour, from which the script derived mean F0 (F0mean), F0 standard deviation (F0SD) and the coefficient of variation (F0CV). F0CV, which is given by F0SD/F0mean, provides a measure of the magnitude of F0 variation relative to the mean, which reflects the logarithmic perception of pitch and therefore is a better estimate of F0 variation than its absolute estimate given by F0SD [17]. Perceptually, a voice with lower F0CV has a more monotone quality than a voice with higher F0CV. The parameters for F0 analysis were set as: pitch floor 30 Hz and ceiling 500 Hz for male speakers, 60 Hz and 500 Hz for female speakers, time step 0.01 s.
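As an illustration of how the three F0 measures relate to one another, the following minimal Python sketch derives F0mean, F0SD and F0CV from an F0 contour. It is not the PRAAT script used in the study: the function name `f0_summary`, the convention that unvoiced frames are coded as 0 or NaN, and the use of the sample (n - 1) standard deviation are all assumptions made for the example.

```python
from statistics import mean, stdev

def f0_summary(f0_contour_hz):
    """Return (F0mean, F0SD, F0CV) computed over the voiced frames only.

    Assumption: unvoiced frames are coded as 0 or NaN and are skipped.
    (NaN > 0 evaluates to False, so NaN frames are dropped as well.)
    """
    voiced = [f for f in f0_contour_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    f0_mean = mean(voiced)
    f0_sd = stdev(voiced)        # sample SD (n - 1), an assumption
    f0_cv = f0_sd / f0_mean      # F0CV = F0SD / F0mean, as defined in the text
    return f0_mean, f0_sd, f0_cv

# Hypothetical contour sampled every 0.01 s; the 0 marks an unvoiced frame.
f0_mean, f0_sd, f0_cv = f0_summary([118.0, 122.0, 0.0, 120.0])
print(f0_mean, f0_sd, round(f0_cv, 4))
```

Because F0CV is dimensionless, the same absolute SD yields a smaller F0CV for a higher-pitched voice, which is why it is preferred here over F0SD when comparing male and female speakers.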
Formant Frequencies. For formant (Fi) analysis, the script used PRAAT's Linear Predictive Coding ''Burg'' algorithm in order to estimate the formant centre frequencies for the first four formants (F1-F4). The parameters for formant analysis were set as: number of formants 5, max formant 5000 Hz for male speakers and 5500 Hz for female speakers, and dynamic range 30 dB. The length of the analysis window was 0.025 s in the vowel and sentence tasks, and 0.5 s in the passage task.
Formant spacing. The centre frequencies for F1-F4 of each sample were used to calculate its average formant spacing (DF), which is the distance between any two adjacent formants:
$$\mathrm{DF} = F_{i+1} - F_i \qquad (1)$$
DF was calculated by forcing the observed Fi values to fit the vocal tract model described in the source-filter theory [11]. In this model, the vocal tract has a uniform cross-sectional area along its entire length, which approximates the production of the vowel ''schwa'' (/ə/). Thus, the vocal tract acts as a quarter-wave resonator, closed at the glottis and open at the mouth, and the vocal tract resonances are given by:
$$F_i = \frac{(2i - 1)\,c}{4\,\mathrm{VTL}} \qquad (2)$$
where Fi is the i-th formant, c is the speed of sound in the human vocal tract (approximated to 350 m/s, i.e. 35000 cm/s) and VTL is the length of the resonator. From (1) and (2), it follows that individual formants are related to DF by:
$$F_i = \mathrm{DF}\,\frac{2i - 1}{2} \qquad (3)$$
DF can therefore be calculated as the slope of the linear regression expressed in equation (3), by plotting the observed Fi (y-axis) against the expected (2i - 1)/2 formant positions (x-axis), and with the intercept set to 0 [23]. Whilst the specific variation of formants in vowels other than the ''schwa'' requires more complex models than the uniform quarter wavelength resonator used here [24], the average distribution of formants at suprasegmental level approaches a constant that corresponds to the DF predicted by such a model [7]. The adequacy of this method is illustrated by estimations of DF based on published acoustic data [17] presented in Figure S1.
It is also consistent with perceptual observations: Smith and Patterson [25] report that DF differences re-synthesised via linear compression/expansion of the vowel spectral envelope correlate strongly with listeners' cross-class judgments of speakers' age, sex and size (man, woman, boy, girl). More recently, Pisanski and Rendall [26] also found that small (12% or 18%) uniform increments in Fi negatively correlate not only with the perceived size, but also with the masculinity of speakers within the same sex and age group.
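The formant spacing estimate described above can be sketched in a few lines of Python. Under the uniform quarter-wave tube model each formant satisfies Fi = DF × (2i - 1)/2, so DF is the slope of a least-squares line through the origin when the observed Fi are regressed on (2i - 1)/2. This is a sketch under stated assumptions rather than the authors' PRAAT script: the function names are hypothetical, and the speed of sound is taken as 35000 cm/s (350 m/s).

```python
def formant_spacing(formants_hz):
    """Estimate DF (Hz) as the slope of a zero-intercept regression.

    formants_hz holds the observed F1..Fn in order; the predictor for the
    i-th formant is (2i - 1)/2. For a line through the origin the
    least-squares slope is sum(x*y) / sum(x*x).
    """
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator

def apparent_vtl_cm(df_hz, speed_of_sound_cm_s=35000.0):
    """Apparent vocal tract length from DF, using DF = c / (2 * VTL).

    The default speed of sound (35000 cm/s) is an assumption of this sketch.
    """
    return speed_of_sound_cm_s / (2 * df_hz)

# Idealised schwa-like formants of a uniform tube (hypothetical values).
df = formant_spacing([500.0, 1500.0, 2500.0, 3500.0])
print(df, apparent_vtl_cm(df))
```

With real vowels the formants do not sit exactly on the line, and the zero-intercept fit averages out the vowel-specific deviations, which is the reason for using a regression rather than a single formant difference.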

Statistical Analyses
Two-way mixed ANOVAs were used to investigate the overall effect of sex (group factor) and condition (as a three-level repeated factor: neutral, masculine, feminine) on each of the acoustic parameters F0mean, F0CV, Fi and DF, and on the visual parameters LS, LO and lip ratio. We also tested for differences across conditions for male and female speakers separately, running separate one-way repeated ANOVAs within each sex with condition as the factor variable and using contrasts between neutral and masculine, and neutral and feminine conditions. Levene's tests were used to check for equality of variance, and the data were log-transformed when the assumption was violated. Mauchly's test was applied to check sphericity, and sphericity violations were corrected with the Greenhouse-Geisser ε. All statistical analyses were run using SPSS v.18.

Results
The results of the ANOVAs performed on the acoustic measures are presented in Table 2 (vowel task), Table 3 (sentence task) and Table 4 (passage task). The means and standard deviations of the acoustic measures, and the F and p-values of the associated contrasts, are provided separately for male and female speakers in Tables 5, 6, 7 and 8.

Fundamental Frequency
There was a significant main effect of sex on F0mean in all three reading tasks, indicating that male speakers had a lower mean F0 than female speakers across conditions, in line with the well-established sexual dimorphism in mean F0 between the two sexes.
There was also a significant main effect of condition on F0 across the three tasks. Separate ANOVAs revealed that both male and female speakers significantly raised their F0 when feminising their voice and dropped their F0 when masculinising their voice (except when reading the passage, where the difference between [...] (Table 5). The smallest, yet significant, drop was recorded in reading the passage, 0.6% for men (Table 6) and 2.3% for women (Table 5). Both male and female speakers also significantly raised their F0 when feminising their voices. The largest change in F0 between speakers' natural and feminised voice occurred when reading the sentence, with male speakers raising their F0 to 162.2 Hz (about 40% rise, Table 6) and female speakers to 256.7 Hz (about 24%, Table 5), whereas the smallest, yet significant, rise was recorded in reading the passage, 28% for men (Table 6) and 20% for women (Table 5). The interaction effect between condition and sex was not significant.

Fundamental Frequency variation (F0CV)
The effect of sex on F0CV was not significant for vowels, but was significant in the other two tasks, indicating that, overall, men spoke with a narrower dynamic range than women.
There was also a significant main effect of condition in the sentence and passage, but not for the vowels. Contrasts revealed that male speakers' F0CV was not significantly lower when sounding as masculine as possible than when speaking normally (although a non-significant trend was observed for the passage, Table 8). Female speakers' F0CV was significantly lower in the masculine condition, but only when reading the passage out loud (Table 7). There was a non-significant trend for male speakers to raise F0CV when reading the passage in a feminised voice (Table 8), while female speakers significantly increased their F0CV to feminise their voice only in the vowel task (Table 7).

Formant frequencies
There was a significant main effect of sex on Fi in all three reading tasks indicating that male speakers' formants were lower than female speakers' across conditions.
There was also a significant main effect of condition on Fi across the three tasks. Contrasts revealed that, when asked to sound as masculine as possible, men lowered all their formants, except for F1 across conditions, and F2 and F3 in the sentence task, for which no significant differences were found (Table 8). Female speakers also significantly lowered their formants when sounding as masculine as possible for all three tasks, except for F1 in the sentence task (Table 7).
When asked to sound as feminine as possible, male speakers significantly raised their formants, except for F1 across conditions and F2 in the sentence task (Table 8). Females also showed an overall tendency to raise their formants, although statistical significance was only reached for F4 in the vowel task, and F1, F2 and F4 in the sentence task (Table 7).
Linear mixed models testing for differences in Fi were run separately for each sex as a function of condition and vowel. The results are shown graphically in Figure 1. For both men and women, there were main effects of condition and vowel on each individual formant frequency, while no significant interaction effect between condition and vowel was found on Fi (see Table 9). The vowel spaces (Figure 2) show that the vowels in the neutral condition match the typical vowel distribution in F1/F2 space for both sexes, whilst the vowel spaces in the masculine and feminine conditions match the neutral vowel space in shape, but are smaller and globally shifted downward and left, and bigger and globally shifted upward and right, respectively.

Formant spacing
There was a significant main effect of sex on DF in all the three reading tasks, indicating that male speakers had a narrower overall formant spacing (DF) than female speakers. There was also a significant main effect of condition on DF across the three tasks. The interaction effect between condition and sex was not significant. Contrasts revealed that both male and female speakers significantly narrowed their DF when masculinising their voice (Tables 7 and 8). In male speakers, the extent of this decrease varied from about 2% in the passage to 3% in the other two tasks (Table 6), while in female speakers it varied from about 3% in the passage to 5% in the other two tasks (Table 5). Male speakers also significantly widened their DF when feminising their voice, and the extent of this increase ranged from 3% in the passage to 6% and 5% in the sentence and vowel tasks (Table 6), respectively, while female speakers increased their DF from 1% (passage, vowels) to 3% (sentence), reaching significance only in the sentence task.

Lip measurements
The means and standard deviations for the lip measurements (in pixels) taken from the vowel task in the neutral condition are presented in Table 10. The main effect of sex was significant on lip spreading (LS), F(1,21) = 8.77, p = .007, with women having a larger LS overall than men. There was also a significant main effect of condition on LS, F(2,42) = 13.86, p < .001. Contrasts revealed that both men and women significantly reduced their LS when trying to sound as masculine as possible, and increased it when sounding as feminine as possible, albeit not significantly. No significant interaction between sex and condition was found. The front vowels /æ/, /iː/, /ɪ/ showed the highest degree of lip spreading, while the lowest degree of lip spreading was recorded for the back vowels / /, / /, /u/. High vowels / /, /u/ also showed the least degree of lip opening, whilst low vowels exhibited the greatest lip opening. The lip ratio was smallest for vowels /æ/, /e/. There were no interaction effects between condition and vowel, and sex and vowel, indicating that both men and women moved their lips in a similar way across all three conditions.

Participants' self-descriptions of vocal and articulatory gestures
Out of 17 male and 15 female speakers, when asked to spontaneously describe the strategies used to masculinise their voices, 9 males and 7 females replied that they made their voices sound deeper, χ²(32) = 0.13, p = .723, and 8 males and 4 females said that they made them lower, χ²(32) = 1.41, p = .234. To feminise their voices, 12 males and 7 females said that they made their voices higher, χ²(32) = 1.89, p = .169, and 5 males and 4 females reported making them softer, χ²(32) = 0.03, p = .86. When given a choice of possible gestures, most participants reported changes in pitch: all 17 males and 14 females said that they lowered their pitch to sound more masculine, χ²(32) = 1.17, p = .279, and 16 males and 13 females said they raised their pitch to sound more feminine. The majority of males also reported vocal tract length adjustments: 13 males reported the descent of their Adam's apple as a gesture to masculinise their voice, compared to 6 females, χ²(32) = 4.39, p = .036. This was the only significant association between sex and type of strategy. Six males also reported moving their Adam's apple up to feminise their voices, compared to 4 females, χ²(32) = 0.28, p = .599. As for lip movements, 8 males and 11 females said they rounded their lips to sound more masculine, χ²(32) = 2.28, p = .131, while 8 males and 8 females said they spread their lips to sound more feminine, χ²(32) = 0.13, p = .723.
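The association tests reported above can be illustrated with a Pearson chi-square on a 2×2 table of sex by reported strategy. The Python sketch below is an assumption about the procedure, not a description of the SPSS analysis: it applies no continuity correction, uses one degree of freedom, and the function name `chi_square_2x2` is hypothetical. The example counts are those given in the text for the lowering of the Adam's apple (13 of 17 males, 6 of 15 females).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table

        [[a, b],     e.g. males:   reported / did not report
         [c, d]]          females: reported / did not report

    Returns (chi2, p). With 1 degree of freedom the upper-tail
    probability has the closed form erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# 13 of 17 males and 6 of 15 females reported lowering their Adam's apple.
chi2, p = chi_square_2x2(13, 17 - 13, 6, 15 - 6)
print(round(chi2, 2), round(p, 3))
```

For these counts the sketch should land close to the statistic and p-value reported for this comparison, which suggests that an uncorrected Pearson test is a reasonable reading of the analysis. Several cells elsewhere in the table are small (for example, only one female did not report lowering pitch), where an exact test would usually be preferred.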

Discussion
We found that when untrained adult speakers were asked to sound as masculine or as feminine as possible, they altered the frequency components of their voice (F0 and formant parameters) by adjusting the rate of vibration of their vocal folds and by changing the apparent length of their vocal tract. This shows that adult speakers have some knowledge of the sexually dimorphic acoustic cues underlying the expression of gender in speech, and are capable of controlling them to modulate gender-related attributes. Below we discuss each F0 and formant parameter individually, focusing on their acoustic and perceptual relevance in relation to previous research. Then, we compare the observed manipulations to those used to express size, and, following the ''frequency code'' theory [27], propose that a substantial proportion of gender-related vocal diversity in the human voice follows a ''gender code'', with speakers using learned vocal gestures to manipulate their voice gender. We also look at the interplay between the observed vocal tract adjustments (e.g. lip movements and facial expressions) and the impact on gender expression. Finally, we propose some directions for future research.

Fundamental Frequency
For both sexes, the mean F0 measured in the neutral condition was comparable to previously reported F0 values in British English [28]. The observed sex dimorphism for this parameter (1.8) is in line with previous acoustic observations [29] and can be mostly accounted for by the dimorphism in vocal fold length (1.6 [7]). The remaining 20% of dimorphism has been attributed to sex differences in vocal fold physiology [7,26], but may also point to differences in phonation behaviour [29,30].
In both sexes, speakers lowered their F0 when masculinising their voices, and raised their F0 when feminising their voices, although in both conditions F0 remained within the expected range of their sex (around 100-150 Hz for men, 170-220 Hz for women [31]). The F0 drop between the neutral and masculine conditions was about three times smaller than the F0 rise from the neutral to the feminine condition, with the smallest and non-significant drop being recorded for the passage. This could be a consequence of physiological constraints that make it more difficult for speakers to sustainably lower F0. Indeed, adult speakers speak with a mean F0 at the lower end of their physically attainable range in several languages (Traunmüller H, Eriksson A 1994, unpublished manuscript), and this is particularly the case of male speakers of British English [28].
Perceptual studies with re-synthesised stimuli have previously reported that an F0 difference of 12% [26,32], corresponding to twice the frequency discrimination threshold (or just-noticeable difference, JND), is required in order to elicit consistent results in discrimination performance. The observed differences in F0 between feminine/neutral and masculine/feminine conditions are above this threshold (Tables 7 and 8), suggesting that these differences are perceptually relevant. Psychoacoustic studies using natural stimuli, such as those produced here, could confirm whether this is the case and explore the perceptual relevance of the naturally occurring acoustic variation in the vocal expression of masculinity (or femininity).
F0 variation (F0CV) was higher for female speakers than for male speakers in reading the sentence and the passage; these longer stimuli may enable speakers to display more intonation variation [33]. This result suggests that women speak with a wider dynamic voice range than men, which is in line with gender stereotypes [34], but contrasts with acoustic research adopting similar log scale conversions [31,34,35]. In a comprehensive review of 40 years of research, Henton [31] found that previously reported male-female differences in pitch range disappeared or were reversed when re-examined using the semitonal scale (semitones = 39.86 × log10(F0max/F0min)). The discrepancy between the present results and Henton's may arise from the different methodologies used to model pitch perception. Although previous studies have cast doubts on the use of the semitone scale as the most accurate measurement for F0 variation [36,37], the relative value of one method over the other is yet to be established. When asked to feminise their voices, men exhibited a non-significant trend in increasing their F0CV when reading the passage, but not in the other tasks. Women significantly increased their F0CV to feminise their voice when reading words, and decreased it to sound as masculine as possible when reading the passage. Although these differences are not consistent across all types of stimuli and between conditions, they nevertheless provide some indication that speakers may attribute wider intonation to female speech than to male speech, despite the fact that such attributions are largely unsupported by the literature [31]. Indeed, perceptual studies indicate that female speech is typically perceived as more 'melodious' than male speech, both in pre-pubertal children's [38] and adults' voices [39]. Greater F0 variation also elicits higher femininity ratings, while more monotonous voices are judged to be more masculine [40].
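To make the semitone conversion discussed above concrete, the short Python sketch below expresses a pitch range in semitones. A semitone is one twelfth of an octave, so the range is 12 × log2(F0max/F0min), which is the same quantity as roughly 39.86 × log10(F0max/F0min). The function name and the example ranges are hypothetical and are not data from this study.

```python
import math

def pitch_range_semitones(f0_min_hz, f0_max_hz):
    """Pitch range in semitones: 12 * log2(F0max / F0min)."""
    return 12 * math.log2(f0_max_hz / f0_min_hz)

# Hypothetical ranges: the span in Hz differs by a factor of two, but both
# cover exactly one octave, so the semitone ranges are identical.
low_voice_range = pitch_range_semitones(100.0, 200.0)
high_voice_range = pitch_range_semitones(200.0, 400.0)
print(low_voice_range, high_voice_range)
```

The example shows why sex differences in pitch range that are measured in Hz can shrink or vanish once ranges are expressed on a logarithmic scale.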
Overall, speakers lowered their F1-F4 formants when asked to sound as masculine as possible and raised them to sound as feminine as possible. These global adjustments of formant frequency values are also reflected in the size and shifts of speakers' vowel spaces. Women's vowel space was larger and shifted top right relative to men's across conditions, in line with the known sex dimorphism [29]. However, both men's and women's vowel spaces were larger and shifted upward to the right in the feminine condition, and were smaller and shifted downward to the left in the masculine condition (Figure 2), compared to the neutral condition. This indicates that speakers exaggerated speech patterns typical of the two sexes in order to masculinise and feminise their voices.
Formant spacing (DF) values in the neutral condition were also comparable to those reported in the literature for both adult men (1005 Hz [45]; 991 Hz, as calculated from F1-F4 values [26]) and women (1167 Hz [26]). Moreover, men's DF was on average 15% lower than women's, in line with the DF dimorphism reported in previous studies [26,46], and comparable to the 15%-20% baseline difference in anatomical vocal-tract length between the two sexes [12,13].
Consistent with our predictions, speakers widened their DF to feminize their voices and narrowed it to masculinise them, with wider shifts in formant values being observed when imitating opposite gender attributes than when exaggerating their own gender: averaged across reading tasks, men narrowed their DF by 2.7% to masculinise their voices, whilst women widened it by 1.9% to feminise theirs, whereas men widened their DF by 5.5% to feminise their voices and women narrowed it by 4.3% to masculinise theirs. These DF differences in the expression of gender-related attributes typical of the opposite sex correspond to the limit between the male upper and female lower DF ranges [25].
Perceptually, the DF differences observed here between the natural and experimental conditions as well as between feminised and masculinised conditions (see Tables 7 and 8) are less than one JND (about 6%) for DF [29]. Thus, in combination with the percentage differences on F0 reported above, our study indicates that, although speakers adjust both F0 and DF to express gender-related attributes, only the F0 adjustments are likely to be perceived. Ultimately, by manipulating DF while preserving F0 and vice versa, future studies could look at the perceptual discriminability and relative salience of these two parameters in listeners' voice-based judgments of speakers' masculinity and femininity.

Is there a gender code?
Indications that adjustments in F0 and Fi parameters comparable to those observed in this study play a role in the expression of voice gender and related attributes are widespread in the literature on the sex dimorphism in the human voice. Despite having virtually the same vocal anatomy, pre-pubertal boys speak with lower formants than girls [16,17,47,48], suggesting that children acquire sex-specific behaviours, such as vocal tract gestures involving lip movements, to express their gender [47]. Acoustic studies of adult speakers also report within-sex differences in F0 and Fi that cannot be solely explained by anatomical differences. For example, in a cross-cultural study, Majewski [49] found that American men speak with a lower pitch (M = 118.9 Hz) than their Polish counterparts (M = 137.6 Hz), while Ohara [50] found that Japanese women raise their pitch when speaking in their native language and lower it when speaking in English, in line with femininity definitions in Japanese society. Additionally, research on the vocal expression of sexual orientation shows that, while homosexual speakers' voices do not differ in mean F0 from their heterosexual counterparts [51,52], they display a partial shift of formant values towards those typical of the opposite sex [53,54], even after controlling for body size [52]. Several perceptual studies also report that listeners rate adult voices characterised by higher pitch and formant values as more ''feminine'' [54,55], while speakers with lower pitch and formant values are rated as more ''masculine'' [29,44,56].
These observations suggest that speakers spontaneously use a ''gender code'', making a conventionalised use of the existing sex dimorphism in the frequency components of their voice to vary the expression of gender and related (e.g. masculinity/femininity) characteristics. We draw a parallel between this gender code and Ohala's [27] ''frequency code'' hypothesis, in which animal callers are expected to exploit the inverse correlation between resonator size and its resulting frequency in order to encode size and related (e.g. dominance/submission) attributes. Human male speakers have been shown to lower (or raise) F0 and Fi when they perceive themselves to be more (or less) dominant than their interlocutors [57,58]. Perception studies have also reported that listeners rate speakers with lower F0 and Fi as being bigger and more dominant than speakers with higher F0 and Fi [29,58,59]. However, the extent to which F0 and Fi manipulations encode both dominance and gender characteristics is yet to be systematically explored. The imitation paradigm described in this study could be used to explicitly address this question by asking speakers to express dominance and masculinity both in conjunction and separately (e.g. to sound more dominant, more masculine, dominant and masculine, dominant and feminine). Psychoacoustic studies should also investigate the perceptual relevance of F0 and Fi adjustments in gender and dominance expression and whether the same gestures are perceived differently according to speakers' and listeners' personality and emotional state, situational context, semantic content and society-specific stereotypes that characterise power and gender relationships.
The present study also explored visible vocal tract length adjustments underlying the observed acoustic manipulations in formant values by providing quantitative measurements of lip movements. We found that, in line with the observed between-sex differences in overall formant spacing, lip spreading and openness were greater in women than in men in the normal voice condition, suggesting that women speak with a smile. We also found that the majority of participants perceived themselves as spreading their lips more when they feminised their voices than when speaking normally or masculinising them. In line with these self-perceptions, lip measurements revealed that speakers tended to decrease lip spreading from the feminine to the masculine conditions, although significance was only reached when speakers tried to sound as masculine as possible. In contrast, no significant differences across conditions were found for lip openness and ratio. This suggests that lip gestures alone cannot fully account for the observed formant shifts. Indeed, while it was not possible to track vertical laryngeal displacement, more than one third of the participants, and particularly men, reported moving their larynx in the direction of the existing sex dimorphism in the experimental conditions, especially when masculinising their voices. It is possible that the enhanced protrusion of the human male larynx, compared to the female larynx, allows male speakers to be more aware of any movement in its position. It is worth noting that the males of several other mammalian species are known to actively lower their larynges during vocalisation in order to extend their vocal tracts and thus exaggerate the vocal expression of their body size (red deer [60], fallow deer [61]), pointing to selection pressures underlying the sexual dimorphism of the vocal tract (deer [62], humans [14]). A recent study also indicates that vocal tract length adjustments affect attributions of physical and social dominance in human males [58].
Further investigations should consider more sophisticated techniques to better quantify lip movements (e.g. motion tracking [63,64]) and to measure vertical laryngeal shifts (e.g. using ultrasound or MRI), in order to establish the respective roles of such adjustments in the manipulation of vocal tract length to vary the expression of gender or related attributes.
Finally, the observed lip gestures performed to feminise or masculinise the apparent gender of the voice are likely to impact facial expressions and associated gender stereotypes. While Ohala [27] suggested that the retraction of lip corners to sound smaller and their rounding and protrusion to sound bigger are, respectively, at the origin of the smile and the ''o-face'' which are common in dominance displays, we propose that individuals feminising their voice are likely to spread their lips, and therefore project a ''cheerful'', unthreatening face, and those masculinising their voice are likely to round their lips, and therefore project a more ''angry'', dominant face. Indeed, women tend to smile more than men [65], possibly following cultural norms [66-69].

Future directions
The present study shows that untrained speakers have the spontaneous ability to modify the expression of their gender and related traits through the voice, but does not shed light on how this ability is acquired and used in everyday life. We suggest that future studies could (i) extend the imitation paradigm adopted in this study to children and investigate the acquisition and development of sex-typical ways of speaking according to age, and (ii) investigate whether children and adults vary the expression of their gender in different settings, and when complying with varying gendered and sex roles within and across different societies, as well as the perceptual relevance of these variations.

Figure S1. Illustration of the fitness of the method used to estimate overall formant spacing. Frequency values of F1, F2 and F3 for male (A) and female (B) adult (>19 years old) speakers as measured in Lee et al. [17], plotted against (2i−1)/2 increments of the formant spacing as predicted by a uniform vocal tract model. Formant spacing ΔF can be estimated as the slope of the linear regression of observed Fi over the expected formant positions (with intercept set to 0). The apparent vocal tract length (aVTL, expressed in centimetres) can be calculated as aVTL = c/(2ΔF), where c is the speed of sound in air. The values of ΔF reported in the figures correspond to aVTL values of 17.71 cm for male speakers and 14.95 cm for female speakers, which are comparable to anatomical vocal tract lengths in adult men and women (men: 18 cm, women: 15 cm [10]). This illustrates that, while ΔF estimated in this way is sensitive to vowel-specific variation in vocal tract configuration, at supra-segmental level it provides an estimate of the overall linear scaling of the formants which is a reliable estimate of the average vocal tract length of the speaker. (TIF)
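The estimation described in the Figure S1 legend can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, the speed of sound (35,000 cm/s) is an assumed value not stated in the legend, and the example formants are those of an idealised uniform tube, not measured data.

```python
# Minimal sketch of the formant spacing (Delta F) and apparent vocal tract
# length (aVTL) estimation described in the Figure S1 legend.
# Assumption: speed of sound c = 35,000 cm/s (not given in the legend).
SPEED_OF_SOUND_CM_S = 35000.0


def formant_spacing(formants_hz):
    """Estimate Delta F (Hz) as the slope of a regression through the origin.

    Under a uniform vocal tract model the i-th formant is expected at
    (2i - 1) / 2 * Delta F, so observed F_i are regressed on (2i - 1) / 2
    with the intercept fixed at 0.
    """
    positions = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(positions, formants_hz))
    denominator = sum(x * x for x in positions)
    return numerator / denominator


def apparent_vtl_cm(delta_f_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length in cm: aVTL = c / (2 * Delta F)."""
    return c / (2 * delta_f_hz)


# Hypothetical input: F1-F3 of an idealised uniform tube (not study data).
example_formants = [500.0, 1500.0, 2500.0]
delta_f = formant_spacing(example_formants)
avtl = apparent_vtl_cm(delta_f)
```

For this idealised input the fitted spacing is 1000 Hz, which under the assumed speed of sound corresponds to an apparent vocal tract length of 17.5 cm. With measured formants, the slope averages over vowel-specific deviations from the uniform-tube positions.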