Abstract
Purpose
To determine if there is an association between vocal gender presentation and the gender and context of the listener.
Method
Quantitative, cross-sectional study. A total of 47 speakers of Brazilian Portuguese of different genders were recorded. Recordings included sustained vowel emission, connected speech, and the expressive recital of a poem. Subsequently, four scripts were used in Praat to extract 16 acoustic measurements related to prosody. Voices underwent Auditory-Perceptual Assessment (APA) of gender presentation by 236 people [65 speech and language pathologists with experience in the area of voice (SLP), 101 cisgender people (CG), and 70 transgender and non-binary people (TNB)]. Gender presentation was evaluated with a visual analogue scale. Agreement analyses were performed among quantitative variables, and multiple linear regression models were generated to predict APA, taking the judge context/gender and speaker gender into consideration.
Results
Acoustic analysis revealed that cis and transgender women had higher median fundamental frequency (fo) values than other genders. Cisgender women exhibited greater breathiness, while cisgender men showed more vocal quality deviations. In terms of APA, significant differences were observed among judge groups: SLP judged vowel samples differently from other groups, and TNB judged speech samples differently (both p<0.001). The predictive measures for the APA varied based on the sample type, speaker gender, and judge group. For vowel samples, only SLP judges had predictive measures (fo and ABI Jitter) for cisgender speakers. In number counting samples, predictive measures for cisgender speakers included fomed and HNR for CG judges, and fomed for both SLP and TNB judges. For transgender and non-binary speakers, predictive measures were fomed for CG and SLP judges, and fomed, CPPs, and ABI for TNB judges. In the poem recital task, predictive measures for cisgender speakers were fomed and HNR for both SLP and CG judges, with additional measures of cvint and sr for CG judges, and fomed, HNR, cvint, and fopeakwidth for TNB judges. For transgender and non-binary speakers, the predictive measures included a wider range of acoustic features such as fomed, fosd, sr, fomin, emph, HNR, Shimmer, and fo peakwidth for SLP judges, and fomed, fosd, sr, fomax, emph, HNR, and Shimmer for CG judges, while TNB judges considered fomed, sr, emph, fosd, Shimmer, HNR, Jitter, and fomax.
Conclusions
There is an association between the perception of gender presentation in the voice and the gender or context of the listener and the speaker. Transgender and non-binary judges diverged to a higher degree from cisgender and SLP judges. Compared to the evaluation of cisgender speakers, all judge groups used a greater number of acoustic measurements when analyzing the speech of transgender and non-binary individuals in the poem recital samples.
Citation: da Cruz Martinho DH, Lopes LW, Dornelas R, Constantini AC (2024) Can acoustic measurements predict gender perception in the voice? PLoS ONE 19(11): e0310794. https://doi.org/10.1371/journal.pone.0310794
Editor: Li-Hsin Ning, National Taiwan Normal University, TAIWAN
Received: May 24, 2024; Accepted: September 2, 2024; Published: November 14, 2024
Copyright: © 2024 da Cruz Martinho et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript.
Funding: DHCM This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Website: https://www.gov.br/capes/pt-br.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The social construct of gender is a concept that describes how societies attribute different meanings and expectations to people based on their gender identities. Since gender is socially constructed, there are no fixed or universal characteristics. Moreover, gender presentation varies by culture, taking societal characteristics and the current time period into consideration [1, 2].
It is essential to distinguish between three main concepts: gender presentation, gender assignment, and gender identity. Gender presentation refers to the external expression of gender through voice. Gender assignment, on the other hand, refers to the classification of an individual as male or female at birth based on biological characteristics. Gender identity is the internal sense an individual has of their own gender, which may or may not correspond to the gender assignment made at birth [3, 4].
As such, a person is seen by their daily actions, reiterating social norms; and gender presentation involves acting in a way that expresses their gender to the world. A speaker’s performance can affect the interlocutor, influencing social recognition [5].
Starting at birth, people are socialized in accordance with the gender norms specific to their culture. These norms influence behaviors, including the way one communicates and interacts with others in different contexts. Generally speaking, it is through socially shared norms within a culture that people can have expectations about gender presentation. Specifically, people may be expected to fit into one of the binary gender categories (i.e. man or woman) and follow the behavioral patterns associated with these categories [2].
The voice is one of the characteristics by which a speaker’s gender can be perceived [6]. In general, higher-pitched voices are associated with the feminine gender, while lower-pitched voices are associated with the masculine gender [7, 8]. However, the presentation and perception of gender through the voice can be influenced by the context of communication and by the values and preferences of the listener.
Frequency of oscillation (fo), measured in Hertz (Hz), is the acoustic correlate of pitch and corresponds to the number of vibrational cycles emitted by the vocal cords during a given interval of time [9]. Traditionally and from the cisgender perspective, fo has been used as one of the main measurements related to the perception of binary genders. Obviously, fo values attributed to the presentation of man and woman voices (in binary terms) can vary among different cultures and languages. For speakers of Brazilian Portuguese, the expectation in cisgendered terms is that men present an fo between 80Hz and 150Hz and women between 150Hz and 250Hz [10].
While many cultures have a predominance of fo values similar to those found in Brazil, such as American English with expected averages of 118Hz for men and 208Hz for women [11], the average fo values for some languages such as Wu Chinese are nearly equivalent for man and woman speakers [12]. This also occurs in Danish, in which the difference in fo between genders is relatively small [13], in contrast with Russian, in which differences are relatively larger [14]. In different cultures and social contexts, expectations in relation to the voice can differ [2, 7]. For example, in Yoruba culture, it is more appropriate for women to have firmer and more commanding voices [2], while in Brazilian or Western culture, a softer and more melodic voice can be valued [7]. Cultural norms mold and direct the production and perception of the voice as a function of gender. As such, sociocultural variables and the preferences and values of listeners influence the perception of gender, meaning that fo values constitute only one of the variables related to this perception [15–18].
It is common for transgender and genderfluid people to search for specialized treatments to modify their voices due to feeling that their voices do not represent them [8, 19]. In these cases, the job of a speech and language pathologist involves respecting the self-designation of the person and autonomy in constructing therapeutic objectives. Interventions can include adjustments in resonance [15, 17, 18, 20–24], softening emission with an increase in breathiness [17, 25], improving articulatory precision [15, 18], and even phono-articulatory adjustments such as modifying the position of the tongue [26].
Besides vocal characteristics, some studies discuss behavioral factors related to gender with an emphasis on pragmatic elements of discourse [18, 21, 22] such as prosody [17, 18, 21–24] and other aspects of non-verbal communication such as body language and the use of gestures [21–23]. The inclusion of these characteristics in speech-language therapy reinforces that the perception of gender in the voice is multifactorial and, therefore, that these same characteristics may be related to different acoustic measurements, whether or not those measurements correlate with the aforementioned aspects.
Current scientific production regarding gender presentation has focused predominantly on the analysis and suitability of the voice with a foundation in binary stereotypes of femininity and masculinity [15, 17, 18, 20–26]. However, a considerable gap resides in the scarcity of studies that consider and understand the different perceptions of gender beyond the cisgender viewpoint. This gap limits the comprehension of individual experiences and needs that transcend binary categories.
A previous study using the same samples as this research highlighted that gender perception in the voice can vary depending on the judge group, indicating a significant influence of the evaluator’s context on judgments [27]. This finding evidences the need for the present study and raises the hypothesis that the evaluator’s background may also lead to differences in the evaluation of voices, whether from cisgender or transgender individuals. In other words, the present study explores how each judge group responds when confronted with gender diversity in voice.
In sociophonetics, the study of gender perception in the speaker’s voice as a continuous variable is well-established. Previous research has shown that distinct patterns of phonetic variation allow listeners to make inferences about sexual orientation or perceptions of masculinity and femininity, highlighting that these perceptual parameters are interrelated but distinct [28]. Similarly, variations in pitch and specific phonetic characteristics can be interpreted differently across languages; for instance, higher pitches are often associated with more effeminate voices. However, the perception of these cues can vary significantly depending on the listener’s linguistic and cultural context [29]. Thus, analyzing gender presentation as a continuum is an extension of a robust body of research exploring the subtleties of how vocal characteristics convey gender and sexual orientation.
Furthermore, in Speech-Language Pathology, some studies have utilized visual analogue scales to assess how gender is perceived in the voice, emphasizing the importance of considering multiple perspectives beyond traditional binary models [27, 30–33]. Factors such as sexual orientation, gender identity, and listeners’ familiarity with the LGBTQIAPN+ community can influence their perceptions of gender in the voice [34, 35]. In the United States, for example, the influence of linguistic context and listeners’ gender identity on identifying the gender of transgender speakers seems relatively limited, suggesting that gender perception is more fluid and continuous rather than strictly binary [35].
This study stems from the premise that other acoustic measurements, besides fo, contribute to the perception of the presentation of gender in the voice and that this perception is variable as a function of the context of the listener. This study innovates by utilizing gender as a continuous, non-categorical variable in Brazilian speakers and by considering the perspective of people with different gender identities. The presented approach can contribute to a deeper understanding of the complex interaction between the voice, the perception of gender, and the context of the listener, recognizing that the variability of life experiences and interactions with femininities and masculinities has the potential to offer a broader and more sensitive viewpoint to orient more inclusive and suitable practices. As such, the objective of this study was to determine if there is an association between the perception of gender presentation in the voice and the gender and context of the listener, as well as to analyze the acoustic measurements that can predict the perception of gender presentation through the voice.
Method
Quantitative, observational, cross-sectional, prospective study submitted to and approved by the Committee of Ethics in Research (Comitê de Ética em Pesquisa), file number 4.730.175. Data collection was carried out following the same methodology as Martinho & Constantini (2024) [27] and was performed between August 2021 and January 2022 in a school-clinic at a university in the city of Campinas, São Paulo, Brazil. All participants signed the informed consent form in writing.
Selection and recording of participants
Participants were recruited by social media and by e-mail through invitations sent by Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, Intersex, Asexual/Aromantic/Agender, Pansexual/Polysexual, Non-Binary, and more (LGBTQIAPN+) reference centers. The snowball sampling methodology was adopted, which characterizes a non-probabilistic convenience sample. The study was conducted in 2 stages, and the first stage involved recording the voices of 47 speakers of Brazilian Portuguese: 11 cisgender women, 11 transgender/travesti women, 11 cisgender men, 7 transgender men, and 7 non-binary people. The participants who had their voices recorded were between the ages of 18 and 47 (average 25.94; median 24). In gender divisions, the group of transgender women included transsexual women and travestis. As a methodological choice, these people were grouped together, since both identities perform gender roles that express aspects of femininity and, in Brazil, the identity of a travesti can be adopted through a political manifestation that goes back to the beginning of the discussion of transgenderism in the country.
Exclusion criteria were: the presence of health problems that could affect vocal quality on the day of data collection (such as flu, cold, or airway infections, as self-reported by the participant); continuous use of medication that could interfere with vocal production (except for hormonal therapy); and the use of tobacco. In addition, the participant could not have reported any vocal health issues on the day of collection.
After receiving instructions about the procedures involved in the research, the participant was directed to an acoustically controlled recording area with background noise levels below 50dB SPL. Recordings were performed with the participant standing inside an acoustic cabinet, using a unidirectional microphone placed 10 centimeters from the mouth. For this procedure, a Dell desktop computer and Shure® microphone (model SM58) coupled with a Tascam® sound card (model US100) were used. Speakers were recorded directly through Praat software (version 6.2.14) [36] with a sampling rate of 44kHz. The recording environment is depicted in Fig 1.
The recording procedure consisted of the emission of three sustained [a] vowels, chosen for its more neutral vocal tract configuration and for permitting the evaluation of gender without communicative characteristic influences; the emission of an ascending-descending glissando using the [a] vowel, to identify vocal extension; connected speech (counting numbers from one to ten); and the recital of the poem “O amor bate na Aorta” written by the Brazilian author Carlos Drummond de Andrade [37]. For this last task, the participant was instructed to recite the poem expressively, with the command “recite the following poem with emotion”. These tasks are necessary to evaluate voice dynamics by way of acoustic analysis and Auditory-Perceptual Assessment.
The selection of the aforementioned tasks was strategic to encompass different aspects of vocal production, considering whether or not there is interference from communicative aspects.
Extraction of acoustic measurements
The following were selected for extraction and analysis: acoustic measurements related to fo, measurements of noise perturbations related to breathiness and strain, as well as measurements related to articulation and prosody. Acoustic measurements selected for this study can be found in Table 1.
Acoustic measurements were extracted from the sustained [a] vowel samples, connected speech, and poem recital using Praat [36] software and four scripts, detailed below.
The choice of a wide range of acoustic measures is justified by the importance of prosodic, acoustic, and communicative aspects in gender perception. The fundamental frequency (fo) plays a crucial role in this context [8], making the inclusion of various descriptors of this measure essential for analysis. Additionally, noise-related characteristics were considered to assess vocal quality and stability. Prosodic aspects are also crucial as they reflect the dynamics and expressiveness of speech, which are significant elements in expressing gender identity. Measures of vocal quality were integrated to evaluate timbre and overall clarity of the voice, complementing the analysis.
The literature reveals that acoustic measures vary significantly according to the speaker’s gender. For example, phonetic differences observed between cisgender individuals include higher formant frequencies and slower speech rates in women [47]. Measures such as Cepstral Peak Prominence (CPP) and the difference between the first two harmonics (H1-H2) are useful for analyzing vocal quality, showing that voices perceived as feminine tend to be breathier [48]. Additionally, research indicates that cisgender women produce vowels with longer duration and higher formant frequencies when compared to cisgender men [49].
ABI and AVQI.
The AVQI and ABI are multiparametric indices where AVQI evaluates overall vocal quality and ABI quantifies breathiness present in the voice. Both of the indices consider connected speech and the sustained [a] vowel to give a single final score, from zero to ten, in which zero corresponds to the absence of deviations and ten corresponds to extreme vocal deviation [50].
The AVQI (version 03.01) takes six acoustic measurements into consideration for its score: smoothed cepstral peak prominence (CPPs), harmonics-to-noise ratio (HNR), local shimmer (SL), local shimmer dB (SLdB), general slope of the long-term average spectrum (Slope), and tilt of the regression line through the long-term average spectrum (Tilt), according to the mathematical formula below [45].
The ABI considers nine acoustic measurements to calculate its score: smoothed cepstral peak prominence (CPPs), jitter, glottal-to-noise excitation ratio at a maximum frequency of 4500 Hz (GNEmax-4500Hz), high-frequency noise value between 0-6kHz and 6-10kHz (Hfno-6000Hz), Dejonckere and Lebacq harmonic-to-noise ratio (HNR-D), difference between the first and second harmonic (H1-H2), shimmer dB, shimmer, and period standard deviation (PSD), according to the mathematical formula below [46].
The choice to employ AVQI and ABI in the analysis of the cohort composed of normal voices was motivated by the need for a comprehensive and objective assessment of vocal quality. These are current and robust measures that have been studied in various languages and populations [45, 46, 50]. This consideration arises from the understanding that gender perception in the voice extends beyond traditional measures of oscillatory frequency. This approach allows for an exploration of whether vocal quality and breathiness measures can impact how gender is perceived.
Two files were imported into Praat [36] to extract the ABI and AVQI indices: the connected speech file (counting numbers) and a 3-second sustained [a] vowel file. The files were handled following the guidelines provided by the authors of the script [45, 46]. The results of the individual acoustic measurements that compose the indices and index scores were extracted and tabulated.
Prosody descriptor and mark-pauses.
We used the Prosody Descriptor script [51] for the extraction of prosodic-acoustic measurements, performed separately for each type of vocal sample (connected speech and poem recital).
The Prosody Descriptor script requires that the audio files be labeled in Praat [36], identifying the pauses and number of phonetic syllables in each segment of speech. To standardize pause segmentation, the Mark-pauses script [52] was used. The script identifies silent pauses and performs automatic segmentation. After the automatic segmentation was checked, the speech segments and phonetic syllables were marked on the corresponding lines. An example of segmentation can be seen in Fig 2, below. After execution, the Prosody Descriptor script generated a report in text file format with all the extracted acoustic measurements, which were organized into tables.
Layer 1: speech segments; Layer 2: silent pauses; Layer 3: segment of speech with the number of phonetic syllables in that segment.
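As an illustration of how labeled segments of this kind can feed rate-based prosodic measures, the following minimal sketch computes a speech rate (syllables per second including pauses) and an articulation rate (excluding pauses). The `Interval` structure, the sample values, and these two definitions are assumptions made for illustration; they are not taken from the Prosody Descriptor script, whose exact computations may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Interval:
    start: float               # interval onset in seconds
    end: float                 # interval offset in seconds
    syllables: Optional[int]   # phonetic syllables; None marks a silent pause


def speech_and_articulation_rate(intervals: List[Interval]) -> Tuple[float, float]:
    """Return (speech rate, articulation rate) in syllables per second."""
    total_time = sum(i.end - i.start for i in intervals)
    speaking_time = sum(i.end - i.start for i in intervals if i.syllables is not None)
    syllables = sum(i.syllables for i in intervals if i.syllables is not None)
    # Speech rate keeps pauses in the denominator; articulation rate drops them.
    return syllables / total_time, syllables / speaking_time


# Hypothetical labeled sample: speech segment, silent pause, speech segment.
sample = [
    Interval(0.0, 2.0, 8),
    Interval(2.0, 2.5, None),
    Interval(2.5, 5.0, 12),
]
speech_rate, articulation_rate = speech_and_articulation_rate(sample)
```

In this hypothetical sample, 20 syllables over 5 seconds of total time and 4.5 seconds of speaking time give the two rates.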
Auditory-Perceptual Assessment
In the second stage of the study, APA of gender presentation in voices was carried out using the previously recorded files from stage 1. Gender presentation in the voices was judged by 236 people: 65 speech and language pathologists with experience in the area of voice (SLP), 101 cisgender people (CG), and 70 transgender and non-binary people (TNB). The total group of judges were between the ages of 18 and 68 years (average 31.84 years; median 27 years).
The selection of individuals who conducted the judgments was guided by the need to understand and to explore how transgender individuals perceive gender in the voice, as well as whether speech-language pathologists working in the field of voice have a perception similar to cisgender individuals. This relates to our hypothesis that people perceive gender differently depending on their context and background.
Inclusion criteria for this stage were: being a native speaker of Brazilian Portuguese and being 18 years of age or older. An SLP needed to have been practicing for at least six months in the area of voice. All SLP participants were cisgender, and the difference between this group and CG was their specialty in the voice. Exclusion criteria were: participation in the recording stage or the presence of any auditory issues.
For this stage, two questionnaires were created using the SurveyMonkey® platform: one questionnaire with segments of spoken voice and the other questionnaire containing the sustained [a] vowel. Judges did not have information about the speaker that provided this voice sample, and the presentation of vocal samples in both questionnaires was random, so that each judge evaluated the voices in a different order.
For the sample composition available in the questionnaire, voices were segmented in Praat so that the following samples could be evaluated: I–complete sample of the central [a] vowel (second of three requested repetitions); II–sample with connected speech (counting from one to ten) and the first verse of the poem. Each segment was presented in a different questionnaire with each vowel sample lasting 5 seconds and the speech samples lasting 15 seconds each. In addition, nine (20%) of the samples in each task were randomly repeated to calculate evaluation consistency of the judge. For the confidence calculation, an F-test was used considering a p-value cutoff of 0.05.
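The composition of one task's questionnaire (47 samples plus nine repeated samples, presented in a different random order to each judge) can be sketched as follows. This is only an illustration: the sample identifiers, the seeds, and the assumption that the same nine repeats were used for every judge are hypothetical, and the actual randomization was handled by the survey platform.

```python
import random
from collections import Counter


def choose_repeats(sample_ids, n_repeats, seed=0):
    """Draw the samples that will be presented twice (without replacement)."""
    return random.Random(seed).sample(sample_ids, n_repeats)


def order_for_judge(sample_ids, repeats, judge_seed):
    """Return the full list of items in a judge-specific random order."""
    playlist = list(sample_ids) + list(repeats)
    random.Random(judge_seed).shuffle(playlist)
    return playlist


voices = [f"voice_{i:02d}" for i in range(1, 48)]   # 47 hypothetical sample IDs
repeats = choose_repeats(voices, n_repeats=9)
playlist_a = order_for_judge(voices, repeats, judge_seed=1)
playlist_b = order_for_judge(voices, repeats, judge_seed=2)

# Samples that occur twice are the ones usable for intra-judge consistency.
presented_twice = [v for v, n in Counter(playlist_a).items() if n == 2]
```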
Judges evaluated gender presentation in the voice using a 101-point visual analogue scale, which allowed gender to be rated on a continuum ranging from very masculine voice (score of -50) at the far left to very feminine voice (score of 50) at the far right, as shown in Fig 3. The center point of the scale (score of zero) represented neutral voices, those whose gender the judges could not classify.
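The scoring of the scale can be illustrated with a short sketch. The 0 to 100 slider position is an assumed encoding introduced here for illustration (the way the survey platform stored responses is not described); only the -50 to 50 score range and the meaning of its ends and center come from the text.

```python
def vas_score(position: int) -> int:
    """Map an assumed slider position (0 = far left, 100 = far right) to -50..50."""
    if not 0 <= position <= 100:
        raise ValueError("position must be between 0 and 100")
    return position - 50


def describe(score: int) -> str:
    """Name the side of the scale on which a score falls."""
    if score < 0:
        return "masculine side"
    if score > 0:
        return "feminine side"
    return "neutral"


all_scores = [vas_score(p) for p in range(101)]   # the 101 possible scores
```

For example, `describe(vas_score(50))` falls on the neutral center point, while positions to its left and right map to negative and positive scores, respectively.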
Analysis of the results.
The data were grouped into tables and analyzed in a descriptive and inferential manner utilizing SPSS 25.0 software. A significance level of 5% was considered for inferential analyses. The inferential agreement analysis between quantitative variables was performed using the Intraclass Correlation Coefficient. Multiple linear regression models were used to predict dependent variables. Independent variable selection was performed using the stepwise method. Fig 4 summarizes all the research procedures.
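To make the idea of selecting predictors for a multiple linear regression more concrete, the sketch below implements a simplified forward selection in plain Python. It is not the SPSS stepwise procedure used in the study: SPSS enters and removes variables based on F-test probabilities, whereas this sketch only adds variables and uses a hypothetical minimum gain in R² (`min_gain`) as the entry rule. The predictor names and all numbers are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on an intercept plus columns."""
    n = len(y)
    x = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, x[i]))) ** 2 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Add, one at a time, the predictor that raises R^2 most; stop on small gains."""
    chosen, best = [], 0.0
    remaining = dict(predictors)
    while remaining:
        candidates = {
            name: r_squared([predictors[c] for c in chosen] + [col], y)
            for name, col in remaining.items()
        }
        name = max(candidates, key=candidates.get)
        if candidates[name] - best < min_gain:
            break
        chosen.append(name)
        best = candidates[name]
        del remaining[name]
    return chosen, best


# Invented data: mean APA score per voice and two hypothetical predictors.
predictors = {
    "fomed": [110, 120, 135, 150, 180, 200, 210, 225],
    "hnr": [12, 15, 11, 14, 13, 16, 12, 15],
}
apa = [-45, -40, -30, -20, 5, 20, 28, 40]
selected, r2 = forward_select(predictors, apa)
```

With these invented values, only the predictor that tracks the scores closely is retained; the second one adds too little explained variance to pass the entry rule.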
Results
Acoustic analysis
This section presents the descriptive analysis of extracted acoustic measurements from the different speech tasks. The ABI indicated that eight (17.02%) participants obtained scores above the cutoff, suggesting that they presented a degree of breathiness above what was expected. The same occurred for the AVQI, in which 17 (36.17%) participants were above the cutoff, indicating possible deviations in vocal quality. Table 2 shows the medians of the extracted acoustic measurements in accordance with the analyzed speech task and the gender of the speaker. Note that cis and trans women have a higher median fo than the other genders. Cis women showed the highest levels of breathiness in the voice according to the ABI score, followed by non-binary people and trans women. Cis men showed the highest indices of vocal quality alteration according to the AVQI score, followed by trans men and non-binary people.
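The reported proportions follow directly from the sample size of 47 speakers. The sketch below reproduces that arithmetic; the score lists and cutoff values are placeholders constructed only to yield the reported counts and are not the study's data or the published ABI and AVQI cutoffs.

```python
def share_above(scores, cutoff):
    """Return (count, percent) of scores strictly above the cutoff."""
    count = sum(1 for s in scores if s > cutoff)
    return count, round(100 * count / len(scores), 2)


# Placeholder values: 8 of 47 and 17 of 47 scores lie above a placeholder cutoff.
abi_like = [4.0] * 8 + [2.0] * 39
avqi_like = [3.0] * 17 + [1.0] * 30

abi_result = share_above(abi_like, cutoff=2.5)
avqi_result = share_above(avqi_like, cutoff=2.0)
```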
Auditory-Perceptual Assessment
Testing the inter-evaluator consistency resulted in statistical significance in all groups (p<0.001); however, intra-evaluator consistency measured by F-test resulted in 79 (32.3%) of the judges having significance above the p-value cutoff for the vowel judgement and nine (3.7%) for speech. These data indicate that the judges had more difficulties judging gender in the voice using only the sustained vowel sample. As such, only judges that presented low consistency (p>0.05 in the F-test) with the speech task were excluded since we considered that the speech sample analysis offered a larger number of acoustic cues to provide a reliable analysis. All of the excluded judges (n = 9) were cisgender women, two from the CG group and seven from the SLP group.
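For readers unfamiliar with agreement indices, the following sketch computes one common form of the Intraclass Correlation Coefficient from a table of ratings (rows are voices, columns are judges). The one-way random-effects, single-rater form is an assumption chosen for brevity; the text does not state which ICC model was run in SPSS, and the ratings below are invented.

```python
def icc_oneway(ratings):
    """One-way random-effects, single-rater ICC for a voices x judges table."""
    n = len(ratings)        # number of rated voices
    k = len(ratings[0])     # number of judges per voice
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-voice and within-voice mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented scores on the -50..50 scale for three voices and three judges.
perfect = [[-40, -40, -40], [0, 0, 0], [35, 35, 35]]
noisy = [[-40, -30, -45], [0, 10, -5], [35, 30, 40]]

icc_perfect = icc_oneway(perfect)
icc_noisy = icc_oneway(noisy)
```

Identical ratings across judges yield the maximum value of the index, and disagreement between judges lowers it.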
The means, medians, and standard deviations of the evaluation of gender presentation according to judges can be found in Table 3, grouped by speaker gender. Only cisgender women had their gender presentation perceived as feminine (positive mean and median values) and cisgender men had their vocal gender presentation perceived as more masculine with values close to -50. Non-binary people, trans women and/or travestis and trans men were closer to the neutral range, with values near zero.
Fig 5 provides a visual description of the average APA of gender presentation in the voice for each vocal sample in each judge group for the sustained vowel and speech tasks. The highest level of disagreement can be seen in the SLP group for the sustained [a] vowel task and in the TNB group for the speech tasks. Disagreements in judgement are indicated by divergences from the overlapping lines.
CG: Group of cisgender people; SLP: Group of speech and language pathologists with experience in the voice; TNB: Group with trans and non-binary people.
Predictive analysis of Auditory-Perceptual Assessment.
Multiple linear regression models were generated with the following objectives:
- Identify acoustic measurements that predict the Auditory-Perceptual Assessment of the gender of the speaker for each group of listeners;
- Determine how each group of listeners evaluates speaker gender.
Vowel samples.
Two multiple linear regression models were generated for each group of judges to determine whether the acoustic measurements from the vowel analysis can predict APA of the vowel samples of cisgender speakers and of trans and non-binary speakers.
For the CG and TNB judge groups, none of the extracted acoustic measurements from the [a] vowel demonstrated the capacity to predict Auditory-Perceptual Assessment by these groups for these vowel samples. For judges in the SLP group, a statistically significant model was generated for APA of the vowel of cisgender speakers (p<0.001), seen in Table 4.
Speech samples.
Four multiple linear regression models were generated for each group of judges to determine whether the independent variables (the acoustic measurements extracted from connected speech, either number counting or poem recital) can predict the dependent variable, the APA of the speech sample, in two different speaker groups (cisgender speakers or trans and non-binary speakers).
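The resulting set of models can be enumerated as a grid of judge group, task, and speaker group. The sketch below is only a bookkeeping aid under the reading that the vowel models were also split by speaker group; the label strings are illustrative and do not correspond to variable names used in the study.

```python
from itertools import product

JUDGE_GROUPS = ["CG", "SLP", "TNB"]
SPEAKER_GROUPS = ["cisgender", "trans and non-binary"]
TASKS = ["sustained vowel", "number counting", "poem recital"]

# One regression model per combination of judge group, task, and speaker group.
model_grid = [
    {"judges": judges, "task": task, "speakers": speakers}
    for judges, task, speakers in product(JUDGE_GROUPS, TASKS, SPEAKER_GROUPS)
]

vowel_models_per_group = sum(
    1 for m in model_grid if m["judges"] == "CG" and m["task"] == "sustained vowel"
)
speech_models_per_group = sum(
    1 for m in model_grid if m["judges"] == "CG" and m["task"] != "sustained vowel"
)
```

Under this reading, each judge group has two vowel models and four speech models.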
The acoustic measurements extracted from number counting samples that were predictive of APA (p<0.001) in the three judge groups can be found for the evaluation of cisgender speakers in Table 5 and trans and non-binary speakers in Table 6. Acoustic measurements extracted from the poem recital samples can be found for cisgender speakers in Table 7 and trans and non-binary speakers in Table 8.
Table 9 presents a summary of the predictive measures in each speech task according to the judge and speaker groups.
Discussion
Gender is socially constructed and undergoes constant change [1]. As a result, expressions and perceptions of masculinities and femininities can vary in accordance with the context of the speaker and the listener. As such, this study probed the associations between the perception of gender presentation in the voice and the gender and context of the listener, presenting acoustic measurements that can predict gender presentation in the voice. The study innovated by including the perception of individuals of different genders, including the transgender population, which typically does not participate as judges in the APA.
By way of acoustic analysis of the voices of the speakers, cis and trans women presented median fo values higher than other groups. Voice breathiness also was higher in cis women, non-binary people, and trans and/or travesti women, presented in order of decreasing breathiness (Table 2), potentially having influenced the perception of gender presentation by the judges that used vocal quality acoustic measurements as predictors of gender presentation.
The extracted acoustic measurements of fo and breathiness (Table 2) evidence a gradual increase in breathiness as the gender presentation of the speaker changes. This finding reveals a fluidity in expression and that, for the two aforementioned acoustic measurements, each gender is a point on a continuous scale, ranging from the highest fo and breathiness levels in cis women, moving across trans women, non-binary people, and trans men, to the lowest levels in cis men. Breathiness is indeed an important point to be considered in gender presentation and is an aspect specifically worked on during voice feminization therapy [21].
In the present study, cis men presented higher levels of vocal quality disturbances (as shown by AVQI scores), followed by trans men and non-binary people. Moreover, cis and trans men also had lower fo values (Table 2). The AVQI, a multifactorial measurement to generate a single score, is considered to be one of the most robust objective acoustic measurements of voice quality and severity of dysphonia, providing cutoffs for disturbance levels that do not vary with the gender of the speaker [53, 54].
The linear regression analyses indicated the relationship between judgement of gender presentation in the voice by listeners and the acoustic measurements extracted from the voices of the speakers. This statistical method investigated the linear relationship between APA and acoustic measurements. The predictive analysis of APA provided valuable information that can redirect the currently predominant binary viewpoint of gender presentation in the voice, including in a speech and language pathology clinic setting.
During the evaluation process, it is common for vocal quality to be analyzed using several vocal tasks, including the sustained vowel and connected speech, because vocal behavior is expected to vary by task [55]. In addition, as the complexity of the task increases, so does the amount of acoustic and communicative information available to influence the judgement of the listener [55]. This relationship was also evident in the number of acoustic measurements that predicted APA in this study, which was lowest in the sustained vowel samples, increased in the connected speech samples, and was highest in the poem recital samples.
In the sustained vowel samples, only the SLP group was found to have acoustic measurements that coincided with their APA, all of which were associated with fo (Table 4). In the same samples, perception of gender presentation by this group diverged from the perceptions of the other two (Fig 5). Speech and language pathologists commonly evaluate vocal quality through sustained vowel samples [55], which would explain this finding. Nonetheless, this group still seems to have considered only the perception of frequency in their evaluations of gender presentation, even though the literature [18, 56, 57] indicates the need to consider additional factors other than those related to the frequency of the voice as important for gender presentation in the voice.
In the evaluation of connected speech (counting numbers), median fo was the most important parameter for the perception of gender in all judge groups. However, judges from the CG group also used HNR to evaluate other cisgender speakers (Table 5) and judges from the TNB group included acoustic measurements of breathiness, such as ABI and CPPs, to evaluate other trans and non-binary speakers (Table 6).
These results may indicate that a listener considers different aspects to identify genders that are different from their own. Additionally, this indicates that judges from the TNB group use acoustic measurements of vocal quality in their evaluations. Interlocutors who share similar contexts and experiences tend to communicate more effectively [58], which could explain the difference in evaluation by judges whose gender differs from that of the speaker.
Cisgender people (SLP and CG groups) tend to rely more on fo in judgements of gender presentation in the voice, while trans and non-binary people also react to aspects of vocal quality. The perception of gender presentation by judges in the TNB group was seen to differ from that of the CG group and, importantly, from that of the SLP group. Professionals in the area have, in theory, a broader perception of vocal quality and other vocal aspects than even the trans population, whose presence in SLP clinics has become increasingly common. These findings could translate into unattainable expectations of femininity or masculinity in the voice, seeing as the views within the clinic do not converge with those of the clients.
Additionally, it is important to consider how professional biases can impact voice assessment. For instance, linguistic biases can shape SLPs’ attitudes towards clinical scenarios, suggesting that similar biases may occur in the evaluation of voice characteristics related to gender [59]. Similarly, differences in speech accuracy assessment between SLPs and untrained listeners suggest that professionals’ training and experience may either mitigate or amplify biases [60].
These issues may explain why judges from SLP and TNB groups use different acoustic measures to evaluate the voices of transgender and non-binary individuals. Gender perception and the way acoustic measures are utilized may be influenced by personal experiences and implicit biases, underscoring the importance of ongoing training and clinical practice that recognizes and minimizes such biases. A speech and language pathologist must respect the autonomy of a trans person as their client during the construction of the therapeutic process. To do so, they must develop a keen ear with the objective of respecting the goals of vocal performance belonging to that person.
The results from the predictive analysis involving acoustic measurements extracted from the poem recital highlight that there is a difference in how a listener evaluates speakers who play the same gender roles as they do compared to speakers of other genders. Judges from the CG group evaluated cisgender speakers using fewer acoustic measurements than they used for trans and non-binary speakers (Tables 7 and 8). The same occurred for the SLP group, which utilized only two acoustic measurements when evaluating cisgender speakers but a total of eight when evaluating trans and non-binary speakers. The TNB group also seemed to consider more acoustic measurements when evaluating other trans and non-binary speakers; however, unlike the other two judge groups, more acoustic measurements not related to fo were included as predictive of APA. These data indicate that the acoustic cues that each listener utilized to perform APA were different, possibly owing to the different life experiences of these individuals, as well as the type of vocal material they are exposed to in their day-to-day lives and in the media. These data also reinforce the importance of previous experiences when performing Auditory-Perceptual Assessment [55, 61].
In practically all significant regression models, fo was present; even though it is associated with other acoustic measurements, it is still relevant to gender presentation in the voice. However, gender presentation changes as society changes [1] and how femininity and masculinity are perceived can also change over the years.
All judge groups used a larger number of acoustic measurements to judge the speech of trans and non-binary people. In studies with artificial intelligence [62, 63], there is not complete accuracy when judging binary gender presentation in the voice of cisgender speakers; it becomes evident that the less contact a listener has with the gender presentation of the speaker being evaluated, the more cues a listener needs to perform APA. Trans and non-binary people can have a more fluid gender expression, making it more difficult to evaluate them categorically, requiring judges to use more auditory cues to evaluate the gender presentation of these voices.
Perceptions of the human voice are important in social communication and allow the recognition of various pieces of information about the identity of the speaker [64]. This recognition begins in early childhood [65] and develops to the point at which a listener can recognize speakers by their voice, as well as form ideas about that person’s gender, age, ethnicity, and social status [66]. Nonetheless, this type of recognition depends on the auditory memory and type of vocal sample that the listener is exposed to throughout their life [67]. As such, when judging voices of people with a more fluid gender presentation, listeners can have more difficulty due to their limited, or even absent, previous experience with similar vocal models.
A recent study [64] using neuroimaging provided evidence that different cortical regions are involved in the processing of different types of vocal information. As such, the recognition of the gender presentation of a speaker through their voice is a higher-level cognitive ability in which linguistic, affective, and identity information are processed in partially segregated cortical pathways. This reinforces that gender presentation is a continuous and not a categorical variable; there is not only one way to be feminine or masculine, but rather many possibilities between two extremes. For the general population, voices that do not fit into predetermined social patterns require more attention for their evaluation.
As such, it would be natural to suggest that trans and non-binary people, as a result of having other life experiences than masculinities and femininities, possibly regard the roles and presentation of gender in a different way than cisgender people. This is confirmed by the divergence in the perception of gender presentation by APA from the speech samples from the TNB group compared to the other two groups (Table 3 and Fig 5).
Specifically regarding the differences found in the perception of SLP judges, these differences in evaluation may be reflected in clinical practice. It is known that heterosexual health professionals or those with limited exposure to sexual and gender diversity tend to have more negative biases when working with the LGBTQIAPN+ population, suggesting that such biases may affect how professionals assess vocal characteristics of sexual and gender minorities [68]. Microaggressions and biases can also impact clinical care, which aligns with the idea that SLPs’ training and experience can influence their gender evaluations [69]. Therefore, the difference in the use of acoustic measures between groups may reflect not only a technical approach but also the need for greater sensitivity to professional biases and a better understanding of individual clients’ experiences.
Our findings indicate that the personal experiences of professionals in speech and language pathology can play a more prominent role in evaluating gender presentation in the voice. Such experiences can add to their technical knowledge, reinforcing the need to break the paradigm of voices fitting into binary categories.
Although the voice samples of 47 participants were collected in this study, there are still no data in the literature that allow for sample size calculations. This may limit the generalization of these results to the general population. Moreover, the study was carried out with native speakers of Brazilian Portuguese, which further limits generalization, since the perception of gender through the voice can be influenced by cultural norms and specific languages.
The absence of conversational speech samples was compensated for by the inclusion of specific tasks that capture relevant nuances in vocal production, thus providing a comprehensive and meaningful approach to achieve the research objectives.
The research aimed to investigate gender perception in the voice, taking into account groups of judges with different backgrounds and a diverse cohort of speakers. Thus, the adopted methodological choice sought to explore the nuances and complexities present in the auditory-perceptual evaluation of vocal gender in a specific sample, without the need for a traditional control group. The focus was on understanding the dynamics involved in interpreting gender presentation in the voice, considering variables such as the gender identity of the speakers and the experience of the judges, without imposing restrictive conditions that a control group might introduce.
Nonetheless, the experimental design and the results of the present study can offer important insight for the execution of similar investigations with native speakers of other languages and cultures.
Conclusion
There is an association between the perception of gender presentation in the voice and the gender or context of the listener and the speaker. The study showed that gender and the context of the evaluator influence the perception of gender presentation in the voice. The acoustic measurements and the group of judges also affect perception. Transgender and non-binary judges diverged to a larger degree from cisgender judges and speech and language pathologists. All judge groups used more acoustic measurements to analyze the speech of trans and non-binary people. The results indicate the necessity to evaluate gender presentation as a continuous and non-categorical variable. More research is necessary to understand the perception of gender presentation in the voice and its relation to communication, considering the perspective of trans people. The prominent representation of trans people can contribute to a redefinition of gender but requires collective and public political initiatives.
References
- 1. Butler J. Introduction. Bodies That Matter Discursive Limits Sex, London: Routledge; 2011, p. 19–59.
- 2. Oyěwùmí O. The social order and biology: natural or constructed? In: Oyěwùmí O, editor. Invent. Women Mak. an African Sense West. Gend. Discourse, Rio de Janeiro: Bazar do Tempo; 2021.
- 3. Zimman L. Transmasculinity and the voice: Gender assignment, identity, and presentation. In: Milani T, editor. Lang. masculinities Performances, Intersect. dislocations, New York: Routledge; 2015, p. 197–219.
- 4. Zimman L. Gender as stylistic bricolage: Transmasculine voices and the relationship between fundamental frequency and /s/. vol. 46. 2017. https://doi.org/10.1017/S0047404517000070.
- 5. Butler J. Gender Trouble: Feminism and the Subversion of Identity. New York: Routledge; 2006.
- 6. Santos LA dos, Antunes LB. The social construction of the voice in gender performativity: a prosodic analysis in female transgender speech. Caletroscópio 2020;8:63–82.
- 7. Braga A, Piovezani C. Discursos sobre a fala feminina no Brasil contemporâneo. Rev Da ABRALIN 2021:1–19. https://doi.org/10.25189/rabralin.v19i1.1694.
- 8. Zimman L. Transgender voices: Insights on identity, embodiment, and the gender of the voice. Lang Linguist Compass 2018;12:1–16. https://doi.org/10.1111/lnc3.12284.
- 9. Sundberg J. Sistema Fonador. Ciência da Voz Fatos sobre a Voz na Fala e no Canto, Editora Da Universidade de São Paulo: 2015, p. 25–44.
- 10. Behlau M. Avaliação de voz. In: Behlau M, editor. Voz o livro do Espec., Rio de Janeiro: Revinter; 2001, p. 85–180.
- 11. Goy H, Fernandes DN, Pichora-Fuller MK, Van Lieshout P. Normative voice data for younger and older adults. J Voice 2013;27:545–55. pmid:23769007
- 12. Rose P. How effective are long term mean and standard deviation as normalisation parameters for tonal fundamental frequency? Speech Commun 1991;10:229–47. https://doi.org/10.1016/0167-6393.
- 13. Johnson K, Sjerps MJ. Speaker Normalization in Speech Perception. Handb. Speech Percept., Wiley; 2021, p. 145–76. https://doi.org/10.1002/9781119184096.ch6.
- 14. Lobanov BM. Classification of Russian Vowels Spoken by Different Speakers. J Acoust Soc Am 1971;49:606–8. https://doi.org/10.1121/1.1912396.
- 15. Dacakis G, Oates J, Douglas J. Beyond voice: perceptions of gender in male-to-female transsexuals. Curr Opin Otolaryngol Head Neck Surg 2012;20:165–70. pmid:22487788
- 16. Gelfer MP, Van Dong BR. A preliminary study on the use of vocal function exercises to improve voice in male-to-female transgender clients. J Voice 2013;27:321–34. pmid:23159032
- 17. Quinn S, Swain N. Efficacy of intensive voice feminisation therapy in a transgender young offender. J Commun Disord 2018;72:1–15. pmid:29454176
- 18. Dornelas R, da Silva K, Pellicani AD. Proposal of the vocal attendance protocol and vocal redesignation program in the services of the transsexualizing process. CODAS 2021;33:1–5. https://doi.org/10.1590/2317-1782/20202019188.
- 19. Russell MR, Abrams M. Transgender and Nonbinary Adolescents: The Role of Voice and Communication Therapy. Perspect ASHA Spec Interes Groups 2019;4:1298–305. https://doi.org/10.1044/2019_persp-19-00034.
- 20. Hancock A, Helenius L. Adolescent male-to-female transgender voice and communication therapy. J Commun Disord 2012;45:313–24. pmid:22796114
- 21. Hancock AB, Garabedian LM. Transgender voice and communication treatment: a retrospective chart review of 25 cases. Int J Lang Commun Disord 2013;48:54–65. pmid:23317384
- 22. Cárdenas Y, Campo C, Fernández V, Escobedo J, Inchuchala J, Delgado JP, et al. Intervención fonoaudiológica para la feminización de la voz en una persona transgénero MTF: estudio de caso TT—Phoniatric intervention for voice feminization in a transgender person MTF: a case study. Rev chil fonoaudiol (En línea) 2019;18:1–15.
- 23. Gray ML, Courey MS. Transgender Voice and Communication. Otolaryngol Clin North Am 2019;52:713–22. pmid:31101356
- 24. Mills M, Stoneham G, Davies S. Toward a Protocol for Transmasculine Voice: A Service Evaluation of the Voice and Communication Therapy Group Program, Including Long-Term Follow-Up for Trans Men at the London Gender Identity Clinic. Transgender Heal 2019;4:143–51. https://doi.org/10.1089/trgh.2019.0011.
- 25. Gelfer MP, Tice RM. Perceptual and acoustic outcomes of voice therapy for male-to-female transgender individuals immediately after therapy and 15 months later. J Voice 2013;27:335–47. pmid:23084812
- 26. Kawitzky D, McAllister T. The Effect of Formant Biofeedback on the Feminization of Voice in Transgender Women. J Voice 2020;34:53–67. pmid:30174221
- 27. Martinho DHC, Constantini AC. Auditory-Perceptual Assessment and Acoustic Analysis of Gender Expression in the Voice. J Voice 2024:1–7. https://doi.org/10.1016/j.jvoice.2023.12.024.
- 28. Munson B. The acoustic correlates of perceived masculinity, perceived femininity, and perceived sexual orientation. Lang Speech 2007;50:125–42. pmid:17518106
- 29. Boyd Z, Fruehwald J, Hall-Lew L. Crosslinguistic perceptions of /s/ among English, French, and German listeners. Lang Var Change 2021;33:165–91. https://doi.org/10.1017/S0954394521000089.
- 30. Quinn S, Oates J, Dacakis G. Perceived Gender and Client Satisfaction in Transgender Voice Work: Comparing Self and Listener Rating Scales across a Training Program. Folia Phoniatr Logop 2022;74:364–79. pmid:34847562
- 31. Hancock A, Colton L, Douglas F. Intonation and gender perception: Applications for transgender speakers. J Voice 2014;28:203–9. pmid:24094799
- 32. Houle N, Goudelias D, Lerario MP, Levi S V. Effect of Anchor Term on Auditory-Perceptual Ratings of Feminine and Masculine Speakers. J Speech, Lang Hear Res 2022;65:2064–80. pmid:35452247
- 33. Hancock AB, Pool SF. Influence of Listener Characteristics on Perceptions of Sex and Gender. J Lang Soc Psychol 2017;36:599–610. https://doi.org/10.1177/0261927X17704460.
- 34. Hardy TLD, Rieger JM, Wells K, Boliek CA. Acoustic Predictors of Gender Attribution, Masculinity–Femininity, and Vocal Naturalness Ratings Amongst Transgender and Cisgender Speakers. J Voice 2020;34:300.e11–300.e26. pmid:30503396
- 35. Hancock AB, Hao G, Ni A, Liu H, Johnson LW. Gender Attributions by Cisgender and Gender Diverse Listeners Rating Vowels, Reading, and Monologues. J Voice 2023:1–7. https://doi.org/10.1016/j.jvoice.2023.09.011.
- 36. Boersma P, Weenink D. Praat: doing phonetics by computer. 2022.
- 37. Andrade CD. O amor bate na aorta. Antol. poética Carlos Drummond Andrade. 1st ed., São Paulo: Companhia das Letras; 2012, p. 17.
- 38. Hincks R. Measures and perceptions of liveliness in student oral presentation speech: A proposal for an automatic feedback mechanism. System 2005;33:575–91. https://doi.org/10.1016/j.system.2005.04.002.
- 39. Niebuhr O, Voße J, Brem A. What makes a charismatic speaker? A computer-based acoustic-prosodic analysis of Steve Jobs tone of voice. Comput Human Behav 2016;64:366–82. https://doi.org/10.1016/j.chb.2016.06.059.
- 40. Bachorowski J-A, Owren MJ. Vocal Expression of Emotion: Acoustic Properties of Speech Are Associated With Emotional Intensity and Context. Psychol Sci 1995;6:219–24. https://doi.org/10.1111/j.1467-9280.1995.tb00596.x.
- 41. Liénard J-S, Di Benedetto M-G. Effect of vocal effort on spectral properties of vowels. J Acoust Soc Am 1999;106:411–22. pmid:10420631
- 42. Wayland R, Gargash S, Longman A. Acoustic and perceptual investigation of breathy voice. J Acoust Soc Am 1995;97:3364–3364. https://doi.org/10.1121/1.413011.
- 43. Kreiman J, Shue Y-L, Chen G, Iseli M, Gerratt BR, Neubauer J, et al. Variability in the relationships among voice quality, harmonic amplitudes, open quotient, and glottal area waveform shape in sustained phonation. J Acoust Soc Am 2012;132:2625–32. pmid:23039455
- 44. Lopes LW, Sousa ES da S, da Silva ACF, da Silva IM, de Paiva MAA, Vieira VJD, et al. Cepstral measures in the assessment of severity of voice disorders. Codas 2019;31:1–8. https://doi.org/10.1590/2317-1782/20182018175.
- 45. Barsties B, Maryn Y. External Validation of the Acoustic Voice Quality Index Version 03.01 with Extended Representativity. Ann Otol Rhinol Laryngol 2016;125:571–83. pmid:26951063
- 46. Barsties v. Latoszek B, Maryn Y, Gerrits E, De Bodt M. The Acoustic Breathiness Index (ABI): A Multivariate Acoustic Model for Breathiness. J Voice 2017;31:511.e11–511.e27. pmid:28087124
- 47. Whiteside SP. Temporal-based acoustic-phonetic patterns in read speech: some evidence for speaker sex differences. J Int Phon Assoc 1996;26:23–40. https://doi.org/10.1017/S0025100300005302.
- 48. Hejná M, Šturm P, Tylečková L, Bořil T. Normophonic Breathiness in Czech and Danish: Are Females Breathier Than Males? J Voice 2021;35:498.e1–498.e22. pmid:31902679
- 49. Negesse F. An acoustic analysis of Oromo vowels of the northern dialect. J Int Phon Assoc 2023;53:1033–48. https://doi.org/10.1017/S0025100323000014.
- 50. Englert M, Lima L, Behlau M. Acoustic Voice Quality Index and Acoustic Breathiness Index: Analysis With Different Speech Material in the Brazilian Portuguese. J Voice 2020;34:810.e11–810.e17. pmid:31005448
- 51. Barbosa PA. ProsodyDescriptorNew 2016.
- 52. Lennes M. Mark-pauses. Praat Scr Resour 2005. http://phonetics.linguistics.ucla.edu/facilities/acoustic/praat.html (accessed July 19, 2022).
- 53. Batthyany C, Latoszek BB V., Maryn Y. Meta-Analysis on the Validity of the Acoustic Voice Quality Index. J Voice 2022. pmid:35752532
- 54. Jayakumar T, Benoy JJ. Acoustic Voice Quality Index (AVQI) in the Measurement of Voice Quality: A Systematic Review and Meta-Analysis. J Voice 2022. pmid:35461729
- 55. Behlau M, Almeida AA, Amorim G, Balata P, Bastos S, Cassol M, et al. Reducing the GAP between science and clinic: lessons from academia and professional practice—part A: perceptual-auditory judgment of vocal quality, acoustic vocal signal analysis and voice self-assessment. CoDAS 2022;34:1–12. https://doi.org/10.1590/2317-1782/20212021240en.
- 56. Agana MG, Greydanus DE, Indyk JA, Calles JLJ, Kushner J, Leibowitz S, et al. Caring for the transgender adolescent and young adult: Current concepts of an evolving process in the 21st century. Dis Mon 2019;65:303–56. pmid:31405516
- 57. Azul D. Transmasculine people’s vocal situations: a critical review of gender-related discourses and empirical data. Int J Lang & Commun Disord / R Coll Speech & Lang Ther 2015;50:31–47. pmid:25180865
- 58. Eckert P, McConnell-Ginet S. Linguistic resources. Lang. Gend. 2nd ed, New York: Cambridge University Press; 2013, p. 62–81.
- 59. Easton C, Verdon S. The Influence of Linguistic Bias Upon Speech-Language Pathologists’ Attitudes Toward Clinical Scenarios Involving Nonstandard Dialects of English. Am J Speech-Language Pathol 2021;30:1973–89. pmid:34463535
- 60. Evans KE, Munson B, Edwards J. Does Speaker Race Affect the Assessment of Children’s Speech Accuracy? A Comparison of Speech-Language Pathologists and Clinically Untrained Listeners. Lang Speech Hear Serv Sch 2018;49:906–21. pmid:29971346
- 61. Oates J. Auditory-perceptual evaluation of disordered voice quality: Pros, Cons and Future Directions. Folia Phoniatr Logop 2009;61:49–56. pmid:19204393
- 62. Bensoussan Y, Pinto J, Crowson M, Walden PR, Rudzicz F, Johns M 3rd. Deep Learning for Voice Gender Identification: Proof-of-concept for Gender-Affirming Voice Care. Laryngoscope 2021;131:E1611–5. pmid:33219707
- 63. Buyukyilmaz M, Cibikdiken AO. Voice Gender Recognition Using Deep Learning 2016;58:409–11. https://doi.org/10.2991/msota-16.2016.90.
- 64. Belin P, Fecteau S, Bédard C. Thinking the voice: Neural correlates of voice perception. Trends Cogn Sci 2004;8:129–35. pmid:15301753
- 65. Beauchemin M, González-Frankenberger B, Tremblay J, Vannasing P, Martínez-Montes E, Belin P, et al. Mother and stranger: An electrophysiological study of voice processing in newborns. Cereb Cortex 2011;21:1705–11. pmid:21149849
- 66. Kreiman J, Sidtis D. Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception. John Wiley & Sons; 2011.
- 67. Pang W, Xing H, Zhang L, Shu H, Zhang Y. Superiority of blind over sighted listeners in voice recognition. J Acoust Soc Am 2020;148:EL208–13. pmid:32873006
- 68. Cochran BN, Peavy KM, Cauce AM. Substance abuse treatment providers’ explicit and implicit attitudes regarding sexual minorities. J Homosex 2007;53:181–207. pmid:18032292
- 69. Nadal KL, Whitman CN, Davis LS, Erazo T, Davidoff KC. Microaggressions Toward Lesbian, Gay, Bisexual, Transgender, Queer, and Genderqueer People: A Review of the Literature. J Sex Res 2016;53:488–508. pmid:26966779