Perceived Masculinity Predicts U.S. Supreme Court Outcomes

  • Daniel Chen ,

    Contributed equally to this work with: Daniel Chen, Yosh Halberstam, Alan C. L. Yu

    Affiliation Institute for Advanced Study, Toulouse School of Economics, Toulouse, France

  • Yosh Halberstam ,

    Affiliation Department of Economics, University of Toronto, Toronto, Ontario, Canada

  • Alan C. L. Yu

    Affiliation Phonology Laboratory, Department of Linguistics, University of Chicago, Chicago, Illinois, United States of America



Previous studies suggest a significant role for language in the courtroom, yet none has identified a definitive correlation between vocal characteristics and court outcomes. This paper demonstrates that voice-based snap judgments, based solely on the introductory sentence of lawyers arguing before the Supreme Court of the United States, predict outcomes in the Court. In this study, participants rated the opening statements of male advocates who argued before the Supreme Court between 1998 and 2012 in terms of masculinity, attractiveness, confidence, intelligence, trustworthiness, and aggressiveness. We found a significant correlation between vocal characteristics and court outcomes, and the correlation is specific to perceived masculinity, even when the judgment of masculinity is based on less than three seconds of exposure to a lawyer’s speech sample. Specifically, male advocates are more likely to win when they are perceived as less masculine. No other personality dimension predicts court outcomes. While this study does not aim to establish any causal connection, our findings suggest that vocal characteristics may be relevant in even as solemn a setting as the Supreme Court of the United States.


Voice-based first impressions can be formed rapidly, with less than half a second of speech exposure [1–4], and such impressions are often associated with the subsequent behavior of the perceiver [5–7]. For example, voice-based personality judgments are associated with mate selection [8], leader election [9, 10], housing options [11], consumer choices, and jury decisions [12]. Although researchers have demonstrated how vocal perception influences the communication process [13], it remains unclear whether such influences carry over to a communicative setting like oral arguments at the Supreme Court of the United States (SCOTUS), where subtle biases have consequences for major policy outcomes. To be sure, previous studies suggest a significant role for linguistic cues in the courtroom [12, 14, 15], yet none has identified a definitive connection between voice perceptions and actual court outcomes.

A priori, there are many reasons why inferences from voice should not play an important role in Supreme Court decisions. From a rational perspective, information about the advocate should override any first impression. From an ideological perspective, court outcomes are largely predetermined. From a judge’s legal perspective, decisions are justified not in terms of the advocate’s voice but in terms of the legal content of the argument. And from an economic perspective, correlations between malleable advocate characteristics and high-stakes outcomes in the United States Supreme Court should not persist as law firms and advocates are likely to adjust their behavior to eliminate such correlations.

At the same time, from a behavioral perspective, it has been repeatedly shown that the way one speaks reveals a great deal about one’s personality and level of confidence, as well as one’s ethnicity, socio-economic circumstances, geographic background, sexuality, and ideological stance [8, 16–18]. African American speakers can be identified even on the basis of the single word “hello” [11]. The percept of gay male speech and/or feminine male speech is linked to vowel formant structure [19], pitch [20], and the length and quality of /s/ [21, 22]. The released variant of word-final /t/ may be used as a resource for constructing nerd identity among female nerds [23], learnedness among Orthodox Jewish men [24], gayness [25], and articulateness among US politicians [26]. To be sure, listeners’ interpretations of the meanings behind these linguistic cues may vary with the listener’s level of experience with different speech varieties [27] and the identity of the speaker [26]. Nonetheless, even when visual cues are present, potential employers rely more on voice-based impressions of a job applicant’s competence and intellect in making hiring decisions [28].

In this study, we examine how people perceive the voice attributes of advocates arguing before the court and whether these perceptions can predict real outcomes. To this end, we utilize recordings of oral arguments at the Supreme Court of the United States, which offer a wealth of court decisions with real-world impact. Specifically, we focus on the introductory statement of an oral argument. During an oral argument, the counsels representing the competing parties of a case (i.e., the advocates for the petitioner and the respondent) each present their side to the Justices. As the introductory statement of an advocate’s argument before the court is customarily “Mister Chief Justice, (and) may it please the court”, the corpus of introductory statements we have amassed provides a unique opportunity to examine the effect of speech and language on real-world outcomes, since the lexical content (the words) being evaluated is identical across speakers. Listeners can therefore focus their judgments on how the words are pronounced, rather than on the advocates’ word choice.

Our empirical strategy is focused on testing models of cognitive bias. To infer the bias, we need to measure perceptions, which are typically unobserved, and how they relate to outcomes. Here, we focused on six dimensions, selected based on previous research on listeners’ perceptual evaluations of linguistic variables [18, 29, 30]: masculinity, attractiveness, confidence, intelligence, trustworthiness, and aggressiveness. Masculine voices increase perceptions of dominance and fighting ability among men [31] and increase attractiveness to women. Vote choices have also been shown to be influenced by perceptions of masculinity and femininity in male faces [32], and judgments about faces have been shown to predict the outcomes of actual elections [32, 33]. Vocal attractiveness is often found to be linked to facial attractiveness [34–36]. Judgments of attractiveness are important in everyday interaction, as physically attractive people are found to be more persuasive [37] and are judged to be more socially desirable and to get better jobs [38]. Confidence, trustworthiness, and aggressiveness are all important aspects of human communication, which can be processed upon one’s very first encounter with an individual [39, 40]. Trustworthiness may, at least partly, influence attributions of competence and might affect voting behavior [33]. It is also an important precursor in the development of cooperation [41] and a fundamental aspect of the legal system [42]. Expressions of confidence have been shown to affect persuasion [43]. Aggressiveness, which indexes a person’s assertiveness, also provides a means to counter the positive orientation of the other dimensions considered. A person’s intelligence cannot be observed directly and must be inferred from indirect cues such as voice. Perceived intelligence has been found to affect an individual’s employability [28]. Listeners’ judgments along these six dimensions are used as predictors of court outcomes.
Given the exploratory nature of this study, it is worth emphasizing at the outset that it is not the goal of this study to advance any claims for any specific causal influence of voice on the SCOTUS outcomes. Rather, we aim to test whether people’s subjective voice-based trait judgments are predictive of the SCOTUS outcomes at all. To the extent that such correlations can be established, future studies will be needed to determine the causal mechanisms behind such relationships.

This article begins with detailing the materials and methodologies used in this study in Section 2. The results are reviewed in Section 3, followed by robustness checks and extensions in Section 4. A discussion of the general findings is given in Section 5.

Materials and Methods

Ethics Statement

The study was approved by the Social and Behavioral Sciences Institutional Review Board at the University of Chicago, including a waiver of informed consent, as it was determined that the research presents no more than minimal risk to subjects and that a waiver of informed consent would not adversely affect the rights and welfare of subjects.


The stimuli for this study were drawn from oral arguments made in the Supreme Court of the United States between 1998 and 2012. A novel feature of our data is that every clip contains the same 2 to 3 seconds of content, delivered at the outset of each argument: “Mr. Chief Justice, (and) may it please the Court”. Our data consist of 1634 oral arguments made by 916 distinct male advocates; about 80 percent of these advocates argued only once in the Supreme Court.

Oral arguments at the Supreme Court have been recorded since the installation of a recording system in October 1955. The recordings and the associated transcripts were made available to the public in electronically downloadable format by the Oyez Project, a multimedia archive at the Chicago-Kent College of Law devoted to the Supreme Court of the United States and its work. The audio archive contains more than 110 million words in more than 9000 hours of audio, synchronized to the sentence level based on the court transcripts.

Oral arguments are, with rare exceptions, the first occasion in the processing of a case in which the Court meets face-to-face to consider the issues. Usually, each counsel representing the competing parties of a case has thirty minutes in which to present their side to the Justices. The Justices may interrupt these presentations with comments and questions, leading to interactions between the Justices, the lawyers and, in some cases, the amici curiae, who are not a party to a case but nonetheless offer unsolicited information that bears on the case to assist the Court. While oral arguments have been recorded since 1955, with the exception of those between 1998 and 2012, the bulk of the transcripts available on the Oyez archive at the time this experiment was set up did not identify the speaking turns of individual Justices, referring to them all as “The Court”. The archive has since diarized all recordings.


Participants from Amazon Mechanical Turk (AMT) rated the voice clips of the Supreme Court advocates. About half (321) of the 634 distinct participants who completed our survey were female. Two thirds of the participants were between 20 and 35 years old and one third were older than 35. Likewise, one third indicated they had some college education, whereas another third claimed to have a bachelor’s degree. The median income of those who completed the survey was about 40,000 US dollars. The racial and geographical distribution of the participants broadly reflects that of the US population: the correlation between the share of participants from a given state and that state’s share of the US population is 0.9588. Further descriptive statistics of the AMT participants who participated in this research are presented in Table 1.

Table 1. Descriptive Statistics of Survey Participants (N = 634).

This table presents descriptive statistics of survey participants who rated audio clips of Supreme Court oral arguments made by male advocates. The data are self-reported by participants before beginning the audio survey.


Participants were asked to rate the voice clips of Supreme Court advocates on a scale of 1 to 7 in terms of aggressiveness, attractiveness, confidence, intelligence, masculinity, and trustworthiness. As noted in the Introduction, these six dimensions were selected based on previous research on listeners’ perceptual evaluations of linguistic variables [18, 29, 30]. Each voice clip was played aloud once automatically, but participants were allowed to replay the clip as many times as they chose; in another survey variant, each clip was played only once and participants were unable to replay it. We discuss this and other survey designs below. The order and polarity of the attributes were randomized across survey participants. For example, masculine would vary vertically among the 6 attributes, and very masculine and not at all masculine would vary from left to right as the bounds on a 7-point scale. The order and the polarity of the attribute scales were held fixed for any particular participant to minimize cognitive fatigue. Participants were also asked to predict whether the lawyer would win the case and to rate the quality of the audio recordings.

Each participant rated 66 voice recordings. Of these, 60 were randomly drawn from the audio clip sample pool, and 6 of these were repeated as recordings 61 to 66 to measure the consistency of participant ratings. The participants were asked to use headphones to listen to the recordings. Amici curiae were also rated among the advocates, but are excluded from this study. No information regarding the identity of the speaker or the nature of the case was given to the participants. In Fig 1, we present a screenshot of the survey ratings page. (See S2, S3, and S4 Figs for screenshots of other sections of the task.)

Fig 1. Survey filled by AMT participants.

This figure is a screenshot of the survey matrix used by AMT participants to record their impressions of the audio recordings of advocates. The order and polarity of attributes were randomized across participants. Participants were not able to proceed to the next recording without completing the survey matrix and questions.


This section lays out the general analytic framework we employed in this study. To operationalize our empirical analysis, we begin by constructing a measure of voice-based trait judgments. Let attribute_{itw} be participant w’s perception of a given attribute of advocate i in case t, where attribute refers to any one of the six traits. These untransformed scores (range = 1–7) give more weight to participants who provide more signal amid greater variance in their ratings. Thus, to be conservative, our preferred measure adjusts for cross-participant variability in the cardinality of ratings as well as spread. Formally, for each participant and voice attribute, the normalized rating is given by

$$\widetilde{attribute}_{itw} = \frac{attribute_{itw} - \overline{attribute}_{w}}{\sigma(attribute)_{w}} \quad (1)$$

where $\overline{attribute}_{w}$ is the average perception of a given attribute across participant w’s advocate ratings and $\sigma(attribute)_{w}$ is the standard deviation of these ratings. As a result, for each participant w, $\widetilde{attribute}_{itw}$ is a continuous measure with mean equal to zero and standard deviation equal to one.
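As a concrete sketch, the per-participant normalization of Eq (1) can be computed as follows. This is a minimal illustration; the data frame layout and column names are assumptions, not the authors' actual code.

```python
import pandas as pd

def normalize_ratings(df, attributes, participant_col="participant"):
    """Z-score each attribute within participant, as in Eq (1): subtract
    the participant's mean rating and divide by that participant's
    standard deviation for the attribute."""
    out = df.copy()
    for attr in attributes:
        grp = out.groupby(participant_col)[attr]
        out[attr + "_norm"] = (out[attr] - grp.transform("mean")) / grp.transform("std")
    return out

# Toy ratings from two hypothetical participants on the 1-7 scale.
raw = pd.DataFrame({"participant": ["w1"] * 4 + ["w2"] * 4,
                    "masculine": [1, 3, 5, 7, 2, 2, 4, 6]})
normed = normalize_ratings(raw, ["masculine"])
```

After normalization, each participant's ratings have mean zero and unit standard deviation, so no single rater's scale use dominates the regressions.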

Using these measures, we estimate regressions of the following form:

$$Won_{it} = \alpha + \widetilde{\mathbf{attribute}}_{itw}'\,\boldsymbol{\beta} + \mathbf{X}_{w}'\,\boldsymbol{\gamma} + \varepsilon_{itw} \quad (2)$$

where the dependent variable $Won_{it}$ is an indicator for whether advocate i actually won (= 1) or lost (= 0) case t, and the key independent variables, denoted by the vector $\widetilde{\mathbf{attribute}}_{itw}$, are continuous measures of the set of six attributes of the advocate in case t as perceived by participant w, as well as the (normalized) perceived likelihood of winning. Given the regression equation, $\boldsymbol{\beta}$ represents the bias in actual wins associated with advocate traits. The vector $\mathbf{X}_{w}$ is a set of advocate and participant covariates (described in Table 1) that we use to explore the influence of heterogeneous perceptions of survey participants on our findings. These covariates include age, gender, race, income, education, and state of residence. To address the correlation in ratings among survey participants, we adjust the standard errors of the regression estimates for clustering at the oral-argument level.
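A regression of the form of Eq (2), with standard errors clustered at the oral-argument level, can be sketched as below. The simulated data, column names, and two-attribute specification are illustrative assumptions, not the study's data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Simulated stand-in for the rating-level data: ~20 ratings per
# argument, outcome constant within an argument (names illustrative).
n_args, per_arg = 40, 20
ratings = pd.DataFrame({
    "argument_id": np.repeat(np.arange(n_args), per_arg),
    "masculine": rng.standard_normal(n_args * per_arg),
    "confident": rng.standard_normal(n_args * per_arg),
})
ratings["won"] = rng.integers(0, 2, n_args)[ratings["argument_id"]]

# Linear probability model of Eq (2); standard errors clustered at the
# oral-argument level to account for repeated ratings of the same clip.
fit = smf.ols("won ~ masculine + confident", data=ratings).fit(
    cov_type="cluster", cov_kwds={"groups": ratings["argument_id"]})
```

Clustering matters here because each oral argument appears roughly twenty times in the data, once per rater, and those rows share the same outcome.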

For comparison purposes and for robustness, we also show baseline results using the untransformed scores, as well as a collapsed version of the data, in which we match only one voice measure to each oral argument by taking the average rating across participants for a given oral argument. In these regressions we lose variation in perceptions across participants. Broadly, these aggregated regressions mitigate the influence of classical measurement error, which typically biases coefficient estimates toward zero. Additionally, using the collapsed data addresses any concern about mechanically increasing power by duplicating each oral argument once per rating (even though we cluster at the recording level in all regressions). On the other hand, aggregated regressions can lose precision because we can no longer control for rater-specific correlations across perceptual ratings and participant characteristics. For these reasons, the aggregated regression is generally viewed as too conservative in terms of statistical precision [44]. For the sake of completeness, we provide baseline results using the collapsed data as well.
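The collapsed version of the data is a simple group-by average, sketched here with hypothetical column names and toy values:

```python
import pandas as pd

# Toy rating-level data: two arguments, two ratings each (illustrative).
ratings = pd.DataFrame({
    "argument_id": [1, 1, 2, 2],
    "masculine": [0.5, -0.5, 1.0, 3.0],
    "won": [1, 1, 0, 0],
})

# Collapse to one observation per oral argument: average the ratings,
# keep the (constant within argument) case outcome.
collapsed = ratings.groupby("argument_id", as_index=False).agg(
    masculine=("masculine", "mean"), won=("won", "first"))
```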

We use the linear probability model (OLS) as our primary estimation method, and show that our results are robust to the use of probit and logistic models. There are two main reasons for this choice. The first is that our objective is to estimate the correlation coefficients between perceived attributes of advocates and case outcomes rather than to develop a forecasting model of case outcomes, and OLS is superior for estimation purposes. And second, probit and logit are not well-suited to the use of regressions with controls for fixed effects (e.g., dummies for lawyer, participant, year of case argued, etc.) because of the incidental parameters problem [45], and our analysis includes many regressions with controls for fixed effects.
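The probit robustness check, with marginal effects evaluated at the means of the regressors, can be sketched as follows. The data are simulated, and the negative masculinity-win relationship is built in purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 800
# Simulated data (illustrative): less masculine voices win more often.
df = pd.DataFrame({"masculine": rng.standard_normal(n)})
p_win = 1.0 / (1.0 + np.exp(0.5 * df["masculine"]))
df["won"] = (rng.random(n) < p_win).astype(int)

# Probit analogue of the linear probability model; marginal effects
# are evaluated at the means of the independent variables.
probit = smf.probit("won ~ masculine", data=df).fit(disp=0)
margeff = probit.get_margeff(at="mean")
```

With a single continuous regressor and a binary outcome, the probit marginal effect at the mean is typically close to the OLS coefficient, which is the pattern the paper reports.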


Our procedure produced 33,666 ratings, with approximately 20 ratings for each of the 1634 oral arguments made by male advocates. The total number of observations generated on AMT was 41,844 (66 ratings × 634 participants). However, ratings of amici curiae, as well as ratings by 31 participants whose ratings did not vary across recordings, were excluded from the analysis. The final dataset used in this paper is available for download. Table 2 provides summary statistics of the normalized voice ratings. As expected, the mean normalized rating across participants is approximately zero with a standard deviation of one.

Table 2. Summary Statistics of Case Outcome and Trait Judgements of Male Lawyers (N = 33,666).

This table presents summary statistics of participant normalized ratings of our sample of 1634 oral arguments. Each observation is an argument by participant rating. Case Outcome is an indicator for whether the advocate won the case (= 1) in court or lost (= 0).

Throughout this paper, we refer to empirical findings only if they are statistically significant at the 5 percent level. We begin our analysis by exploring correlations among attribute ratings as well as correlations with the case outcome. In Table 3, we present a correlation table using the normalized ratings. As seen, the ratings are positively correlated across attributes, with confident and aggressive most correlated (ρ = 0.497) and trustworthy and aggressive least correlated (ρ = 0.102). Likewise, all attributes are positively correlated with the perceived likelihood of winning the case (e.g., advocates with voices perceived as more aggressive are also seen as more likely to win).
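The Bonferroni adjustment reported in Table 3 amounts to multiplying each raw pairwise p-value by the number of pairwise tests (capped at 1). A sketch, with illustrative attribute names and simulated data:

```python
from itertools import combinations
import numpy as np
import pandas as pd
from scipy import stats

def bonferroni_corr(df, cols):
    """Pairwise Pearson correlations with Bonferroni-adjusted p-values
    (each raw p-value multiplied by the number of pairwise tests)."""
    pairs = list(combinations(cols, 2))
    out = {}
    for a, b in pairs:
        r, p = stats.pearsonr(df[a], df[b])
        out[(a, b)] = (r, min(p * len(pairs), 1.0))
    return out

# Illustrative data: "confident" and "aggressive" strongly related,
# "trustworthy" independent of both.
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
demo = pd.DataFrame({"confident": x,
                     "aggressive": x + 0.5 * rng.standard_normal(300),
                     "trustworthy": rng.standard_normal(300)})
corrs = bonferroni_corr(demo, ["confident", "aggressive", "trustworthy"])
```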

Table 3. Correlations in Case Outcome and Trait Judgements of Male Lawyers (N = 33,666).

This table presents correlations in participant normalized ratings and case outcomes. Each observation is an argument by participant rating. Case Outcome is = 1 if advocate won the case, and = 0 if advocate lost. Bonferroni-adjusted p-values in parentheses.

In contrast, only masculinity is correlated with real outcomes (ρ = −0.02). To illustrate, we present a non-parametric plot of this correlation in Fig 2. In this figure, the normalized masculinity ratings are grouped into 20 equally sized bins with each point representing the share of cases won for observations in that bin. Notably, the slope between wins and masculinity is negative with a 5 percentage point difference in the likelihood of winning between advocates perceived as most and least masculine. We examine the robustness of this correlation in a regression framework that follows.

Fig 2. Advocate Masculinity and Court Outcomes.

Binned scatterplots illustrating the association between voice-based masculinity ratings and court outcomes. Binned scatterplots are a non-parametric method of plotting the conditional expectation function (which describes the average y-value for each x-value). Ratings are sorted into twenty quantiles with each point in the figure indicating the share of oral arguments won for a given ratings bin. The figure reflects the correlation between normalized ratings of masculinity and case outcomes of male advocates.
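The binned-scatterplot construction described above, twenty equal-count quantile bins with the mean rating and share of wins per bin, can be sketched as follows (illustrative data; not the study's code):

```python
import numpy as np
import pandas as pd

def binned_means(x, y, n_bins=20):
    """Sort x into n_bins equal-count quantile bins and return the mean
    x and mean y per bin -- the points drawn in a binned scatterplot."""
    df = pd.DataFrame({"x": x, "y": y})
    # Ranking first guarantees equal-sized bins even when x has ties.
    df["bin"] = pd.qcut(df["x"].rank(method="first"), n_bins, labels=False)
    return df.groupby("bin").agg(x_mean=("x", "mean"), y_mean=("y", "mean"))

# Illustrative: 100 ratings with a win share that declines in the rating.
bins = binned_means(np.arange(100.0), np.linspace(1.0, 0.0, 100))
```

Plotting `y_mean` against `x_mean` then traces out the conditional expectation function non-parametrically, as in Fig 2.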

Baseline Results

We begin by examining the relationship between voice-based perceptions of advocates and whether these perceptions can predict case outcomes. Focusing on our full sample of male advocates, the baseline results of estimating Eq (2) are presented in Table 4. As a starting point, we show OLS regression results using four different measures of attributes: normalized, untransformed, collapsed normalized and collapsed untransformed, where the collapsed measure is computed by collapsing the data to the mean attribute rating per audio clip. For each of these measures, we estimate two regression specifications, one excluding and one including lawyer fixed effects. The latter specification aims to approximate the relative correlation that stems from within-lawyer variation versus between-lawyer variation.
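A lawyer fixed-effects specification can be sketched with categorical dummies, as below. The simulated data and names are assumptions for illustration, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
# Simulated stand-in: 50 lawyers with 8 arguments each (illustrative).
n_lawyers, n_args = 50, 8
df = pd.DataFrame({
    "lawyer_id": np.repeat(np.arange(n_lawyers), n_args),
    "masculine": rng.standard_normal(n_lawyers * n_args),
})
df["won"] = (rng.random(len(df)) < 0.5 - 0.1 * df["masculine"]).astype(int)

# Lawyer dummies absorb fixed differences across lawyers, so the
# masculine coefficient is identified from variation across a given
# lawyer's own arguments (the within-lawyer correlation).
fe_fit = smf.ols("won ~ masculine + C(lawyer_id)", data=df).fit()
```

Comparing the coefficient with and without `C(lawyer_id)` separates within-lawyer from between-lawyer variation, the decomposition discussed for Table 4.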

Table 4. OLS Baseline Results: Male Advocates.

This table presents coefficient estimates from OLS regressions using data on Supreme Court oral arguments made by male advocates. The dependent variable is an indicator for whether the advocate won the case or not. Independent variables are voice-based ratings of advocate attributes made by survey participants, where untransformed ratings are integers ranging from 1 to 7 and normalized ratings are z-scored by participant. In columns 1-4, the unit of analysis is individual rating by oral argument, and in columns 5-8, the unit of analysis is oral argument average rating. Lawyer dummies are included where noted. Standard errors in parentheses are clustered by oral argument.

Starting with columns (1)-(2) of Table 4, we show results using the normalized ratings. Masculine is significantly correlated with outcomes in the regression controlling for lawyer fixed effects. No other attributes are correlated with outcomes. The estimate from column (2) suggests that a one-standard-deviation change in masculinity, for a given lawyer, is associated with a 0.9 percentage point change in case outcomes. Columns (3)-(4) repeat the same regressions using the untransformed scores, where each rating is an integer between 1 and 7. In the regression without lawyer fixed effects, both intelligent and masculine are correlated with case outcomes, but with the inclusion of lawyer fixed effects only masculine remains significant. Since the standard deviation of masculine using the untransformed scores is approximately 1.5, the correlation magnitudes are comparable to those obtained using the normalized scores. Running this set of regressions with the collapsed data yields little further insight. The only significant coefficient in columns (5)-(8) is the one on masculine in column (8), the specification using the collapsed untransformed measures with lawyer fixed effects.

Taken together, in half the regression specifications there is evidence for a correlation between masculine and outcomes. This partial pattern motivates further inquiry. As for the other attributes, we find no correlations except for intelligence in 1 of the 8 regressions. Likewise, participants are poor at predicting court outcomes based on the voice stimuli alone.

Petitioners versus Respondents

Under a hypothesis of the primacy of first impressions on court decisions, the first person to argue in front of the Justices should exhibit a stronger vocal first impression effect. That is, the first speaker may have a longer lasting impact on the court and subsequent outcomes, a hypothesis we derive from the anchoring effect [46], where individuals rely on an initial piece of information to make subsequent judgments. As the advocates for the petitioner always argue before the advocates for the respondent at the Supreme Court, we examine the robustness of the association between perceived masculinity and court outcomes separately for the petitioners and respondents. We report the results of our analysis in Table 5.

Table 5. OLS Results: Male Petitioners versus Respondents.

This table presents coefficient estimates from OLS regressions using data on Supreme Court oral arguments made by male advocates. Columns 1-5 (6-10) use data on oral arguments made by advocates for the petitioner (respondent). The dependent variable is an indicator for whether the advocate won the case or not. Independent variables are voice-based ratings of advocate attributes normalized by survey participant. Lawyer and participant dummies are included where noted. Participant controls are age and dummies for each category given in the biographical questionnaire. Standard errors in parentheses are clustered by oral argument.

Indeed, the key observation is that the correlation between perceived masculinity and outcomes persists for petitioners but not for respondents. The correlation is robust across multiple specifications using a combination of participant controls and lawyer and participant fixed effects. Focusing on the subsample of arguments made by advocates for petitioners, in column (1) of Table 5, the baseline regression, the coefficient estimate suggests a 2 percentage point increase in case wins associated with a one-standard-deviation decrease in masculine. In column (2), we show that this estimate is robust to the inclusion of participant fixed effects. This means that the correlation between perceived masculinity and real outcomes is not driven by any subset of survey participants. Put differently, this specification excludes cross-participant variation in ratings, such that the results are driven solely by variation in participant ratings of the random set of 66 audio clips. In column (3), we examine the correlation within lawyer by including lawyer fixed effects. The estimate on masculine is 0.007, suggesting that about 1/3 of the correlation between masculinity and court outcomes is driven by variation in oral arguments made by the same advocate, versus 2/3 driven by variation in arguments made by different advocates. To illustrate these last results, we provide a nonparametric plot of the residuals, obtained from regressing case outcomes on the set of fixed effects and attributes excluding masculine, against the masculine ratings. In Fig 3, we provide the residual plots reflecting columns (2) and (3). For example, the left-hand plot, which parallels the within-participant regression in column (2), shows a difference of approximately 8 percentage points in winning between oral arguments made by advocates perceived as least and most masculine.

Fig 3. Petitioner Masculinity and Court Outcomes.

Binned scatterplots illustrating the association between voice-based masculinity ratings and court outcomes. Binned scatterplots are a non-parametric method of plotting the conditional expectation function (which describes the average y-value for each x-value). The figures are residual plots of the regressions presented in columns 2 (left) and 3 (right) of Table 5, excluding the masculine independent variable. The left-hand (right-hand) figure plots residuals net of survey participant (lawyer) dummies. Ratings are sorted into twenty quantiles, with each point in the figure indicating the mean residual for a given ratings bin.
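The residual-plot construction, regressing outcomes on all controls except masculine and then relating the residuals to the masculine ratings, can be sketched as follows. The data are simulated and the column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 600
# Illustrative data: outcomes depend negatively on the masculine rating.
df = pd.DataFrame({
    "masculine": rng.standard_normal(n),
    "confident": rng.standard_normal(n),
    "participant": rng.integers(0, 30, n),
})
df["won"] = (rng.random(n) < 0.5 - 0.08 * df["masculine"]).astype(int)

# Residualize the outcome on everything except masculine (here one
# other attribute plus participant dummies), then relate the residuals
# to the masculine ratings -- the construction behind Fig 3.
resid = smf.ols("won ~ confident + C(participant)", data=df).fit().resid
slope = np.polyfit(df["masculine"], resid, 1)[0]
```

The residuals would then be sorted into twenty quantile bins of the masculine rating and the bin means plotted, exactly as in the binned scatterplots described above.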

To control for the possibility that participants with certain characteristics are driving the results, we further expand our analysis by including participant characteristics. Specifically, we include controls for participant age, and dummies for each racial group, gender, income cohort, education level and state of residence (see Table 1). Column (4) in Table 5 presents regression results that include this set of participant controls in addition to lawyer fixed effects; column (5) substitutes these participant controls with participant fixed effects. Point estimates on masculine remain similar and significant in these specifications. No other coefficient estimates of attributes are significant in this set of petitioner regressions.

Turning to the respondent regressions, we do not find any of the attributes to be correlated with case outcomes. To the extent that we do find significant results, these are limited to the two regression specifications that leverage between-advocate variation (columns (1)-(2)), where perceptions of winning are negatively correlated with actually winning. We do not focus on these results, given that this correlation (a) is not specific to an attribute, (b) does not persist across regression specifications and (c) does not have support in the baseline regressions presented in Table 4. We also find no further support in these regressions for intelligence as a possible correlate of outcomes.

In sum, we find robust evidence for a correlation between case outcomes and voice-based perceptions of advocate masculinity for petitioners. No association between perceived masculinity and court outcomes is found among lawyers for the respondents. This finding supports the hypothesis that the first impression, in this case of the first lawyer to argue before the Justices, exhibits a disproportionate association with judicial decisions.

Robustness and Extensions

In this section, we expand our analysis in a number of directions, including robustness to sample, ratings, and model variations.

Given our finding that the negative correlation between perceptions of masculinity and court outcomes persists even after removing cross-advocate variation, we examine more closely whether our results are driven by cases argued in a certain year or by advocates with a certain degree of experience in arguing cases at the Supreme Court. To do this, we compare our baseline regression results for petitioners (column (1) in Table 5) to the regression results in Table 6. By including year fixed effects, column (1) in Table 6 addresses whether our findings are driven by a certain set of cases in our sample of oral arguments. Similarly, column (2) includes fixed effects for the number of oral arguments in our sample made by the same lawyer, which we take as a proxy for experience. In both specifications, the estimate on masculine remains significant but is slightly smaller in magnitude (1.7 versus 2 percentage points in the baseline regression). Given this, we can rule out that cohort or time effects significantly influence our findings.

Table 6. Robustness Checks: Male Petitioners.

This table presents coefficient estimates from regressions using data on Supreme Court oral arguments made by male advocates for the petitioner. The dependent variable is an indicator for whether the advocate won the case or not. Independent variables are voice-based ratings of advocate attributes normalized by survey participant. Columns 1-2 report coefficient estimates using OLS with dummies for year of argument and number of cases argued by the lawyer where noted. Columns 3-4 report coefficient estimates using OLS where ratings whose Mahalanobis distance exceeds the critical value are omitted in column 3, and ratings by survey participants with scores in the top quintile on a measure of rating inconsistency are omitted in column 4 (see S1 Table). Columns 5-6 report baseline probit (logistic) regression results with marginal effects calculated at the means of the independent variables. Standard errors in parentheses are clustered by oral argument.

We next examine how our results change if we remove ratings that can be deemed outliers. The first method to identify such outliers is to compute the Mahalanobis distance (MD) for the ratings given by each participant for each audio clip. We then run the baseline regression excluding ratings that exceed the critical value associated with a 2.5 percent significance level, about 15 percent of our ratings. Column (3) in Table 6 shows the regression results excluding these ratings. The estimate on masculine is significant and slightly larger: a one standard deviation increase in masculinity is associated with a 2.2 percentage point decrease in the probability of winning. A second method we use to identify outliers is based on examining ratings on the set of 6 repeat audio clips. For each participant, we computed a consistency score defined as the average absolute difference in attribute ratings on the set of identical audio clips. The mean (and median) consistency score across participants and attributes is approximately one (further details are available in S1 Table). In column (4), we present regression results excluding ratings by the fifth of participants with the worst consistency scores. As seen, the association between perceived masculinity and outcomes remains similar to that in the baseline regression. We take these results to indicate that our findings would likely be stronger if we were to carefully screen out ratings by participants who may have misunderstood the task or exerted insufficient effort on it.
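The Mahalanobis screen can be sketched as follows. This is an illustrative implementation on synthetic ratings (assuming six attribute ratings per clip and a chi-square cutoff at the 2.5 percent level), not the authors' code:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Synthetic ratings: each row is one participant's ratings of one clip
# on six attributes.
ratings = rng.standard_normal((1000, 6))

# Squared Mahalanobis distance of each rating vector from the sample mean.
mu = ratings.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(ratings, rowvar=False))
d = ratings - mu
md2 = np.einsum('ij,jk,ik->i', d, cov_inv, d)

# Under multivariate normality, md2 follows a chi-square distribution
# with 6 degrees of freedom; drop ratings beyond the 97.5th percentile
# (a 2.5 percent significance level).
cutoff = chi2.ppf(0.975, df=6)
kept = ratings[md2 <= cutoff]
```

With real ratings (which need not be multivariate normal), the share dropped can differ from 2.5 percent, consistent with the roughly 15 percent exclusion rate reported above.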

In the final set of regressions, we show that our baseline estimates are robust to the choice of estimation method. In columns (5) and (6) of Table 6, we report estimates of marginal effects derived from a probit and a logistic regression, respectively. In both cases, the estimate on masculine is nearly identical to the one we obtained using OLS.
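As an illustration of the marginal-effect-at-means calculation, the sketch below fits a logit by maximum likelihood on synthetic data and evaluates dP/dx at the sample means. The data and coefficient values are hypothetical; the paper's own estimates come from the Table 6 specifications:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Synthetic data with a negative relation between the rating and winning.
n = 800
masc = rng.standard_normal(n)
X = np.column_stack([np.ones(n), masc])
p_true = 1.0 / (1.0 + np.exp(-(0.1 - 0.5 * masc)))
win = (rng.random(n) < p_true).astype(float)

# Fit the logit by maximizing the log-likelihood
# sum_i [ y_i * z_i - log(1 + exp(z_i)) ], with z = X @ beta.
def neg_loglik(beta):
    z = X @ beta
    return -np.sum(win * z - np.logaddexp(0.0, z))

res = minimize(neg_loglik, np.zeros(2), method="BFGS")
b0, b1 = res.x

# Marginal effect at the means: dP/dx = p * (1 - p) * b1, with p
# evaluated at the mean of the regressors.
p_bar = 1.0 / (1.0 + np.exp(-(X.mean(axis=0) @ res.x)))
marginal_effect = p_bar * (1.0 - p_bar) * b1
```

A probit marginal effect is computed analogously, replacing the logistic density p(1 − p) with the normal density at the index.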

To examine whether the ratings we gathered are specific to our procedure, we varied the survey design on a subsample of 60 voice clips. Instead of the basic design, in which the listener is presented with one voice sample and rates it on all attributes, participants were randomly assigned to rate only one attribute for each recording. This rules out cross-attribute influence within a given voice clip and also controls for the possibility of within-voice modeling by participants. The key difference between this survey and our main survey depicted in Fig 1 is that only one attribute, selected at random for each voice recording, appeared in question 1. While there are slight differences in ratings across surveys, the results are very similar, further supporting the robustness of our key finding on the connection between voice-based trait judgments of advocates and Supreme Court outcomes. We illustrate the high degree of correlation in perceptions across surveys (S1 Fig) in the Supporting Information (SI).

Likewise, for this same subsample of 60 voice clips, we were able to collect detailed biographical information on the advocates: age, law school, whether the advocate was a member of the law review, had an additional graduate degree, was a Supreme Court clerk, and the total number of clerkships the advocate held. We found that including these covariates in a regression increased the precision of the estimate on masculine (see S2 Table). Overall, we acknowledge that we are unable to draw far-reaching conclusions from these regressions given the small sample size; however, if perceptions of masculinity were simply reflecting other important advocate covariates, then the coefficient estimates on masculine should be driven to zero. That this is not the case suggests that the channel through which trait judgments stemming from an extremely brief voice clip predict outcomes may not be as simple as one might expect. Likewise, our results are unlikely to be driven by any specific choice of number of ratings or survey framing. In sum, these findings are unlikely to be driven by spurious correlations or measurement error, and they lend further credence to the notion that snap judgments stemming from even 3-second voice samples can influence listeners' beliefs about those they face and their subsequent actions.

Finally, it is worth noting that only about 15 percent of the advocates who argued in the Supreme Court during the time period of our study were female. The gender-specificity of our findings is a question that warrants further investigation, especially since studies on voice-based social biases have observed significant differences in how listeners react to voices of different perceived gender [47]. However, due to the lack of statistical power, we leave this question for future studies with an expanded female advocate dataset. Relatedly, we explored whether perceptions differ by gender of survey participant and whether such differences could affect how the perceived attributes of male advocates predict case outcomes. While we found some differences in ratings (most notably, female participants, more than male participants, perceive masculine advocates as more intelligent), we did not find these to play a role in our key finding on the relationship between voice-based perceptions of masculinity and outcomes in the Supreme Court.


Discussion

To the best of our knowledge, this is the first study documenting an association between voice-based impressionistic judgments and judicial decisions. To benchmark our findings, the 2 percentage point difference in court outcomes attributed to a one standard deviation change in perceived masculinity is equivalent to more than half of the gender gap (in our sample, male lawyers are 3.7 percentage points more likely to win a court case than female lawyers). These associations are comparable to effects of other external factors that have been shown to influence judicial behavior. For example, asylum judges are 2 percentage points more likely to deny asylum to refugees if their previous decision granted asylum [48]. Likewise, asylum judges are roughly 2 percentage points more likely to grant asylum on the day after a home-city Sunday football game win instead of a loss [49]. In a similar vein, U.S. District judges are 0.3 percentage points less likely to assign any prison time in criminal sentencing cases after a home-city football game win instead of a loss [49]. More generally, judges' demographic background characteristics, such as gender, race, and, in particular, party of appointing president [50, 51], especially before elections [52], have all been shown to correlate with their decision-making over a range of legal issues.

Our findings echo earlier research documenting associations between voice-based personality judgments and human behavior. For instance, previous studies have found vocal attractiveness to be an important social evaluation linked to mate selection and sexual behavior [35], and masculine voices to be linked to dominance [31] and men's threat potential in forager and industrial societies [53]. This type of association extends beyond evolutionary implications and may have immediate real-world consequences. Perceived intelligence, for example, has been found to affect an individual's employability [28]. Landlords have been found to discriminate against prospective tenants on the basis of the sound of their voice during telephone conversations [11]. Perceived task-ability, dominance, and sociability have been found to show the strongest correlation with perceived influence in simulated juries [12]. Thus, the association between voice-based personality judgments and court outcomes observed in this study further underscores the importance of understanding how (and why) voice-based judgments influence human behavior.

To be sure, what is still in need of further exploration is the specific nature of the association between voice judgments and court outcomes. That is, why are court outcomes correlated with perceived masculinity but not with other attributes? It is worth noting that the focus on language and gender in the courtroom is not new. However, previous studies have focused primarily on the gendered language performance of witnesses [54] or on discursive practices in the courtroom [55]. To the best of our knowledge, no studies have focused on the vocal characteristics of the lawyers themselves. More specifically, given that the attributes are positively correlated with each other, the fact that only perceived masculinity is found to correlate with court outcomes suggests that masculinity captures particular variance that is not captured by the other ratings. In a similar study, in which subjects were presented with faces of electoral candidates and asked to rate the candidates' perceived attributes, such as competence, intelligence, leadership, honesty, trustworthiness, charisma, and likability [33], only perceptions of competence predicted election outcomes. Our findings are similar in that, while perceived masculinity correlated with judgments of other voice attributes, it is the only attribute that predicts court outcomes in a consistent and robust manner.

Concerning the nature of the perceived attribute itself, masculinity is a quality or set of practices that is stereotypically, though not exclusively, associated with men. Women may engage in masculine practices just as much, although such practices either go unnoticed or are censured [56]. The performative nature of "masculinity" makes possible the existence of non-masculine men and masculine women [56–59]. Different cultures may also construct different notions of masculinity. These differences are reflected in the stereotypical ways of talking and thinking about men and masculinities. In the US, there are four main cultural discourses of masculinity [56]: gender difference, which pertains to categorical differences in biology and behavior between men and women; heterosexism, which equates being masculine with sexually desiring women and not men; dominance, which links masculinity with notions of authority or power; and male solidarity, which assumes as given a bond among men.

In the present context, the fact that court outcomes are negatively associated with masculinity points to a possible connection with the discourse of dominance. That is, lawyers who are perceived to be more masculine might be construed as more dominant and authoritative. To what extent these constructs, as distinct from perceived confidence and perceived aggressiveness, play a role in the decision process as judges deliberate will have to be explored in future work. This work only establishes an association and does not advocate a particular causal relationship between these variables. To be sure, gendered differentiation of masculine and feminine language has been argued to have distinct evolutionary bases [60]. Males are seen as having been selected to be aggressive and dominant, but this selective pressure might be a double-edged sword, since aggressive and dominant behaviors can lead to lethal confrontation. In the present context, the dominant and aggressive stance of masculine-sounding lawyers might have invited an adverse response from the Court.

Given our research design, our findings do not allow us to conclude whether the Justices were engaging in some form of linguistic profiling in making their judicial decisions. Do lawyers change their voices across oral arguments in a manner predicted by case characteristics? Do law firms engage in some form of linguistic profiling in choosing their oral advocates? Further investigation should yield fruitful insights into the mechanisms underlying the associations between voice-based masculinity and court outcomes.

In sum, our results contribute to a growing literature on the relevance of extraneous factors in courtrooms. That is, although judicial behavior is widely assumed to be governed by legal doctrine [61], with judges strictly hewing to doctrine and court precedent in making their decisions, a judge's decision can be affected by the judge's policy preferences [62], self-interest [63], and, in the present case, potentially by voice-based snap judgments regarding lawyer personality. Future studies will hopefully elucidate the mechanisms behind these extraneous factors in the courtroom.

Supporting Information

S1 Fig. Correlation in Ratings across Survey Designs (collapsed).

This figure plots the mean untransformed rating for each of the 60 audio clips selected from our sample for further robustness checks. The x-axis reflects mean ratings obtained from participants in our main survey who were asked to rate each advocate on the full set of attributes, whereas the y-axis reflects the mean ratings obtained from participants in an alternative survey who were randomly assigned to rate each advocate on only one attribute at a time.
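Stripped to its essentials, the comparison behind this figure is a clip-level correlation of mean ratings across the two survey designs. The sketch below uses hypothetical numbers for the 60 clips, not the actual survey data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical per-clip mean ratings for the 60 clips: one vector from
# the main (all-attributes) survey, one from the one-attribute-at-a-time
# survey, constructed so the second tracks the first with some noise.
main_means = rng.uniform(1.0, 7.0, size=60)
alt_means = main_means + rng.normal(0.0, 0.4, size=60)

# Pearson correlation between the two sets of clip-level means.
r = np.corrcoef(main_means, alt_means)[0, 1]
```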


S1 Table. Participant Ratings Consistency (N = 748).

This table presents descriptive statistics for a measure of consistency in participant ratings, using data on the random set of 6 audio clips that were duplicated for each participant. For each participant, the consistency measure for a given attribute is defined as the average absolute difference in ratings of that attribute between the duplicate clips, c = (1/6) Σ_{k=1}^{6} |r_k^(1) − r_k^(2)|, where r_k^(1) and r_k^(2) denote the participant's first and second rating of duplicated clip k.
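As a small numerical illustration of this measure, the sketch below computes the score for one hypothetical participant and one attribute, assuming a 1-7 rating scale (the scale is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical ratings: the six duplicated clips, rated twice on an
# assumed 1-7 scale by the same participant.
first = rng.integers(1, 8, size=6).astype(float)
second = np.clip(first + rng.integers(-2, 3, size=6), 1, 7).astype(float)

# Consistency score: mean absolute difference across the six duplicate
# pairs (0 = perfectly consistent ratings; larger = less consistent).
score = np.mean(np.abs(first - second))
```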


S2 Table. Robustness Checks on Sample of 60 Clips.

This table presents coefficient estimates from OLS regressions using data on a select sample of Supreme Court oral arguments made by male advocates. The dependent variable is an indicator for whether the advocate won the case. Independent variables are voice-based ratings of advocate attributes normalized by survey participant. Column 1 reports baseline regression results; column 2 reports results from a specification that includes lawyer biographical controls: age, number of clerkships, and dummies for whether the advocate attended an elite law school, has a second graduate degree, served on law review, or served as a Supreme Court clerk. Columns 3-4 compare regression results using alternative survey designs to the baseline results presented in column 1. Column 3 presents results from a survey of approximately 200 participants rating the set of 60 audio clips, and column 4 presents results using ratings obtained from a survey that randomly assigned only one attribute to each audio clip. a: ratings of educatedness were included instead of aggressiveness in columns 3-4; b: ratings of age were included instead of intelligence in column 4; †, *, and ** indicate significance at the 10 percent, 5 percent, and 1 percent levels, respectively.



Acknowledgments

We thank Michael Boutros, Katie Franich, Dennis Luo, Betsy Pillion, and Jacob Phillips for invaluable research assistance. We thank participants at the 14th Conference on Laboratory Phonology, the University of Toronto, the annual meeting of the Linguistic Society of America, and the Canadian Economics Association. The authors are listed in alphabetical order.

Author Contributions

  1. Conceptualization: DC YH AY.
  2. Data curation: DC YH AY.
  3. Formal analysis: DC YH AY.
  4. Funding acquisition: DC YH AY.
  5. Investigation: DC YH AY.
  6. Methodology: DC YH AY.
  7. Project administration: DC YH AY.
  8. Resources: DC YH AY.
  9. Supervision: DC YH AY.
  10. Visualization: DC YH AY.
  11. Writing – original draft: DC YH AY.
  12. Writing – review & editing: DC YH AY.


References

  1. Bestelmeyer PEG, Rouger J, DeBruine LM, Belin P. Auditory adaptation in vocal affect perception. Cognition. 2010;117(2):217–223. pmid:20804977
  2. Latinus M, Belin P. Perceptual Auditory Aftereffects on Voice Identity Using Brief Vowel Stimuli. PLoS ONE. 2012;7(7):e41384. pmid:22844469
  3. Perrachione TK, Del Tufo SN, Gabrieli JDE. Human Voice Recognition Depends on Language Ability. Science. 2011;333:595–596. pmid:21798942
  4. Scharinger M, Monahan PJ, Idsardi WJ. You had me at Hello: Rapid extraction of dialect information from spoken words. NeuroImage. 2011;56:2329–2338. pmid:21511041
  5. Ambady N, Rosenthal R. Thin Slices of Expressive Behavior as Predictors of Interpersonal Consequences: A Meta-Analysis. Psychological Bulletin. 1992;111:256–274.
  6. Antonakis J, Dalgas O. Predicting Elections: Child's Play! Science. 2009;323(5918):1183. pmid:19251621
  7. Mayew WJ, Venkatachalam M. The Power of Voice: Managerial Affective States and Future Firm Performance. Journal of Finance. 2012;67(1):1–43.
  8. Nass C, Lee KM. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied. 2001;7:171–181. pmid:11676096
  9. Klofstad CA, Anderson RC, Peters S. Sounds like a winner: voice pitch influences perception of leadership capacity in both men and women. Proceedings of the Royal Society of London B: Biological Sciences. 2012;279(1738):2698–2704. pmid:22418254
  10. Tigue CC, Borak DJ, O'Connor JJM, Schandl C, Feinberg DR. Voice pitch influences voting behavior. Evolution and Human Behavior. 2012;33:210–216.
  11. Purnell T, Idsardi W, Baugh J. Perceptual and Phonetic Experiments on American English Dialect Identification. Journal of Language and Social Psychology. 1999;18(1):10–30.
  12. Scherer KR. Voice and speech correlates of perceived social influence in simulated juries. In: Scherer KR, Giles H, Clair RS, editors. The social psychology of language. London: Blackwell; 1979. p. 88–120.
  13. Juslin PN, Scherer KR. Vocal expression of affect. In: Harrigan J, Rosenthal R, Scherer K, editors. The New Handbook of Methods in Nonverbal Behavior Research. Oxford, UK: Oxford University Press; 2005. p. 65–135.
  14. Schubert JN, Peterson SA, Schubert G, Wasby SL. Dialect, Sex and Risk Effects on Judges' Questioning of Counsel in Supreme Court Oral Argument. In: Salter FK, editor. Risky Transactions: Trust, Kinship, and Ethnicity. Berghahn Books; 2002. p. 304.
  15. Danescu-Niculescu-Mizil C, Lee L, Pang B, Kleinberg J. Echoes of Power: Language Effects and Power Differences in Social Interaction. In: Proceedings of the 21st International Conference on World Wide Web. WWW'12. New York, NY, USA: ACM; 2012. p. 699–708.
  16. Babel M, McGuire G, Cruz S. Perceived vocal attractiveness across dialects is similar but not uniform. In: Proceedings of Interspeech; 2013. p. 426–430.
  17. Hodges-Simeon CR, Gaulin SJC, Puts DA. Different Vocal Parameters Predict Perceptions of Dominance and Attractiveness. Human Nature. 2010;21:406–427. pmid:21212816
  18. McAleer P, Todorov A, Belin P. How do you say hello? Personality impressions from brief novel voices. PLoS ONE. 2014;9(3):e90779. pmid:24622283
  19. Pierrehumbert JB, Bent T, Munson B, Bradlow AR, Bailey JM. The influence of sexual orientation on vowel production (L). The Journal of the Acoustical Society of America. 2004;116(4):1905–1908. pmid:15532622
  20. Smyth R, Jacobs G, Rogers H. Male voices and perceived sexual orientation: An experimental and theoretical approach. Language in Society. 2003;32(3):329–350.
  21. Linville SE. Acoustic correlates of perceived versus actual sexual orientation in men's speech. Folia Phoniatrica et Logopaedica. 1998;50(1):35–48. pmid:9509737
  22. Levon E. Sexuality in context: Variation and the sociolinguistic perception of identity. Language in Society. 2007;36(4):533–554.
  23. Bucholtz M. The whiteness of nerds: Superstandard English and racial markedness. Journal of Linguistic Anthropology. 2001;11(1):84–100.
  24. Benor S. Talmid chachams and tsedeykeses: Language, learnedness, and masculinity among Orthodox Jews. Jewish Social Studies. 2005;11(1):147–170.
  25. Podesva RJ, Roberts SJ, Campbell-Kibler K. Sharing resources and indexing meanings in the production of gay styles. In: Language and Sexuality: Contesting Meaning in Theory and Practice. 2002. p. 175–189.
  26. Podesva RJ, Reynolds J, Callier P, Baptiste J. Constraints on the social meaning of released /t/: A production and perception study of US politicians. Language Variation and Change. 2015;27(1):59–87.
  27. Clopper CG, Pisoni DB. Homebodies and army brats: Some effects of early linguistic experience and residential history on dialect categorization. Language Variation and Change. 2004;16(1):31–48. pmid:21533011
  28. Schroeder J, Epley N. The Sound of Intellect: Speech Reveals a Thoughtful Mind, Increasing a Job Candidate's Appeal. Psychological Science. 2015;26(6):877–891. pmid:25926479
  29. Eckert P. Variation and the indexical field. Journal of Sociolinguistics. 2008;12(4):453–476.
  30. Campbell-Kibler K. Sociolinguistics and perception. Language and Linguistics Compass. 2010;4(6):377–389.
  31. Puts DA, Gaulin SJC, Verdolini K. Dominance and the evolution of sexual dimorphism in human voice pitch. Evolution and Human Behavior. 2006;27:283–296.
  32. Little AC, Burriss RP, Jones BC, Roberts SC. Facial appearance affects voting decisions. Evolution and Human Behavior. 2007;28:18–27.
  33. Todorov A, Mandisodza AN, Goren A, Hall CC. Inferences of competence from faces predict election outcomes. Science. 2005;308(5728):1623–1626. pmid:15947187
  34. Collins SA, Missing C. Vocal and visual attractiveness are related in women. Animal Behaviour. 2003;65:997–1004.
  35. Saxton TK, Caryl PG, Roberts SC. Vocal and facial attractiveness judgments of children, adolescents and adults: the ontogeny of mate choice. Ethology. 2006;112:1179–1185.
  36. Riding D, Lonsdale D, Brown B. The effects of average fundamental frequency and variance of fundamental frequency on male vocal attractiveness to women. Journal of Nonverbal Behavior. 2006;30:55–61.
  37. Chaiken S. Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology. 1979;37:1387–1397.
  38. Dion K, Berscheid E, Walster E. What is beautiful is good. Journal of Personality and Social Psychology. 1972;24:285–290. pmid:4655540
  39. Willis J, Todorov A. First Impressions: Making Up Your Mind after a 100-Ms Exposure to a Face. Psychological Science. 2006;17(7):592–598. pmid:16866745
  40. Ballew CC, Todorov A. Predicting political elections from rapid and unreflective face judgments. Proceedings of the National Academy of Sciences. 2007;104(46):17948–17953.
  41. Ross W, LaCroix J. Multiple meanings of trust in negotiation theory and research: A literature review and integrative model. International Journal of Conflict Management. 1996;7:314–360.
  42. Lewis JD, Weigert A. Trust as a Social Reality. Social Forces. 1985;63(4):967–985.
  43. Scherer KR, London H, Wolf JJ. The voice of confidence: Paralinguistic cues and audience evaluation. Journal of Research in Personality. 1973;7(1):31–44.
  44. Bertrand M, Duflo E, Mullainathan S. How Much Should We Trust Differences-in-Differences Estimates? Quarterly Journal of Economics. 2004;119(1):249–275.
  45. Angrist JD, Pischke JS. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press; 2008.
  46. Tversky A, Kahneman D. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty. 1992;5(4):297–323.
  47. Babel M, McGuire G, King J. Towards a more nuanced view of vocal attractiveness. PLoS ONE. 2014;9(2):e88616.
  48. Chen DL, Moskowitz TJ, Shue K. Decision-Making Under the Gambler's Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires; 2016.
  49. Chen DL, Spamann H. This Morning's Breakfast, Last Night's Game: Detecting Extraneous Factors in Judging. ETH Zurich; 2014.
  50. Peresie JL. Female Judges Matter: Gender and Collegial Decisionmaking in the Federal Appellate Courts. The Yale Law Journal. 2005;114(7):1759–1790.
  51. Sunstein CR, Schkade D, Ellman LM, Sawicki A. Are Judges Political? An Empirical Analysis of the Federal Judiciary. Brookings Institution Press; 2006.
  52. Berdejó C, Chen DL. Priming Ideology: Electoral Cycles Among Unelected Judges. University of Chicago, Mimeo; 2010.
  53. Puts DA, Apicella CL, Cárdenas RA. Masculine voices signal men's threat potential in forager and industrial societies. Proceedings of the Royal Society of London B: Biological Sciences. 2011.
  54. O'Barr W, Atkins B. "Women's Language" or "powerless language"? In: McConnell-Ginet S, Borker R, Furman N, editors. Women and Language in Literature and Society. New York: Praeger; 1980. p. 93–110.
  55. O'Barr WM. Linguistic Evidence: Language, Power, and Strategy in the Courtroom. Studies on Law and Social Control. New York: Academic Press; 1982.
  56. Kiesling S. Men, Masculinities, and Language. Language and Linguistics Compass. 2007;1(7):653–673.
  57. Eckert P, McConnell-Ginet S. Language and Gender. Cambridge, UK: Cambridge University Press; 2003.
  58. Butler J. Gender Trouble: Feminism and the Subversion of Identity. New York, NY: Routledge; 1990.
  59. Kessler S, McKenna W. Gender: An Ethnomethodological Approach. John Wiley and Sons; 1978.
  60. Locke JL. Duels and Duets: Why Men and Women Talk So Differently. New York: Cambridge University Press; 2011.
  61. Kornhauser LA. Judicial Organization and Administration. In: Sanchirico CW, editor. Encyclopedia of Law and Economics. vol. 5; 1999. p. 27–44.
  62. Cameron CM. New Avenues for Modeling Judicial Politics. In: Conference on the Political Economy of Public Law. Rochester, NY: W. Allen Wallis Institute of Political Economy, University of Rochester; 1993.
  63. Posner RA. An Economic Approach to Legal Procedure and Judicial Administration. Journal of Legal Studies. 1973;2(2):399–458.