Differences in words used to describe racial and gender groups in Medical Student Performance Evaluations

Purpose: The transition from medical school to residency is a critical step in the careers of physicians. Because of the standardized application process, in which schools submit summative Medical Student Performance Evaluations (MSPEs), it also represents a unique opportunity to assess the possible prevalence of racial and gender disparities, as shown elsewhere in medicine.
Method: The authors conducted textual analysis of MSPEs from 6,000 US students applying to 16 residency programs at a single institution in 2014–15. They used custom software to extract demographic data and keyword frequency from each MSPE. The main outcome measure was the proportion of applicants described using 24 pre-determined words from four thematic categories (“standout traits”, “ability”, “grindstone habits”, and “compassion”).
Results: The data showed significant differences based on race and gender. White applicants were more likely to be described using “standout” or “ability” keywords (including “exceptional”, “best”, and “outstanding”) while Black applicants were more likely to be described as “competent”. These differences remained significant after controlling for United States Medical Licensing Examination Step 1 scores. Female applicants were more frequently described as “caring”, “compassionate”, and “empathic” or “empathetic”. Women were also more frequently described as “bright” and “organized”.
Conclusions: While the MSPE is intended to reflect an objective, summative assessment of students’ qualifications, these data demonstrate for the first time systematic differences in how candidates are described based on racial/ethnic and gender group membership. Recognizing possible implicit biases and their potential impact is important for faculty who strive to create a more egalitarian medical community.


Introduction
Ethnic/racial and gender disparities are prevalent in healthcare, including medical education. Historically under-represented minority (hURM) students have been reported to receive lower clerkship grades than White counterparts in medical school [1] and are less likely to be inducted into Alpha Omega Alpha (the premier medical school honor society) [2]. Women and hURMs in academic medicine report a greater experience of discrimination [3][4][5]. Women and individuals from hURM backgrounds also appear to be promoted less frequently at medical schools to the ranks of Associate and Full Professor [6,7]. Numerous studies have shown that women are described differently than male counterparts in letters of recommendation [8][9][10][11][12].
While there may be many factors contributing to such differences, some variation in how groups advance and in their subjective experience is thought to reflect discrimination by evaluators [13], including unconscious gender-linked [14] and ethnic/racial stereotypes [15]. One way to explore how biases might affect medicine is to look at the transition from medical school into residency training because of the highly standardized process through which all medical school graduates must apply.
The core document in each application is the Medical Student Performance Evaluation (MSPE; colloquially referred to as the "Dean's Letter"). The relationship between ethnicity/race and gender and the way in which applicants are described in this document has been only minimally explored. The largest study included 297 applications (227 men and 70 women) and looked only at differences by gender [16]. To the best of our knowledge, no studies have examined differences based on race/ethnicity in the MSPE.
Here we report data regarding the use of language in the MSPEs from 6,000 applicants to residency programs at our host institution in the 2014-2015 application cycle. We hypothesized that female and hURM applicants would be described differently from male and White applicants, respectively, according to well-established social stereotypes.

Study design and population
This was a retrospective cohort study of 6,000 MSPEs from graduates of 134 US allopathic medical schools submitted to 16 residency programs at the Yale School of Medicine during the 2014-2015 application cycle. We did not include applications from international or Caribbean medical schools because the hypotheses we sought to test regarding possible implicit biases are socially mediated and may vary considerably based on local culture. We did not include applications from osteopathic medical schools due to differences in the structure of their curriculum, including that students are not required to take the United States Medical Licensing Examinations (USMLE; see discussion in Methods, below). The Yale University Institutional Review Board deemed this study exempt from review (exemption category 4; HIC# 1408014491).

Study protocol
A literature review was conducted to identify words and themes related to potential implicit biases within letters of recommendation for students or faculty in academic medicine. After extensive review, 24 words from four thematic categories (as previously described in the literature) were selected based on prior results and pertinence to this study [9,16,17]. The four categories included adjectives describing standout traits, ability, grindstone habits (i.e. work ethic), and compassion. We used MATLAB (MathWorks) to extract from the MyERAS section of the application each individual's self-described race and ethnicity, gender, age, and medical school. We then used MATLAB to extract from each MSPE the number of occurrences of the individual words within our defined thematic categories. Some adjectives are used by particular schools as an indicator of class rank; in these cases, the MSPE often contains common text explaining the school's system. For example, one university writes on the first page of every MSPE: "We provide a grouping of 'outstanding,' 'excellent,' 'very good,' or 'good,' which is determined by the student's performance..." Because of this common text, each of these adjectives will appear at least once in the MSPE of every student at that school, in a manner that does not relate to that student's personal attributes or performance. To account for this effect, for each school the software found the minimum number of times that each word of interest was used across all applicants from that school and then subtracted those values from each applicant's word count. For example, if all applicants from a school had the word "excellent" appear at least 5 times, the software subtracted 5 from each applicant's count of this word.
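The per-school baseline correction described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' MATLAB software: the function names, the record keys ("school", "text"), the simple lowercase tokenizer, and the four-word keyword subset are all assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative subset only; the study used 24 pre-determined words.
KEYWORDS = ("exceptional", "best", "outstanding", "competent")


def count_keywords(text, keywords=KEYWORDS):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {word: tokens[word] for word in keywords}


def adjusted_counts(records, keywords=KEYWORDS):
    """Subtract each school's per-word minimum from every applicant's count.

    `records` is a list of dicts with hypothetical keys "school" and "text".
    """
    raw = [count_keywords(rec["text"], keywords) for rec in records]

    # Per-school minimum count of each word across all of its applicants.
    baseline = {}
    for rec, counts in zip(records, raw):
        school_min = baseline.setdefault(rec["school"], dict(counts))
        for word in keywords:
            school_min[word] = min(school_min[word], counts[word])

    return [
        {word: counts[word] - baseline[rec["school"]][word] for word in keywords}
        for rec, counts in zip(records, raw)
    ]


# Toy input: both letters share boilerplate that mentions "outstanding".
records = [
    {"school": "A", "text": "Groupings are outstanding, excellent, or good. An outstanding student."},
    {"school": "A", "text": "Groupings are outstanding, excellent, or good. A competent student."},
]
adjusted = adjusted_counts(records)
```

In this toy input the shared boilerplate contributes one "outstanding" to each letter, so only the first applicant retains a count for that word after correction. One property of the rule worth noting is that a school represented by a single applicant has all of its counts reduced to zero.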
Some applicants applied to multiple programs (N = 358); their MSPE was included only once. Due to technical difficulties obtaining applications, data concerning race and ethnicity were not available for applicants applying to pathology, otolaryngology, urology, and plastic surgery. Consequently, MSPEs from these fields were not included in the analysis on race and ethnicity.

Statistical analyses
Outcome data from the MSPE analyses were extracted using in-house software and then analyzed using Stata version 14 (StataCorp, 2015).
Demographic characteristics of applicants were summarized. Chi-squared tests were then performed to compare, for each specified keyword, the percentage of applicants described at least once using that word across gender and race/ethnicity groups. A Bonferroni correction was applied to account for multiple comparisons (corrected alpha = 0.002).
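As a worked illustration of the test and the correction, the following Python sketch computes a Pearson chi-squared statistic for a two-group comparison and applies a Bonferroni threshold. It is not the authors' Stata code. The family-wise level of 0.05 and the divisor of 24 are assumptions (the text reports only the corrected alpha of 0.002, which is consistent with 0.05 divided across the 24 keywords), the counts are invented, and the race/ethnicity comparisons would need a larger table than the 2x2 case shown here.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Rows are groups; columns are "described with the word" / "not described".
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must have a nonzero total")
    stat = n * (a * d - b * c) ** 2 / margins
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


N_KEYWORDS = 24              # number of words tested in the study
ALPHA = 0.05 / N_KEYWORDS    # assumed Bonferroni threshold, about 0.002

# Hypothetical counts: 30 of 100 in one group vs 10 of 100 in the other.
stat, p = chi2_2x2(30, 70, 10, 90)
significant = p < ALPHA
```

With these invented counts the statistic is 12.5, which clears the corrected threshold. A difference that is significant at 0.05 but not at roughly 0.002 would be discarded under this rule.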
A multivariable logistic regression was then performed to model the effect of race/ethnicity on the likelihood of an applicant being described by a particular word in the MSPE after controlling for USMLE Step 1 scores (in order to account for potential group differences that may correlate with test scores).
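Written out, a model of this kind might take the following form. This is a sketch of one plausible specification rather than the authors' exact Stata model: the indicator coding of race/ethnicity, the choice of a reference group, and all symbol names are assumptions introduced here.

```latex
\[
\operatorname{logit}\,\Pr(Y_{iw} = 1)
  = \beta_{0w}
  + \sum_{g \neq \mathrm{ref}} \beta_{gw}\,\mathbf{1}[\mathrm{group}_i = g]
  + \gamma_w\,\mathrm{Step1}_i
\]
```

Here Y_iw equals 1 if applicant i is described at least once with word w, the sum runs over race/ethnicity groups other than the reference group, and Step1_i is the applicant's USMLE Step 1 score. Under this specification, exp(β_gw) is the odds ratio for group g relative to the reference group among applicants with the same Step 1 score.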

Results
The sample of applicants reviewed in the present study was comparable to national data on medical school seniors with respect to applicant gender, race, ethnicity, and age (Table 1) [18,19]. Applicants analyzed in the study were more likely to be from medical schools ranked in the top 40 by NIH funding and more likely to be inducted into the Alpha Omega Alpha (AOA) honor medical society.
The median age of applicants was 26, and 45% of applicants were women. The dataset was 7% Black, 4% Hispanic, 24% Asian, and 55% White. Thirty-six percent of applicants attended a school ranked in the top 40 by NIH funding, and the median USMLE Step 1 score was 237. Demographic characteristics of the data set are illustrated in Table 2. Specialty breakdown of applications is shown in Table 3.
Table 4 shows the percentage of applications in which each descriptive word was used at least once in the MSPE, by race and ethnicity. A significant difference was found in the use of the standout words "exceptional", "best", and "outstanding", with White applicants being more likely than Black, Hispanic, and Asian applicants to be described with these adjectives. Concerning the thematic category of ability, White applicants were also significantly more likely to be described as "bright" than Black, Hispanic, and Asian applicants. Multivariable regression showed that while USMLE Step 1 scores had a small but significant correlation with standout keywords, race/ethnicity group differences in the use of these adjectives remained significant after controlling for these scores (see Table 5; all χ²(5, N = 5,014) > 15, p < .01).
The adjective "competent" was used more frequently to describe Black applicants than any other racial or ethnic group. Use of the word "competent" did not correlate with USMLE Step 1 scores. Based on this finding, a specific contextual analysis for the adjective "competent" was performed. Excerpts of the two sentences before and after each occurrence of the word "competent" were extracted from individual MSPEs and analyzed by three physicians. These physicians were chosen based on their experience in compiling and reading MSPEs and were blinded to the race and ethnicity of the applicants in the MSPEs being reviewed. A total of 50 excerpts were reviewed with an oversampling of Black applicants (40% Black, 20% Hispanic, 20% White, 20% Asian). Each reviewer judged whether the use of "competent" had a positive, neutral, or negative connotation. In this contextual analysis, the adjective "competent" had a positive connotation 37% of the time when describing Black applicants, compared to 33% for Hispanic, 57% for White, and 60% for Asian applicants; this difference fell just short of conventional statistical significance (p = 0.052).
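The excerpt extraction step can be illustrated with a short Python sketch. It is an assumption-laden stand-in for the authors' software: the function name is hypothetical, sentences are split naively on terminal punctuation (so abbreviations such as "Dr." would be split incorrectly), and one excerpt is produced per sentence containing the word rather than per occurrence.

```python
import re


def context_excerpts(text, word="competent", window=2):
    """Return each sentence containing `word` with `window` sentences either side."""
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Whole-word match, so "incompetent" is not counted as "competent".
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)

    excerpts = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            excerpts.append(" ".join(sentences[start:end]))
    return excerpts


sample = "A. B. C. She is competent. D. E. F."
excerpts = context_excerpts(sample)
```

For the toy input, the single excerpt spans the two sentences on either side of the match. In a blinded review such as the one described, any identifying details inside the excerpt would still need to be handled separately, which this sketch does not attempt.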
There was a difference at the threshold of significance for use of the word "organized", with White applicants being more likely to be described as "organized". No statistically significant difference was found in the use of other grindstone words or for words in the category of compassion by ethnicity/race.
Table 6 shows the percentage of applications in which each word from the four thematic categories was used at least once in the MSPE, by gender. Women were more likely than men to be described as "caring", "compassionate", and "empathetic" or having "empathy". Additionally, women were more likely to be described as "bright" and "organized". No statistically significant difference by gender was found in the use of standout or grindstone adjectives.

Discussion
This study addresses a gap in knowledge concerning how descriptive language used by letter writers in the MSPE differs by gender and race/ethnicity of applicants to residency programs. Previous studies have assessed the impact of gender on letters of recommendation to residency programs; however, most of these studies were limited in sample size. Moreover, none of the previous studies addressed the relation between race/ethnicity and how applicants are described in the MSPE.
Although the Association of American Medical Colleges (AAMC) has established guidelines to standardize the MSPE, significant heterogeneity in the document persists across schools. Some MSPEs include long and detailed quotes from each clinical rotation while others are terse. Some MSPEs include extended summative narratives that are written by the Dean (or a representative) while others employ keywords that indicate a summative ranking.
Our data demonstrate that descriptive language used in the MSPE varies by group. In our sample, White applicants were more likely to be described with standout words such as "outstanding", "exceptional", and "best" when compared to Black, Asian, and Hispanic applicants. Moreover, women were more likely than men to be described with words related to compassion such as "kind", "caring", and "empathic". Interestingly, the ability adjective "competent" was the only descriptor used more frequently for Black applicants than for any other racial or ethnic group. Our additional contextual analysis of the use of "competent" showed that the adjective was less likely to have a positive connotation when describing Black and Hispanic applicants. This suggests that the term may be used as a word of minimal assurance when describing these applicants.
A critical question is the extent to which implicit bias by the authors of the MSPE may have contributed to the differences found in our study. Differences in the use of standout and ability words among applicants by race/ethnicity remained significant even after controlling for USMLE Step 1 scores. In an ideal world, one might hope to identify other standardized measures of performance in medical school that could be used as independent variables to study the possible role of implicit biases in the MSPE. The most obvious of these would be clerkship performance. However, for several reasons, this is not currently feasible (including different grade distributions and variable criteria for how grades are determined among medical schools). Analysis of such data would enable exploration of whether there may be group differences in clerkship performance based on race/ethnicity and gender, whether implicit beliefs of supervisors and peers contribute to group differences in performance where there are any, and/or whether implicit beliefs play a role in the subjective components of clinical evaluation even in the absence of differences in performance.
Unconscious bias concerning gender, race, and ethnicity has been well documented across a range of social settings (e.g. job or housing applications), even when other demographic variables and qualifications have been held constant [20]. Our study raises the question of whether implicit bias or even explicit stereotypes may also contribute to how medical students are perceived and described in the MSPE. Recognition of implicit bias and its impact on letters of evaluation is important for medical school Deans when writing MSPEs and for applicant reviewers as we strive to create a more egalitarian medical community.
Of note, this issue is especially timely as the AAMC recently released a document outlining "Recommendations for Revising the Medical Student Performance Evaluation (MSPE)" [21]. According to this document: "The purpose of the MSPE is not to advocate for the student, but rather to provide an honest and objective summary of the student's personal attributes, experiences, and academic accomplishments based, to the greatest degree possible, on verifiable information and summative evaluations. When possible, comparative assessments of the student's attributes, experiences, and accomplishments relative to their institutional peers should be provided...". The task force creating this document also outlines stated principles (among others) of: enhancing "standardization and transparency that facilitates the residency selection process"; "[increasing] opportunity for program directors to examine applicants holistically in the pre-interview stage"; and including "qualitative and quantitative assessments of applicants in an easy to read format". The present data emphasize the challenge that Deans face in drafting these documents: in trying to incorporate holistic, narrative descriptions of applicants, it is possible that implicit biases may undermine their objectivity.
There are limitations to this study that should be noted. Although our large sample came from 134 medical schools (95% of all US schools), included applications to 16 different residency specialties, and reflected more than 30% of the total number of US applicants, all applications were submitted to a single institution (Yale University) during the 2014-2015 academic year. This sample contains an over-representation of schools from the Northeast (39% of our applications; 19% of applications were from the Midwest; 32% from the South; and 10% from the West). We also chose to analyze applications only from allopathic medical schools in the United States. Our conclusions should be considered within the scope of this particular population. Data on race and ethnicity were not available for students applying to otolaryngology, pathology, plastic surgery, and urology.
We would also note that the words selected for the present study were chosen based on a careful review of the extant literature; groups may well differ in the use of other words (or bigrams or other combinations). Such possibilities might be explored using a computational linguistics approach that would not rely on a priori hypotheses. Nonetheless, the findings from the present study are both statistically significant and practically important: for example, a Program Director is very likely to respond favorably to an applicant being described in the MSPE as the "best".
The results of our study have implications for future research. First, the current findings show differences in keyword use in what is supposed to be the most objective aspect of the application. Additional studies should continue to explore the relationship between how applicants are described in the MSPE and other measures of performance. Relatedly, it will be important to evaluate the effect that medical school characteristics, such as size, geography, faculty diversity, student diversity, and racial climate, have on the manner in which students are discussed in the MSPE. Finally, future work should explore how various factors impact residency match outcomes, including the number of interviews offered and how programs ultimately rank applicants. In addition to MSPE keywords, this could include language usage in letters of recommendation and other factors, such as applicant photos, all of which have the potential to unfairly influence the selection of applicants into the limited opportunities for graduate medical training.