Differences in STEM doctoral publication by ethnicity, gender and academic field at a large public research university

Two independent surveys of PhD students in STEM fields at the University of California, Berkeley, indicate that underrepresented minorities (URMs) publish at significantly lower rates than non-URM males, placing the former at a significant disadvantage as they compete for postdoctoral and faculty positions. Differences as a function of gender reveal a similar, though less consistent, pattern. A conspicuous exception is Berkeley’s College of Chemistry, where publication rates are tightly clustered as a function of ethnicity and gender, and where PhD students experience a highly structured program that includes early and systematic involvement in research, as well as clear expectations for publishing. Social science research supports the hypothesis that this more structured environment hastens the successful induction of diverse groups into the high-performance STEM academic track.


Introduction
Increasing the diversity of the professoriate in STEM fields is a national priority [1]. Doctoral education is the principal gateway to the professoriate, motivating research on how graduate education contributes to (or inhibits) STEM faculty diversity [2,3]. Entry into the professoriate involves a highly competitive selection process, wherein only a small percentage of candidates are ultimately offered academic positions at universities.
In this process, a candidate's publication record is key. Although academic institutions differ in the specific weight they give to the various spheres of professional accomplishment, the publication record serves as the gold standard against which academic potential within the professoriate is judged. Publication is the currency that determines not only offers of employment, but also tenure and promotion decisions. This is particularly true at top-tier research universities, which produce a disproportionate number of future faculty, and hence disproportionately influence whether the STEM faculty workforce will become more diverse nationwide [4]. To more fully understand disparities in hiring at the level of the professoriate, it is thus important to assess whether disparities exist in the rates of publication in peer-reviewed outlets among graduate students in STEM fields. The question of disparities in publication at the graduate level is especially important given recent evidence that across STEM fields, there has been a marked shift towards first publications occurring in graduate school as opposed to post-Ph.D. [5].
Current scholarship on diversity at the graduate level has focused on the attitudes and biases of science faculty, which can affect student and job candidate evaluations [6,7], and on the unique financial, mentoring, and advocacy needs of students from underrepresented groups given systematic barriers to their success [8,9]. However, surprisingly little attention appears to have been devoted to the question of disparate doctoral publication rates, which might shed light on important disparities in the competitiveness of newly minted PhDs for the academic job market.
Here we address this key point of leverage by reporting on research that compares publication rates according to ethnicity, gender, and department within STEM doctoral programs at UC Berkeley. As a large public university that has granted more STEM PhDs over the past 10 years than any other US university [10], UC Berkeley is an ideal setting to compare differences in publication opportunities among students as a function of underrepresented minority (URM) status and gender. We further note that 8 of the 10 largest STEM PhD producers in the US are also large public universities. To this end, we draw from two extensive datasets on STEM graduate students at UC Berkeley, described below in two studies. Participants provided written consent to participate in Study 1; the project was reviewed and approved by UC Berkeley's Committee for the Protection of Human Subjects (CPHS) under protocol 2012-05-4347. Study 2 data was provided by UC Berkeley's Graduate Division according to CPHS guidelines and thus no individual consent was sought; the project was reviewed and approved by UC Berkeley's CPHS under protocol 2016-10-9231.

Study 1 Materials and methods
The first dataset, the Berkeley Life in Science Survey (BLISS), was conducted in 2013-2014 and examines potential differences in publishing activity among students as a function of URM status and gender. BLISS was conducted as a baseline study prior to implementation of interventions intended to increase the success of diverse students in the mathematical, physical and computer sciences.
BLISS formed an initial step in establishing the Berkeley Science Network, funded by the Kapor Center for Social Impact (a nonprofit organization), and the Berkeley Science Connections Program, funded by the US National Science Foundation. These programs are designed to strengthen the pipeline of underrepresented students in the mathematical, physical, and computer sciences (MPCS) by strengthening connections among undergraduate, graduate, post-doctoral, and faculty scholars in these disciplines. We surveyed graduate students in the Division of Mathematical and Physical Sciences (mathematics, statistics, physics, astronomy, earth and planetary science), the department of electrical engineering and computer science (EECS), and the College of Chemistry (chemistry and chemical engineering). As a result, BLISS included MPCS fields and did not include other STEM disciplines such as biological sciences and other engineering fields. BLISS sampled from the population of students at UC Berkeley enrolled in doctoral programs in the sciences, randomly sampling within ethnic and gender groups, while oversampling based on minority ethnic status and female gender.
Students eligible for inclusion as participants were identified by the university registrar. Survey completion was voluntary and garnered high participation rates. Table 1 lists participation rates for the BLISS survey. The use of unique participant identifiers enabled the researchers to link automatically to the students' records for demographic and enrollment and educational progress data (e.g., gender, ethnicity, GPA, educational status, years of graduate student teaching and research employment).
The survey itself contained a battery of questions designed to address various aspects of graduate student life in the sciences at Berkeley. Although the survey was not originally designed to address publication disparities per se, we did include a question in which the students indicated whether they had submitted a manuscript for publication in the past year. Focusing on manuscript submission rather than submission outcome (i.e., acceptance or rejection) addresses possible disparities in student engagement in the process of publishing independently of the external peer review process. The question had binary response options (yes/no) and was thus treated as a Bernoulli process. We benchmark URM and female graduate students against male non-URM students, a group that includes White as well as Asian-background men, between whom no statistically significant differences emerged in the two datasets reported here. Table 2 provides headcounts, observed responses, percent of affirmative responses, standard errors, confidence intervals, and p-values (compared to non-URM men) for self-reported submission of a paper for publication in the BLISS survey. The aggregate of these responses for each question is described with binomial statistics. The probability of success, p, is estimated as the proportion of yes responses to a given question. Standard errors are calculated with pooled binomial standard errors [11]. Confidence intervals are calculated using the Clopper-Pearson interval [12]. P-values are calculated using a two-tailed exact binomial test [13]. The gender disparity is about half of this. We then asked whether this disparity was evident across the departments surveyed.
As seen in the middle panel of the figure, in the combined sample of MPS and EECS departments (aggregated to protect participant privacy given small sub-group samples), both female and URM students reported lower rates of having submitted a publication in the past year, compared to male, non-URM students (this is also true for MPS and EECS individually). As the bottom panel of Fig 1 illustrates, however, in the College of Chemistry there were no significant differences between female or URM students and their male, non-URM counterparts. These results were completely unanticipated at the outset of the survey.
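As a minimal sketch of the binomial summaries described above for Table 2, the following Python code computes a Clopper-Pearson interval and a two-tailed exact binomial p-value using only the standard library. The function names, the bisection solver, the choice to test a subgroup's count against a fixed benchmark proportion p0, and the counts in the usage example are illustrative assumptions, not the authors' code or data. The two-tailed p-value follows one common definition (summing the probabilities of all outcomes no more likely than the observed one); the pooled standard error is not attempted here.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Probability of at most k successes in n Bernoulli(p) trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def _root_increasing(f, iters=100):
    """Bisection root of a function rising from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for k successes out of n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


def exact_binom_test(k, n, p0):
    """Two-tailed exact binomial p-value: total probability under p0 of
    every outcome that is no more likely than the observed count k."""
    observed = binom_pmf(k, n, p0)
    tol = 1 + 1e-7  # guards against floating-point ties
    total = sum(pr for pr in (binom_pmf(i, n, p0) for i in range(n + 1))
                if pr <= observed * tol)
    return min(1.0, total)


# Hypothetical counts for illustration only (not taken from Table 2):
# 12 "yes" responses among 30 students, benchmark rate of 0.60.
yes, n, benchmark = 12, 30, 0.60
low, high = clopper_pearson(yes, n)
print(f"proportion={yes / n:.3f}, 95% CI=({low:.3f}, {high:.3f})")
print(f"p-value vs benchmark={exact_binom_test(yes, n, benchmark):.4f}")
```

Because the interval and the test are both exact, they remain usable for the small subgroup sizes that motivated aggregating departments, where normal approximations would be unreliable.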

Results
We additionally used logistic regression to statistically capture whether the effect of URM status or gender differed in Chemistry versus other departments. Logistic regression is used in cases where the dependent variable (in this case, submission of a paper for publication) is a dichotomous variable; multiple logistic regression is used to understand the unique effect of a given set of measured or independent variables in predicting the dependent variable [14]. Table 3 shows the results from a multiple logistic regression estimating the likelihood (in log odds) that a graduate student submitted a manuscript for publication as a function of gender (0 = male, 1 = female), underrepresented minority (URM) status (0 = non-URM, 1 = URM), Chemistry affiliation (0 = not in Chemistry, 1 = in Chemistry), and the two-way interactions for Chemistry with gender and URM status. As the table shows, the results revealed significant effects of URM status (URM), gender (Gender), and Chemistry affiliation (Chem). These main effects are qualified by a significant interaction between URM status and chemistry, reflecting the fact that the negative effect of URM status on publication is offset by Chemistry affiliation. A similar, albeit non-significant, interaction pattern is observed for Chemistry affiliation and gender.
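To make the structure of this model concrete, the sketch below writes out its linear predictor (main effects for gender, URM status, and Chemistry affiliation, plus the two Chemistry interactions) and converts log odds to probabilities. The coefficient values and dictionary keys are hypothetical and chosen only for illustration; they are not the estimates reported in Table 3.

```python
from math import exp


def log_odds(coef, gender, urm, chem):
    """Linear predictor of the model: main effects plus Chemistry interactions.

    gender: 0 = male, 1 = female; urm: 0 = non-URM, 1 = URM;
    chem: 0 = not in Chemistry, 1 = in Chemistry.
    """
    return (coef["intercept"]
            + coef["gender"] * gender
            + coef["urm"] * urm
            + coef["chem"] * chem
            + coef["gender_x_chem"] * gender * chem
            + coef["urm_x_chem"] * urm * chem)


def probability(logit):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + exp(-logit))


# Hypothetical coefficients chosen for illustration; NOT the Table 3 estimates.
coef = {
    "intercept": 0.4,
    "gender": -0.3,
    "urm": -0.9,
    "chem": 0.5,
    "gender_x_chem": 0.2,
    "urm_x_chem": 0.8,
}

for chem in (0, 1):
    for urm in (0, 1):
        p = probability(log_odds(coef, gender=0, urm=urm, chem=chem))
        print(f"chem={chem} urm={urm} male: P(submitted)={p:.3f}")

# The URM gap in log odds is coef["urm"] outside Chemistry and
# coef["urm"] + coef["urm_x_chem"] inside Chemistry.
gap_outside = log_odds(coef, 0, 1, 0) - log_odds(coef, 0, 0, 0)
gap_inside = log_odds(coef, 0, 1, 1) - log_odds(coef, 0, 0, 1)
print(f"URM gap outside Chemistry: {gap_outside:.2f}, inside: {gap_inside:.2f}")
```

In this parameterization, a positive URM-by-Chemistry interaction of similar magnitude to a negative URM main effect shrinks the gap inside Chemistry toward zero, which is the sense in which an interaction can offset a main effect.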
We sought to ensure that the observed findings were not due to underlying differences among the student populations being compared. In a second multiple logistic regression model, we therefore added to the above model four variables likely to affect a student's publication efforts. First, we controlled for the number of years the student had been in the program (one student who reported having been in the graduate program for 38 years was excluded from this analysis, though his/her inclusion does not affect the findings). We also controlled for time spent employed in research, teaching, and on fellowship. These last three variables were converted to a fraction of the time spent in the program. Table 4 provides the results of this analysis. The results show that being in Chemistry (Chem), URM status (URM), and their interaction remain robust predictors of whether a student submitted a paper for publication, even when controlling for years enrolled in graduate school (Time), research employment (RA), teaching assistantships (TA), and fellowship support (Fellowship). However, the effect of gender becomes attenuated in this analysis.
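The covariate preparation described above can be sketched as follows. The field names, the made-up records, and the use of a fixed cutoff (max_years) to exclude an implausibly long enrollment are assumptions for illustration; the text states only that one student reporting 38 years was excluded and that the three support variables were expressed as fractions of time in the program.

```python
def prepare_covariates(record, max_years=30):
    """Return model covariates for one student, or None if excluded.

    record: dict with years in program and years of RA, TA, and
    fellowship support (hypothetical field names).
    max_years: hypothetical cutoff used here to drop implausible tenures.
    """
    years = record["years_in_program"]
    if years <= 0 or years > max_years:
        return None
    return {
        "Time": years,
        # Support variables expressed as a fraction of time in the program.
        "RA": record["ra_years"] / years,
        "TA": record["ta_years"] / years,
        "Fellowship": record["fellowship_years"] / years,
    }


# Made-up records for illustration only.
students = [
    {"years_in_program": 4, "ra_years": 2, "ta_years": 1, "fellowship_years": 1},
    {"years_in_program": 38, "ra_years": 3, "ta_years": 2, "fellowship_years": 0},
]
rows = [r for r in (prepare_covariates(s) for s in students) if r is not None]
print(rows)
```

Expressing support as a fraction rather than a raw count keeps these covariates from simply duplicating the information already carried by years in the program.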
The possibility remains that the students who volunteered to participate in this survey were not representative of all PhD students at the University. Further, data collection was limited to a single year, raising a concern that the disparities observed may not reflect long-term trends. To address these limitations, we examined a second dataset at UC Berkeley, the Graduate Division Exit Survey, required of all students prior to being granted their PhDs. The survey has been administered since 1995 with a response rate of 98%. The survey is retrospective and covers aspects of doctoral student experience over the whole period of students' degree programs at Berkeley.

Study 2 Materials and methods
The PhD Exit Survey is administered at the time of degree completion. Doctoral candidates submit the survey at the time that they file their paperwork with the Graduate Division. Candidates are not required to submit the survey form, but it is on the checklist of paperwork to be completed at the time of filing. Items on the survey are grouped into sections covering financial support, quality of advising, relationship with dissertation chair, aspects of scholarly practice, and first placement. The data for this study were extracted from the database, limiting responses to students who identified their majors in the broad disciplinary areas of biological, physical, and social sciences, and engineering in the academic years spanning 1998-1999 to 2013-2014. The overall completion rate for the time period of this study was 98%. UC Berkeley's Graduate Division does not receive external funding for this survey. We include data from the most recent 15-year period (1998-2013), allowing us to examine potential publication disparities with larger sample sizes, and thus examine trends separately for EECS, Mathematics, and Physics, three of the largest departments in the university. Note that this survey includes the Biological Sciences. Table 5 provides summary statistics for completers of this survey.
The PhD exit survey contained two questions that are particularly relevant to our analysis. The first asked students, "Did you deliver any papers at national scholarly meetings?" Presenting at national meetings is an important precursor to publication and signals active engagement in the research enterprise [15]. A second question in this survey asked, "Were you encouraged by faculty in your department to publish?" As with BLISS, the responses have binary (yes/no) response options and were thus analyzed similarly. We discuss findings and analyses for each of these questions in turn.
Table 6 presents headcounts, observed responses, percent of affirmative responses, standard errors, confidence intervals, and p-values (compared to non-URM men) for the question, "Did you deliver any papers at national scholarly meetings?" Fig 2 illustrates the data from Table 6 graphically. Table 7 shows the results from a multiple logistic regression modeling the likelihood (in log odds) of a graduate student delivering a paper at a national conference as a function of gender (Gender), underrepresented minority status (URM), Chemistry affiliation (Chem), and the two-way interactions for Chemistry with gender and URM status. As the table shows, the results revealed significant effects of URM status and Chemistry affiliation. These main effects are qualified by a significant interaction between URM status and chemistry, reflecting the fact that the negative effect of URM status on publication is offset by Chemistry affiliation. No main effects or interactions with gender were observed. Following our analytic strategy, we ran an additional multiple logistic regression that added time spent in the program (Time), research employment (RA), teaching assistantships (TA), and fellowships (Fellowship) as covariates, with the last three expressed as a fraction of time spent in the program.
Table 8 provides details of this analysis, which shows that the critical URM status by Chemistry affiliation interaction remains robust in the presence of these covariates. Table 9 presents headcounts, observed responses, percent of affirmative responses, standard errors, confidence intervals, and p-values (compared to non-URM men) for the item, "Were you encouraged by faculty in your department to publish?" The data correspond to Fig 3. Table 10 shows the results from a multiple logistic regression modeling the likelihood (in log odds) of a graduate student being encouraged to publish as a function of gender (Gender), URM status (URM), Chemistry affiliation (Chem), and the two-way interactions for Chemistry with gender and URM status. As the table shows, the results revealed significant main effects of URM status (negative) and Chemistry affiliation (positive). The interaction terms involving Chemistry affiliation and identity (URM status and gender) were not significant in this model, reflecting reduced variability in encouragement to publish relative to presentation at national meetings (see Fig 3). Nonetheless, we note a pattern of tighter clustering in Chemistry in comparison to other departments. Finally, we also ran a multiple logistic regression predicting encouragement to publish that included all of the above variables but also added time spent in the program (Time), research employment (RA), teaching assistantships (TA), and fellowships (Fellowship) as covariates, with the last three expressed as a fraction of time spent in the program. Table 11 provides details of this analysis, which shows the main effects of interest and interactions unchanged.

Results
The exit survey data replicate the finding that URM students in particular are underencouraged to publish and are provided fewer opportunities to present their research than their non-URM male counterparts. We also observe a more modest and less consistent gender disparity, as in the BLISS survey. The data also reveal that the College of Chemistry is more successful than the other departments in the STEM fields at Berkeley in mitigating disparities in presentation opportunities between URM and majority male PhD students.

Discussion
Does Chemistry have a different approach to graduate education, or specifically to helping students work toward publication of their research, than other disciplines at the university? We have begun to explore this question by considering the formal requirements and conventional practices of the graduate programs in the departments included in this study.
In Chemistry, we find that students experience a highly structured environment in which they are introduced to research (via lab rotations) at the outset of their studies, their advisors are regularly and systematically queried as to their students' progress, and expectations surrounding publication of research results are both implicitly and explicitly clear even in the first two years of study. We also note that Berkeley's chemistry department has been particularly successful in placing its women PhDs in prestigious academic positions, relative to their peers [16]. By contrast, in Berkeley's departments of mathematics and physics, students report a relatively weakly structured environment compared with that in chemistry. However, further research is needed to better understand how these factors play out among other STEM departments.
Research in education and psychology suggests that a lack of structure and/or clear expectations will have a disproportionate effect upon students who come to graduate school less familiar with a high-performance, research-oriented academic environment, e.g., first generation college graduates, whose parents are neither professionals nor academics, and students from non-research-oriented colleges [17]. For these students, unstated assumptions regarding the norms for academic productivity (publishing, presentations at prestigious conferences, etc.) may not become apparent to them until late in their PhD studies. Also, to the degree that the process of publishing often calls for subjective evaluation of the quality of a student's work, subtle judgments on the part of advisors and co-authors may cumulatively lead to fewer opportunities for minority scholars to present nationally and publish their work [18]. Virtually all of the steps involved in publishing an academic paper provide opportunities for the expression of subtle or unconscious bias: from the evaluation of an idea, to the procedures required to test those ideas, to deciding when a set of results is ready for publication, to the manuscript writing itself.
Research on diversifying Chemistry [17] and STEM more generally [19,20] has focused largely on recruitment of women and URM students into graduate programs, as well as these students' progress towards (and completion of) the Ph.D. This research highlights several important best practices for recruitment and retention efforts, including a visible commitment from institutional administrators, targeted scholarships, strong mentoring, and systematic benchmarking of both student progress and institutional goals. It remains an open question whether the best practices for recruitment and retention of underrepresented students in STEM fields lead to equity in research productivity. To fully understand disparities in hiring at the level of the professoriate, it is necessary to move beyond comparisons of normative student outcomes (e.g., graduation rates), and to assess instead whether disparities exist in the rates of publication in peer-reviewed outlets as graduate students in STEM fields consider and enter the academic job market. We underscore publications as a key factor that needs to be taken into account for increasing diversity within the professoriate.
Future research will need to address whether the findings observed at Berkeley are representative of Chemistry departments more generally, or whether they represent a specific culture that has been nurtured at Berkeley but which is nevertheless potentially replicable. Nonetheless, these findings suggest that straightforward measures to provide PhD students in STEM with well-structured environments, which should in fact be beneficial to all students [21], may mitigate confounding issues of under-preparation and bias that might otherwise impede efforts to diversify the professoriate, especially at research-oriented universities.

Author Contributions
Conceptualization: RMD CP AF AE IY MAR.

Data curation: AF AE AS.
Formal analysis: AF AE.
Funding acquisition: MAR CP RMD.
Investigation: RMD CP AF AE IY MAR.