Multi-institutional study of GRE scores as predictors of STEM PhD degree completion: GRE gets a low mark

The process of selecting students likely to complete science, technology, engineering and mathematics (STEM) doctoral programs has not changed greatly over the last few decades and still relies heavily on Graduate Record Examination (GRE) scores in most U.S. universities. It has been long debated whether the GRE is an appropriate selection tool and whether overreliance on GRE scores may compromise admission of students historically underrepresented in STEM. Despite many concerns about the test, there are few studies examining the efficacy of the GRE in predicting PhD completion and even fewer examining this question in STEM fields. For the present study, we took advantage of a long-lived collaboration among institutions in the Northeast Alliance for Graduate Education and the Professoriate (NEAGEP) to gather comparable data on GRE scores and PhD completion for 1805 U.S./Permanent Resident STEM doctoral students in four state flagship institutions. We found that GRE Verbal (GRE V) and GRE Quantitative (GRE Q) scores were similar for women who completed STEM PhD degrees and those who left programs. Remarkably, GRE scores were significantly higher for men who left than counterparts who completed STEM PhD degrees. In fact, men in the lower quartiles of GRE V or Q scores finished degrees more often than those in the highest quartile. This pattern held for each of the four institutions in the study and for the cohort of male engineering students across institutions. GRE scores also failed to predict time to degree or to identify students who would leave during the first year of their programs. Our results suggests that GRE scores are not an effective tool for identifying students who will be successful in completing STEM doctoral programs. Considering the high cost of attrition from PhD programs and its impact on future leadership for the U.S. STEM workforce, we suggest that it is time to develop more effective and inclusive admissions strategies.

The process of selecting students likely to complete science, technology, engineering and mathematics (STEM) doctoral programs has not changed greatly over the last few decades and still relies heavily on Graduate Record Examination (GRE) scores in most U.S. universities. It has been long debated whether the GRE is an appropriate selection tool and whether overreliance on GRE scores may compromise admission of students historically underrepresented in STEM. Despite many concerns about the test, there are few studies examining the efficacy of the GRE in predicting PhD completion and even fewer examining this question in STEM fields. For the present study, we took advantage of a long-lived collaboration among institutions in the Northeast Alliance for Graduate Education and the Professoriate (NEAGEP) to gather comparable data on GRE scores and PhD completion for 1805 U.S./ Permanent Resident STEM doctoral students in four state flagship institutions. We found that GRE Verbal (GRE V) and GRE Quantitative (GRE Q) scores were similar for women who completed STEM PhD degrees and those who left programs. Remarkably, GRE scores were significantly higher for men who left than counterparts who completed STEM PhD degrees. In fact, men in the lower quartiles of GRE V or Q scores finished degrees more often than those in the highest quartile. This pattern held for each of the four institutions in the study and for the cohort of male engineering students across institutions. GRE scores also failed to predict time to degree or to identify students who would leave during the first year of their programs. Our results suggests that GRE scores are not an effective tool for identifying students who will be successful in completing STEM doctoral programs. Considering the high cost of attrition from PhD programs and its impact on future leadership for the U.S. STEM workforce, we suggest that it is time to develop more effective and inclusive admissions strategies. PLOS

Introduction
In the umbrella biomedical PhD program at the University of North Carolina Chapel Hill, GRE Verbal (GRE V) and Quantitative (GRE Q) scores were similar regardless of whether students completed degrees in less than 5 years, in 5-6 years, in greater than 6 years or if they withdrew [20]. Similarly, neither GRE V nor GRE Q scores were correlated with PhD degree attainment in the Vanderbilt University Medical School's biomedical umbrella program [21]. Overall, these findings suggest that GRE scores are not useful for identifying students who will complete PhD degrees in biomedical research programs.
In the present study, we sought to determine whether GRE scores are predictive of PhD completion in a broader range of STEM fields and whether there are gender differences in the predictive abilities of the GRE. To obtain sufficient data for meaningful comparisons, we collected information on over 1800 students from four variously sized state flagship research universities that participate in the Northeast Alliance for Graduate Education and the Professoriate. The Alliance is a long-standing collaboration originally funded by the National Science Foundation to diversify STEM PhD programs. We sought to avoid problems encountered in previous multi-institutional studies by including only data from doctoral students who had enrolled in STEM programs (listed below). In addition, we focused only on U.S. citizens or Permanent Residents to reduce confounding variables. An analysis of a somewhat larger dataset that included social science students from the NEAGEP cohort was made available previously [22].

Materials and methods
This research was approved by the University of Massachusetts Amherst Institutional Review Board (Approval number 2018-4724). No informed consent request was required because data were analyzed anonymously. All identifiable data are stored on a password-protected computer in the possession of the PI.

Sample
Four state flagship research universities with graduate student enrollments ranging from approximately 1,500 to 14,000 provided data for all U.S. citizens and Permanent Residents entering PhD programs in STEM between 2000 and 2005. This date range ensured that students had started their programs at least 10 years before the time of data collection. The sample included 1805 students of whom 57.5% were men and 42.5% women.
The STEM fields included in this study were Biological Sciences, Physical Sciences, Chemical Sciences, Computer and Information Sciences, Engineering, Geosciences, Mathematical Sciences and related technology areas. The data collected included GRE V and GRE Q scores, major/department, year of entry to the program, whether degree was completed, year of program completion or withdrawal, gender, and race/ethnicity (the small sample size precluded use of race/ethnicity data in the present study). All GRE scores used in this study were from the pre-2011 version of the test and ranged from 200 to 800. Each institution entered its data on a standardized template. To ensure comparability, data from each institution were checked and some entries were removed if students were still enrolled after 10 years, lacked GRE scores or were not in the STEM fields listed above.

Analysis
We used a logistic regression approach to model degree completion as a function of institution, gender and GRE V or GRE Q scores, including all interactions in the analysis. When interactions were found to be insignificant in explaining variability in degree completion we did not consider them in subsequent analyses. Gender differences in GRE scores for each institution were also examined across the four institutions using two-way ANOVA.
We used two-way ANOVA to compare means of GRE V or GRE Q scores between men and women who completed or left STEM doctoral programs within the first year. To determine whether there were gender differences in the rate of leaving during the first year, we used contingency tables and Chi-Square tests.
For a more in-depth analysis, we divided subjects into quartiles based on rank-ordered GRE V or GRE Q scores and combined quartile data across institutions. We determined the mean time to degree and the completion rate for each gender at each quartile. We used twoway ANOVA to determine whether time to degree varied by quartile for either gender, and compared male and female completion rates using contingency tables and Chi-Square tests. We performed follow-up studies to test whether students with GRE Q scores in the lowest quartile had high GRE V scores that might confer an advantage in graduate school. Mean GRE V scores were calculated for each GRE Q quartile and compared using one-way ANOVA.
We then examined the relationship between GRE Q scores and PhD completion in engineering, a discipline generally considered to be mathematics-intensive. We first used two-way ANOVA with gender and institution as the main effects. Finding no significant institutional effect or interaction between institution and gender on GRE scores, we combined data for all institutions. We used two-way ANOVA to compare GRE scores between men and women and between students in engineering and non-engineering STEM doctoral programs. We also compared completion rates for males and females in engineering using Chi-Squared tests. We then used two-way ANOVA with gender and completion status as main effects to determine whether GRE scores differed significantly between students who completed or left programs. We further evaluated interaction effects using Sidak's multiple comparison test. We also divided men enrolled in engineering PhD programs into quartiles based on GRE Q scores. We then calculated the percent completion for each quartile and compared rates among quartiles using Chi-Squared tests.

Results
There were no significant differences in GRE V scores between men and women in any institution (data not shown), but there was a significant gender effect in GRE Q scores that was examined using Tukey's multiple comparison test post hoc. Men had significantly higher GRE Q scores than women in every institution (Fig 1).
Data in Table 1 show that there were no significant gender differences in completion rates, time to degree, time students stayed in programs before leaving, or the percentage who left during the first year. Completion rates for both men and women were similar to the national 10-year completion rate of 59.1% in STEM PhD programs overall [2]. In addition, the average time to degree for students in our study was less than 6 years, and the national 6-year STEM PhD completion rate is only 42.7% [2]. It is also notable that the institutions in our study are all public, but the completion rates were similar to those of private institutions that may have higher aggregate GRE scores [2]. Table 2 shows results of our analysis of GRE scores and PhD completion for women and men. Our logistic regression analysis found interactions between gender and institution effects were not significant for GRE V or Q scores, but gender and GRE Q interactions were significant.
We then used Women as the reference category and determined that there was no effect of GRE Q scores on PhD completion for women. However, the difference between genders was significant so we next used the category Men as the reference and found that GRE Q scores had a negative predictive effect on PhD completion. Our two-way ANOVA verified that GRE scores were not associated with PhD completion for women, and that men who completed STEM PhDs had significantly lower GRE Q scores than those who left their programs. We also found significantly lower GRE V scores for men who completed than for those who left programs ( Table 2).
As shown in Table 3, we found no differences in GRE V or GRE Q scores between students (women or men) who completed and those who left programs within the first year.
Data in Table 4 show that there were no significant differences in completion rates for women based on quartile for either GRE V or GRE Q scores. This is despite a difference of approximately 267 points between those in the highest and lowest quartiles of GRE V, and a difference of 225 points between highest and lowest quartiles of GRE Q scores. In contrast to women, men in the lowest quartile of GRE Q scores finished at a higher rate than counterparts in all three higher quartiles. It is notable that men in the lowest quartile for GRE Q scores averaged 196 points below those of men in the highest quartile, the group with the lowest completion rate. Similarly, men in the third quartile of GRE V scores finished at a significantly higher rate than those in the two higher quartiles.
The pattern of men in the lowest quartile for GRE Q finishing at a higher rate than those in the highest quartile was seen in each of the four institutions in the study (Fig 2).  We hypothesized that men in the lowest quartile of GRE Q scores might have high GRE V scores that would provide them some advantage in PhD completion, but found that this was not the case. Instead, men with the highest GRE Q quartile scores also had the highest GRE V scores for all institutions (not significantly different in Institution D, the smallest institution) (Fig 3).
We also found that there were no differences in time to degree for men or women based on either GRE V or GRE Q quartile rankings (Table 5).
Consistent with previous work [17], we found that over 70% of women and 75% of men enrolled in engineering doctoral programs in our sample had GRE Q scores of 700 or above. Table 6 shows that there were no significant gender differences in either GRE V or GRE Q scores for students enrolled in engineering PhD programs, but men in non-engineering STEM fields had significantly higher GRE Q scores than women counterparts. Finally, GRE V scores for men in engineering were significantly lower than for men in non-engineering STEM programs. Table 7 shows that for both men and women, GRE V and Q scores were similar for those who completed and those who left engineering PhD programs. GRE Q scores were also similar for women who completed engineering PhD degrees and those who left. In contrast, men who left engineering doctoral programs had significantly higher GRE Q scores than those who completed degrees. Completion rates for women did not differ from those for men (60.9% vs. 64.6%; Chi-Squared test).
We divided men into quartiles based on GRE Q scores to further investigate our finding that men who completed PhD degrees in engineering had lower GRE Q scores than men who left their programs. We also compared these men with men in non-engineering STEM programs. As shown in Fig 4, men in the lowest quartile of GRE Q scores in both engineering and non-engineering STEM doctoral programs finished at higher rates than those in the highest quartile. This is despite the fact that scores of those in the latter group averaged 140 points higher.

Discussion
Our findings provide strong evidence that GRE scores are not predictive of STEM doctoral degree completion for U.S. and Permanent Resident students. In addition, our data demonstrate the importance of considering women and men separately when studying the relationship between GRE scores and PhD completion. We found that GRE Q scores did not predict PhD completion for women in STEM programs and that, unexpectedly, GRE Q scores were higher for men who left than for those who completed PhDs. When we examined this finding more closely, we saw that men with GRE scores in the lowest quartile finished at higher rates than any other group, a pattern seen in each of the four institutions. This is particularly surprising because men in the lowest quartile had GRE Q percentile scores averaging  approximately 34, and those in the highest quartile had percentile scores averaging 91. It is also notable that GRE scores did not predict time to degree or foretell who would leave during or after the first year. Finally, in engineering, a field in which mean GRE Q scores of admitted students are higher than in other fields [17], men in the lowest quartile for GRE Q scores completed at a rate 25% higher than those in the highest quartile. Overall, our data suggest that if we consider program completion to be the most important index of success, the practice of relying heavily on GRE scores [7,17] for selecting STEM doctoral students needs to be reexamined. The Educational Testing Service, the organization that prepares and administers the GRE, advises against having "cut offs" for GRE scores [8], but there is evidence that the practice continues [7]. In our study, we found that in each of the four institutions, women who were enrolled in STEM PhD programs had GRE Q scores that averaged 40 points lower than men (but women completed at rates similar to those of men). These data might be used to support the idea that admissions committees were ignoring GRE Q scores and, therefore, the scores do not represent a source of bias. But, another interpretation is that GRE Q scores may have restricted the number of women admitted because there were fewer women in the pool who had "acceptable" scores as suggested previously [17]. This is especially concerning in fields wherein high GRE Q scores are formally or informally required and women are severely underrepresented [17].
Indeed, our data suggest that GRE Q scores likely had a limiting effect on participation of women in engineering just as they do in physical sciences [17]. GRE Q scores of men and women enrolled in the engineering programs in our sample did not differ significantly, and over 70% of all students enrolled scored at least 700. Data presented by Miller and Stassun [17] suggest that less than 40% of women, but nearly 65% of men who apply to engineering programs score at or above 700. Therefore, the pool of women with scores above 700 was significantly smaller than for men, a factor that may contribute to the finding that women made up less than a third of the engineering doctoral student group in our study. This is of concern because the percentage of U.S. women who earn engineering doctorates has been below 25% over the past 10 years [24]. If a goal of the country is to significantly increase the number of U. S. engineers and to achieve gender parity in the field, it seems reasonable to remove the GRE Q score as an obstacle.
It is particularly troubling that GRE Q scores appear to play such a large role in STEM doctoral admissions decisions because our data show that they do not predict PhD completion for women STEM students and for men they are negative predictors. In fact, our current findings suggest that it is not just women who may be excluded, but also talented men who score below 600 on the GRE Q. This group finished at rates far above other groups, suggesting that they have abilities not predicted by GRE scores but key to STEM PhD completion. It was beyond the scope of this project to probe differences that may explain our findings, but we ruled out the possibility that males with low GRE Q scores had high GRE V scores that might be an asset to them. It will now be important to determine what characteristics persuaded admissions committees to accept these men with GRE Q scores in the lowest quartile. We can then study whether these characteristics play a role in STEM PhD degree completion and could be used in admission assessments to identify untapped talent.
Our study is the first to show that GRE Q scores are negative predictors of degree completion for men in STEM, but others have reported similar findings in data not disaggregated by gender. In a large meta-analysis that included 1055 students in life sciences, researchers found a negative correlation between GRE Q scores and degree attainment in that discipline [18]. Others found that GRE Q scores for students who graduated in applied sciences or life sciences were approximately 30 points lower than for those who did not finish [25,26]. In a study of 340 doctoral students in a group of biology, chemistry and psychology departments, GRE Q scores of students who withdrew were 21 points higher than those who completed [10]. Unfortunately, it was not clear that the difference was statistically significant, and the sample included both men and women. In addition to these studies suggesting that GRE Q scores may be negative predictors of STEM PhD completion, others found that neither GRE Q or V scores of doctoral students differed between those who leave PhD programs and those who progress beyond the third year [19] or who complete programs [20,21]. It should be noted that one meta-analysis of graduate students not disaggregated by gender, degree type or discipline reported a weak positive correlation between GRE (total) scores and degree completion [27].
The Educational Testing Service publications suggest that GREs are best suited to predict first-year graduate GPAs [10]. This might be relevant to the selection process if GRE scores predict who will fail first-year courses and leave STEM PhD programs during or after the first year. On the contrary, we found that neither GRE V nor GRE Q scores of males or females differed between students who completed PhD degrees and those who left during the same calendar year that they entered. We also found no differences in time to degree based on GRE V or GRE Q quartile scores for either gender, consistent with previous findings of others [21]. Although we did not examine any other indices of success in STEM PhD programs related to GRE scores, Hall et al. [20] found that neither GRE V nor GRE Q scores predict the number of first author publications. Moneta-Koehler et al. [21] found that GRE V scores were moderate predictors of first semester grades, graduate GPAs and of better subjective faculty evaluations of some aspects of students' performance. However, these predictions did not translate to differences in time to degree, passing qualifying exams, numbers of conference presentations, or numbers of individual fellowships or grants [21].
The cost of an admission system that is not effective in identifying successful STEM doctoral students goes beyond limiting the number of potential contributors to the innovation economy; it has severe financial consequences to the institutions and the nation. In our cohort of 1805 students, 703 did not complete the doctoral degree and 102 left during the same year they enrolled. Of the 601 students who left after the first year, the average time to leaving was approximately 3 years for both men and women. The annual cost of training students in the four institutions in our study averaged $58,000 per student. Thus, the cost of attrition for those who left during the first year was $5.9 million. For those who stayed for 3 years, the cost approached $105 million. This means that the cost of attrition for the five-year cohort in our study averaged $22.2 million/year. The cost may be significantly higher because those who left after three years likely obtained masters' degrees. Although this may not be considered a true loss to the U.S. STEM workforce, it may be a revenue loss to institutions that charge tuition and fees to students seeking STEM masters' degrees, but waive these charges and provide stipends for students seeking PhD degrees. If we apply these calculations to the national cost of attrition for approximately 13,000 students from each entering cohort (assuming a 59% noncompletion rate and approximately 18,000 completing STEM PhD degrees [1]), the cost is between $1 billion and $3 billion per cohort. It should be noted that these calculations do not include the potential value of papers, patents and contributions to the teaching mission created by graduate students who did not finish. Also not considered in these calculations are the personal investments of students who do not complete STEM PhD programs and their families, or the time and resources faculty and staff invest in these students.
Over the years and through multiple iterations of the GRE, there have been strong databased appeals, many from faculty members, to stop using the test in the STEM admissions process [9, 16, 17, 19-21, 28, 29]. In addition, the National Science Foundation no longer requires students to report GRE scores in fellowship applications and the National Institutes of Health does not ask training grant recipients to report GRE scores of their trainees. Still, despite assertions to the contrary, admissions committees continue to rely to a great extent on the GRE [30], particularly on the GRE Q that is arguably the most biased portion of the exam [17]. In addition to erroneously viewing the GRE as predictive of PhD completion, faculty members have numerous, wide-ranging, and largely anecdotal reasons for the strong attachment to the GRE. One of the main problems may be that there are few exemplars of successful students with low scores if most of those chosen have high GRE scores. When a high scorer leaves, faculty accept that "it wasn't right for him/her", but if a low scorer leaves, faculty suggest that it was predictable based on GRE scores. In addition, most STEM faculty who are currently in academia necessarily did well on GRE exams or they would likely not have been admitted. Thus, they assume the test was predictive. Finally, a pragmatic reason for relying on GRE scores to identify students for admission is that it speeds up the process, particularly in programs with a large number of applicants.
In summary, this study provides convincing evidence that GRE scores are not predictive of STEM PhD completion for U.S./Permanent Resident students at state flagship research institutions. In addition, relying on the GRE Q is likely to exclude talented students with scores below an arbitrarily defined "acceptable" score, but who have other characteristics that are better predictors of success. Considering the high cost of attrition and its impact on future leadership for the U.S. STEM workforce, it seems prudent to reconsider the role of GRE scores in the STEM PhD selection process. If we can identify the characteristics that motivated admissions committees to overlook GRE scores of men in the lowest quartile, we can study whether these characteristics contributed to the high completion rates of this group of students. In doing so, perhaps we can develop more inclusive and predictive STEM doctoral admissions processes.
Supporting information S1 File. De-identified data set used for the analyses described herein.