The GRE over the entire range of scores lacks predictive ability for PhD outcomes in the biomedical sciences

The association between GRE scores and academic success in graduate programs is currently of national interest. GRE scores are often assumed to be predictive of student success in graduate school. However, we found no such association in admission data from Vanderbilt’s Initiative for Maximizing Student Diversity (IMSD), which recruited historically underrepresented students for graduate study in the biomedical sciences at Vanderbilt University spanning a wide range of GRE scores. This study avoids the typical bias of most GRE investigations of performance, in which primarily high achievers on the GRE were admitted. GRE scores, while collected at admission, were not used or consulted for admission decisions and comprise the full range of percentiles, from 1% to 91%. We report on the 32 students recruited to the Vanderbilt IMSD from 2007–2011, of whom 28 have completed the PhD to date. While the data set is not large, the predictive trends between GRE and long-term graduate outcomes (publications, first author publications, time to degree, predoctoral fellowship awards, and faculty evaluations) are remarkably null and there is sufficient precision to rule out even mild relationships between GRE and these outcomes. Career outcomes are encouraging; many students are in postdocs, and the rest are in regular stage-appropriate career environments for such a cohort, including tenure-track faculty, biotech, and entrepreneurship careers.


Introduction
Recently Moneta-Kohler et al. [1] published a detailed statistical analysis of the lack of ability of the GRE to predict performance in graduate school in the biomedical research arena at Vanderbilt. A similar study was published by Hall et al. [2] from the University of North Carolina Chapel Hill. However, there was a limitation to the overall conclusions in that the range of GRE scores did not cover scores lower than approximately 50%. In order to test if such a limitation impacted the predictive ability of the GRE, we would need to admit students for whom we had GRE information, but where the admitted students covered the entire range of scores.

Materials and methods
GRE (Quantitative and Verbal) scores and academic performance data from 28 IMSD students who matriculated from 2007 to 2011 were collected and examined. Academic performance outcomes of interest were: time elapsed in program (i.e., months to degree), number of publications, number of first-author publications, fellowship status (any or F31), and Vanderbilt faculty ranking (10 = best, 50 = worst). Table 1 provides the list of attributes and competencies used in the faculty ranking, and for the remaining criteria, Table 2 provides univariate summaries (e.g., mean, median, standard deviation, inter-quartile range) of these variables. Fig 1 presents the GRE scores for 30 URM students admitted into the graduate program in the biomedical sciences at Vanderbilt from 2007 to 2011 who completed either a PhD or an MS degree. Fig 2 presents histograms of the continuous outcome variables. Regression modeling was used to assess the degree of association between GRE scores and academic outcomes. Specifically, Poisson regression was used to model publication counts (accounting for length of time in the program), months to degree, and faculty ranking. Logistic regression was used to model receipt of fellowship. For all models, we report point estimates, model-robust standard errors, and 95% confidence intervals (CIs).
We plot each performance measure as a function of GRE scores and include the fitted regression line as well as a locally weighted scatterplot smoother (lowess) line to visually assess linearity assumptions and model fit. Confidence intervals were plotted to demonstrate the degree of precision afforded by the data at the 95% level. Any relationship between GRE scores and outcomes would be captured in the slope of these regression lines. While it is not possible to prove the null hypothesis that GRE scores and outcomes are not related, it is possible to provide an upper bound on the largest potential association. The 95% CIs provide this boundary and comprise the set of associations supported by the data. As we will see from the data, despite the small sample size, these CIs do not support mild or strong associations between GRE scores and outcomes. For a sensitivity analysis, we compared academic outcomes between the first quartile and the fourth quartile of GRE scores. If any association were present, such an analysis should at least yield exaggerated point estimates of the association effect.
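As a reading aid, the regression setup described in this section can be written out explicitly. The sketch below assumes a log link for the Poisson models and a logit link for the fellowship model, with GRE entered as a percentile score. The symbols ($Y_i$, $G_i$, $T_i$, $\beta$, $\gamma$, $p_i$) and the use of $\log T_i$ as an offset for time in program are notation introduced here for illustration; the text does not spell out the exact parameterization.

```latex
% Poisson model for a count outcome Y_i (e.g., publications) and GRE percentile G_i.
% T_i is time in program, entered as an offset (an assumption about how
% "accounting for length of time in the program" was implemented).
\log \mathbb{E}[Y_i \mid G_i] = \beta_0 + \beta_1 G_i + \log T_i

% Rate ratio for a 20-percentage-point increase in GRE score:
\mathrm{RR}_{20} = \frac{\mathbb{E}[Y \mid G + 20]}{\mathbb{E}[Y \mid G]} = e^{20\beta_1}

% Logistic model for fellowship receipt, with p_i = \Pr(\text{fellowship} \mid G_i):
\log \frac{p_i}{1 - p_i} = \gamma_0 + \gamma_1 G_i,
\qquad \mathrm{OR}_{20} = e^{20\gamma_1}

% 95% CI on the ratio scale from a robust standard error of \hat\beta_1:
\exp\!\left( 20 \left[ \hat\beta_1 \pm 1.96\, \widehat{\mathrm{SE}}(\hat\beta_1) \right] \right)
```

Under this formulation a rate ratio or odds ratio of 1 corresponds to no association, so a CI lying close to 1 bounds the largest association the data support.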

Results
In Fig 1 we report the range of GRE scores among the 30 URM students admitted into the graduate program in the biomedical sciences at Vanderbilt from 2007 through 2011 and who completed a PhD or an MS degree. The admission decisions for these students during this time period were determined by the IMSD admissions committee, and although the student's GRE score was recorded in our databases, it has been used only for outcomes studies long after the admissions event. The GRE-tolerant nature of our approach is validated by the range of GRE scores among this group of students. Scores varied across the spectrum for students who were admitted in response to a detailed analysis of the committee's assessment of the likelihood of the student's success in research. The committee's assessment was based primarily on the nonquantitative components of the application, including a close reading of the letters of recommendation and the student's personal statement. The student's transcript was evaluated, primarily to assess adequate coursework preparation for biomedical PhD coursework. A wide range of GPAs was accepted. We sought to place the overall and science GPAs in the context of the college or university and the life events of the applicant. For example, students with extensive work and/or family responsibilities might reasonably be expected to end up with lower GPAs due to time demands. The lowest GPA accepted among this group of students was 1.8. Finally, all students were invited to campus for an interview visit that was also given significant consideration. Of the 30 students with GRE scores shown in Fig 1, 93% (28 students) have graduated with the PhD and 7% (2 students) left with an MS degree. Two students who have recently completed the PhD are currently looking for their next position. However, of the remaining 26 students who completed the PhD, 85% (22 students) continued to postdoctoral positions.
Four students did not continue on to postdocs, choosing instead to move to industry, consulting, medical school, or an academic faculty position. Overall, the outcomes of this cadre of GRE-blind admitted students are strikingly parallel to those of students admitted through the traditional route (using much higher GRE scores) over the same time period [4]. As indicated in Fig 1, we have a wide range of GRE scores among this group. This unusual group provided us with a means to test the predictive value of GRE scores over a much wider range than most admissions committees will typically tolerate.
Fig 2 presents histograms of the continuous outcome variables: number of publications, number of first author publications, months to degree, and faculty ranking. The faculty ranking is obtained upon the student's completion of their PhD. The ranking is the sum of the scores for each of ten questions, listed in Table 1. The questions cover a range of areas that are often informally assessed as measures of developing into a successful, independent scientist; many would fall into the area of the social/emotional learning skillset. We ask the PhD faculty mentor to score their newly minted PhD student from one to five, with one being best. Thus, the top ranking possible is a 10, if the student received a score of one for each of the ten questions. Student rankings ranged from 12 to 39 with a median of 21. The other metrics are self-explanatory, with number of publications ranging from one to fifteen (median = 5.5) and first author publications from one to six (median = 2). These metrics are very similar to those for the non-IMSD students who were admitted over the same 2007-2011 time period by the IGP admissions committee using the traditional process including GRE scores. The 209 students in this traditional cohort had a median number of publications of 5 and median first author publications of 2 (see S6 Table and S1 Fig for details).
Note that students are expected to publish at least one first author paper as a requirement for the PhD in most of our biomedical sciences PhD granting programs. The time to degree for the 28 students admitted in a GRE-tolerant manner ranged from slightly more than 4 years, to just over 7 years (median = 5.7 years). In addition to the data shown in Fig 2, we also included whether or not the student obtained an individual fellowship in a national competition (F31, AHA, DOD, etc) as an additional metric. Summary statistics of the data for this study are presented in Table 2. The hypothesis we test is that GRE scores are associated with future performance in a biomedical graduate program. This association will be measured by the slope in a regression model, to be explained shortly.

Lack of association between GRE scores and publications
We modeled the relationship between total number of publications and GRE scores using Poisson regression. Increasing a student's GRE-V score by 20 percentage points increases their expected publication rate by about 3% (rate ratio = 1.028 with 95% CI 0.846 to 1.251). For instance, students with GRE-V scores of 40% and 60% are expected to have 5.61 and 5.768 publications, respectively (a meaningless difference). We do not judge differences of this size to be scientifically relevant. Similar minor differences were also observed when the total number of first author publications and GRE scores was modeled using Poisson regression in Fig 4. Increasing a student's GRE-V score by 20 percentage points increases their expected first author publication rate by 0.1% (rate ratio = 1.001 with 95% CI 0.99 to 1.012). For instance, students with GRE-V scores of 40% and 60% are expected to have 2.334 and 2.379 first author publications, respectively. Increasing a student's GRE-Q score by 20 percentage points increases their expected first author publication rate by 15% (rate ratio = 1.154 with 95% CI 0.954 to 1.397). For instance, students with GRE-Q scores of 40% and 60% are expected to have 2.236 and 2.581 first author publications, respectively, which is essentially no difference. We conclude that even when GRE scores below the 20th percentile are in the mix, productivity, as measured by the key currency of the scientific enterprise, namely publications, exhibits at most very little dependence on GRE score and may well be unrelated in any meaningful sense.
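The arithmetic connecting a rate ratio to the predicted counts quoted above can be checked directly. The sketch below assumes that the reported rate ratio applies multiplicatively to the expected count over the 20-point step from the 40th to the 60th percentile. The function name `predicted_after_step` is hypothetical, and the inputs are the rounded figures from the text, so small rounding differences are expected.

```python
def predicted_after_step(baseline: float, rate_ratio: float) -> float:
    """Expected count after one step, given the count before it and the rate ratio per step."""
    return baseline * rate_ratio

# Total publications vs GRE-V: 5.61 at the 40th percentile, rate ratio 1.028 per 20 points.
total_v = predicted_after_step(5.61, 1.028)

# First author publications vs GRE-Q: 2.236 at the 40th percentile, rate ratio 1.154.
first_q = predicted_after_step(2.236, 1.154)

# Bounds of the reported 95% CI (0.954 to 1.397) applied to the same GRE-Q baseline.
first_q_low = predicted_after_step(2.236, 0.954)
first_q_high = predicted_after_step(2.236, 1.397)

print(f"total publications at 60th percentile (GRE-V): {total_v:.2f}")
print(f"first author publications at 60th percentile (GRE-Q): {first_q:.2f}")
print(f"95% CI for the latter: {first_q_low:.2f} to {first_q_high:.2f}")
```

Both point estimates agree with the reported 5.768 and 2.581 to two decimal places. The GRE-V first author pair (2.334 and 2.379) implies a ratio of about 1.02 over the 20-point step, so the quoted ratio of 1.001 for that model may be expressed per percentage point; that reading is an inference from the arithmetic, not something the text states.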

Lack of association between GRE scores and time to degree
Time to degree (in months) was modeled as a function of GRE scores using Poisson regression. Again, the solid curve shows the fitted values from the Poisson regression model (dashed lines are 95% confidence intervals) and the grey curve shows a lowess smoother (locally weighted scatterplot smoother). We observe only a very minor correlation between higher GRE scores and shorter time to degree. Increasing either the GRE-Q or GRE-V by 20 percentage points leads to a minor decrease in expected time to degree attainment of 1 month (rate ratio = 0.99 with 95% CI 0.997 to 1.002) and (rate ratio = 1 with 95% CI 0.997 to 1.002), respectively. This means that students with GRE-Q scores of 40% and 60% are expected to take 71 months and 70 months to complete their degree, respectively. Likewise, students with GRE-V scores of 40% and 60% are expected to take 71 months and 70 months to complete their degree, respectively.

Lack of association between GRE scores and fellowships
We are well aware that counting papers, either first author or total, has limitations, especially since neither metric captures the quality and/or impact of the publications. Such parameters are difficult to measure uniformly because they are often very field-specific, and sometimes the impact of research is not fully appreciated for years to come. Therefore, we sought to include individual fellowships obtained as one metric of student quality. We included fellowships that are reviewed nationally by panels of experts, providing a comparison between students in this cohort and students at similar stages of training from other institutions around the country. Predoctoral fellowships obtained by this cohort are included in Table 3.
Boxplots of GRE scores stratified by whether or not students received a fellowship are shown in Fig 6. From bottom to top, the horizontal lines of a boxplot show the minimum, 25th percentile, median, 75th percentile, and maximum values in a given group. In Fig 7 the predicted probability of obtaining a fellowship as a function of GRE score is presented. Interestingly, increasing a student's GRE-Q score by 20 percentage points decreases their odds of receiving a fellowship by 45% (odds ratio = 0.55 with 95% CI 0.217 to 1.392), although the confidence interval includes 1.
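To make the odds ratio concrete, it can be converted into a change in predicted probability once a baseline probability is fixed. The sketch below is illustrative only: the baseline probability of 0.50 is a hypothetical value chosen for the example, not a figure from this study, and the helper `apply_odds_ratio` is a name introduced here.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical baseline: a 50% chance of a fellowship before the 20-point GRE-Q increase.
baseline = 0.50

# Reported odds ratio and 95% CI bounds for a 20-point increase in GRE-Q.
for label, ratio in [("point estimate", 0.55), ("CI lower", 0.217), ("CI upper", 1.392)]:
    prob = apply_odds_ratio(baseline, ratio)
    print(f"{label}: OR = {ratio} -> probability {prob:.3f}")
```

Under that hypothetical baseline, the interval spans probabilities both below and above 0.50, which is what an odds ratio CI containing 1 implies.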

Lack of association between GRE scores and faculty evaluation
At the completion of their doctoral training, each faculty mentor is asked to evaluate their PhD student on each of ten questions provided in Table 1. The student is not aware that they are or have been evaluated, and the evaluation is never shared with the student nor used for any other purpose. It is important to note that a lower ranking indicates a better evaluation, with 10 being the highest score possible and 50 the lowest score. Fig 8 (left panel) shows the association between GRE-Q score and faculty ranking. As in the prior figures, solid curves show the fitted values from the Poisson regression models (dashed lines are 95% confidence intervals) and the grey lines show the lowess curves. Corresponding data for GRE-V score and faculty ranking are presented in Fig 8 (right panel).
The data indicate that GRE scores across the entire range of values in this cohort are not predictive of the outcome measures we assessed. We took one final approach: testing for differences in performance measures between the lower and upper quartiles of the GRE scores. To be clear, we compared students with very low scores (<25% GRE-Q or V) to students with very high scores (>75% GRE-Q or V). Although this approach does not use all the data, it would be expected to yield an upwardly biased estimate of the association between GRE score and outcome. The results of such an analysis are shown in Table 4 (for GRE-Q) and Table 5 (for GRE-V). For both tables the first two columns show the mean and standard deviation (SD) of performance measures (number of publications, number of first author publications, months to degree, and faculty ranking) among students in the lower 25th percentile and the upper 25th percentile of GRE score. The third and fourth columns show the difference in mean performance measures between the lower and upper quartiles and the 95% confidence intervals. We see that the point estimates are modest at best, and all confidence intervals include zero as expected.
Therefore, even when comparing very low scores (a range from which many graduate schools rarely admit students) to high scores, we do not find evidence that a relationship exists even between the two most likely classes of students.
Table 4 (https://doi.org/10.1371/journal.pone.0201634.t004): The first two columns show the mean and standard deviation (SD) of performance measures among students in the lower 25th percentile and the upper 25th percentile. The third and fourth columns show the difference in mean performance measures between the lower and upper quartiles and the 95% confidence intervals.
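The quartile comparison can be illustrated with a short calculation of a difference in means and its confidence interval. The sketch below is not the method used for Tables 4 and 5, which the text does not specify: it assumes an unpooled standard error with a normal-approximation interval, the function name `mean_difference_ci` is hypothetical, and the publication counts are made-up values, not data from this study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_difference_ci(low_group, high_group, level=0.95):
    """Difference in means (high minus low) with a normal-approximation CI.

    Uses the unpooled (Welch-style) standard error. With groups this small a
    t-based interval would be wider; the normal quantile keeps the sketch stdlib-only.
    """
    diff = mean(high_group) - mean(low_group)
    se = sqrt(stdev(low_group) ** 2 / len(low_group)
              + stdev(high_group) ** 2 / len(high_group))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se

# Hypothetical publication counts, NOT data from this study.
lowest_quartile = [3, 5, 6, 4, 8, 5, 7]
highest_quartile = [4, 6, 5, 9, 3, 6, 7]

diff, lower, upper = mean_difference_ci(lowest_quartile, highest_quartile)
print(f"difference in means: {diff:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("CI includes zero:", lower <= 0.0 <= upper)
```

An interval that includes zero, as in this made-up example, means the data are compatible with no difference between the two quartiles; the width of the interval indicates how large a difference could still be consistent with the data.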

Outcomes of the cohort to date
For the 28 students in the cohort analyzed here, the final question we can ask is: where are they now? As mentioned earlier, most of the cohort moved on to a postdoctoral position upon PhD completion at a range of research-intensive institutions listed in Table 6. The students in this cohort completed their PhDs between spring 2012 and fall 2018, so some have had time to move to a position beyond the first postdoc. So far, after completing their first postdoctoral position, two individuals have moved on to Biopharma, one is developing a start-up company, one has moved to an administrative position at NIH, and one is now a tenure-track assistant professor. As of the time of this writing (November 2018), none of this cohort of 28 students has left science.

Discussion
As a result of the admissions process adopted by the Vanderbilt IMSD program over a decade ago, we now have a cohort of graduate students who have completed the PhD and whose GRE scores spanned the entire range from the 1st to the 91st percentile. This analysis includes 28 IMSD students who matriculated into our biomedical research programs from 2007 to 2011 and completed PhDs between 2012 and fall 2018. In this study, we consistently observed only minor associations between academic outcomes and GRE scores. Even when accounting for the variability in these estimates (i.e., the width of the 95% CI) we see that the data support at most very minor associations, if any. This is clearly represented by the confidence bands for the regression lines. For example, when modeling the number of first author publications as a function of quantitative GRE score, we found the rate ratio (slope) was 1.154 (95% CI 0.954 to 1.397). This implies that the average change in the number of first author publications is nearly zero even for a large shift in the GRE quantitative percentile. However, the data support changes of approximately -1 to +1 publication. While not exactly zero, these limited data clearly support the hypothesis that there is only a very minor relationship, if any, between publication and GRE scores. In fact, for verbal scores we observed a very small relationship (not statistically significantly different from zero), indicating that there is essentially no association in these data. Similar findings can be observed for the other outcome metrics presented here, including total and first author papers, fellowships obtained, time to degree, and faculty evaluations at exit. Importantly, we did observe a statistically significant relationship in the opposite direction between GRE and ranking (better ranked individuals tended to have poorer GRE scores).
So, while the overall sample size is small, there is enough precision (or power) in these data to rule out strong, meaningful associations if they existed. We have evaluated verbal and quantitative GRE scores separately in this study, but in actuality a student's application contains both scores. Perhaps a very low score in one domain (Q or V) may be offset by a high score in the other. In fact, most of the students in this cohort had two reasonably comparable Q and V scores. Of the 28 students, only four had a percentile spread between their two scores of greater than 30. In other words, they were generally either poor test takers or strong ones. Furthermore, only eight of the students who completed PhDs had both GRE-Q and GRE-V scores above the 50th percentile, making it questionable whether the other 20 would have gained admittance to a graduate program that adhered to higher expectations for GRE performance. Five of the 28 students had neither GRE-Q nor GRE-V scores above the 30th percentile. We think it unlikely that they would have been offered admission to most graduate programs at the time or even to many programs today. Yet, among this group of five is the student who garnered the best (lowest score) faculty evaluation. These outcomes underscore the benefit of giving letters of recommendation, personal statements, and interviews far more weight than GRE scores in making admissions decisions. Our GRE-tolerant approach for increasing the number of students from historically underrepresented groups completing PhDs has been highly successful.
Because all of the students whose performance outcomes are presented here were part of our IMSD program, one concern that can be raised is whether the extra support of this program in some way mitigates the impact low GRE scores would otherwise have. While our IMSD program provides academic support and mentoring to ensure student success, we assert that ensuring student success is of primary importance for all of our students. Indeed, we expect that all graduate programs are striving to provide the academic support and mentoring needed for student success, whether UR or non-UR, IMSD or non-IMSD. Viewing the IMSD program as a source of extra help actually misconstrues its purpose. The Vanderbilt IMSD program exists to build and sustain a community of historically underrepresented scholars to provide them with the social and emotional support needed to navigate our majority-white environment. These students experience stereotype threat, imposter syndrome, and implicit biases/microaggressions that impair their sense of belonging and result in high levels of stress. Sometimes this initially manifests itself in underperformance, leading to the erroneous view that historically underrepresented students need extra help. In fact, providing a sense of community and building self-efficacy is what is needed, not extra help, to empower historically underrepresented students to perform to their full potential. Due to stereotype threat and imposter syndrome, URM students may be less likely to seek academic help and, if few in number, form study groups than their non-URM peers. Similar to many programs that promote diversity, we provide intentional opportunities for URM students to gather at journal clubs, data clubs, and review sessions to ensure that students access any help they need, especially through peer mentoring. With social events added to this mix, the cohort becomes invested in the success of all members. 
Clearly, the sense of community and belonging created by the IMSD program applies to all historically underrepresented students, regardless of their GRE scores.
One reason we have restricted our analysis to IMSD students is that, across the range of GRE scores, they all experience the same environmental challenges as non-majority students and have access to the same programmatic supports. However, given that the IMSD program is NIH-funded, we have selected a subset of faculty to participate as eligible IMSD mentors based on their mentoring competency. It is possible that the success of the IMSD students across the wide range of GRE scores is due to the strong mentoring they received from their faculty dissertation mentors. Perhaps with good mentoring, GRE scores do not matter. However, arguing that GRE scores are needed to select students who will succeed even if they receive sub-optimal mentoring does not seem like an attractive argument for keeping this standardized test. It may mean, though, that if graduate programs want the most successful students, they need to focus on faculty mentoring competencies as much as, if not more than, selection criteria for admissions.
Some programs are concerned that without the GREs, less prepared students could be admitted. Understandably, programs do not want to set students up to fail. The student deficit model is a common lens: the view that students who are not doing well lack skills and/or knowledge that need to be remedied. If our experience with URM students across the range of GRE scores is any guide, then focusing principally on student competencies that need to be improved may prove inadequate without careful attention not only to faculty mentoring skills but also to the culture or environment that students find themselves in. If students are dealing continually with microaggressions and questioning every day if they belong, then anxiety will take its toll on performance. Academic help alone will not solve this problem. This applies whether the diversity brought to the training environment is of gender, gender expression, race/ethnicity, first-generation status, economic disadvantage, or disability. There is growing recognition that diversity without inclusion is not enough [5], and that the benefits of diversity in fact depend upon inclusion [6]. As at most, if not all, institutions, a truly inclusive climate, although of utmost importance, remains an aspirational goal for us. It is important to consider whether the use of GRE scores actually contributes to the implicit bias, imposter syndrome, microaggressions, and stereotype threat that impair student success. It is well established that historically underrepresented groups on average have lower GRE scores [7,8]. Thus, even students with high GRE scores, but whose visual identity matches groups who historically have lower GRE scores, are at risk of being labeled as low performing when it is erroneously assumed that their GRE scores are probably low. This type of bias is very hurtful, no matter how supportive a community of underrepresented scholars you build.
Eliminating GRE scores from the admissions process can help remove a source of bias and thus promote a more inclusive training environment.
The relationship between objective test scores and performance has been a subject of debate for many years [7-13]. In addition to the concerns already described, uncertainty surrounding their predictive ability must be weighed against the cost imposed on applicants to take the test, and the advantages available to a subset of applicants who can prepare extensively ahead of time and/or take the test multiple times to obtain the desired high scores. However, the outcomes of the cohort presented here indicate that non-quantitative measures (letters of recommendation, personal statements, interviews) are capable of selecting successful PhD candidates, even when those candidates have extremely low GRE scores. Subjective measures have their own drawbacks, and we sought to minimize these by having multiple, experienced readers of graduate student applications. We attempted to mitigate individual biases by including multiple diverse viewpoints of each student's potential in reaching a decision to offer admission. Admittedly, this process is time consuming, but the decision of whom to train as the next generation of PhD scientists is also arguably one of the most important we make.
The "GRExit" movement is growing, and for those biomedical programs that remain undecided, the data here may be helpful in arriving at a decision on whether or not to continue to require GRE scores for admission. However these decisions turn out, we assert that our GRE-tolerant approach (no score too low) undoubtedly opened doors of opportunity for PhD training at Vanderbilt that might otherwise have remained closed for historically underrepresented students with very low GRE scores. The increased diversity they bring to the community of PhD biomedical scientists will be a benefit for decades to come.
Supporting information S1