Exams disadvantage women in introductory biology

  • Cissy J. Ballen ,

    Contributed equally to this work with: Cissy J. Ballen, Shima Salehi

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Biology Teaching and Learning, University of Minnesota, Minneapolis, MN, United States of America

  • Shima Salehi ,

    Contributed equally to this work with: Cissy J. Ballen, Shima Salehi

    Roles Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation Graduate School of Education, Stanford University, Stanford, CA, United States of America

  • Sehoya Cotner

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Biology Teaching and Learning, University of Minnesota, Minneapolis, MN, United States of America


The gender gap in STEM fields has prompted a great deal of discussion, but the factors that underlie performance deficits remain poorly understood. We show that female students underperformed on exams compared to their male counterparts across ten large introductory biology course sections in fall 2016 (N > 1500 students). Females also reported higher levels of test anxiety and course-relevant science interest. Results from mediation analyses revealed an intriguing pattern: for female students only, and regardless of their academic standing, test anxiety negatively impacted exam performance, while interest in the course-specific science topics increased exam performance. Thus, instructors seeking equitable classrooms can aim to decrease test anxiety and increase student interest in science course content. We provide strategies for mitigating test anxiety and suggestions for alignment of course content with student interest, with the hope of successfully reimagining the STEM pathway as one that is equally accessible to all.


Women who enter college intending to pursue a science, technology, engineering, or mathematics (STEM) discipline leave in greater proportions than their male peers, and remain globally underrepresented in most STEM professions [1–3]. Explanations for the observed female attrition at the college level include exposure to implicit and explicit bias [4–7], discrimination [5, 8–11], feelings of exclusion in the classroom [12], imposter syndrome [13], and a lack of role models [14, 15]. In addition to lower female retention rates [16], performance disparities between women and men are observed across STEM disciplines, including undergraduate biology [17], physics [18–21], engineering [22], and math [23, 24]. The grade differential may result from female underperformance on exams, a phenomenon that can be explained in full or in part by increased risk perception or test anxiety that prevents some students from retrieving knowledge in an exam environment [25]. Notably, recent studies have verified the role of grade sensitivity in explaining gender imbalances: female students cite low grades and large gateway courses as reasons for declining interest in a discipline more often than male students in equivalent academic standing [26, 27]. If psychological barriers prevent women from performing optimally on exams, it may be time to reconsider exams as a primary method for evaluating student knowledge, particularly if exam performance is not connected to skills necessary for developing STEM professionals.

To explore what factors impact academic performance for women and men in introductory science courses, we addressed four questions: 1) What is the extent of the gender gap in incoming academic preparation among students? 2) What is the extent of the gender gap in exam grades and non-exam grades? 3) Do women and men report different levels of test anxiety and interest in science? 4) Do these two affective factors influence performance outcomes in undergraduate biology courses?

We hypothesized that we would observe men over-performing on high-stakes assessments (e.g., course exams) relative to women, but not on low-stakes summative assessments that contribute to final course grades (non-exam grades; e.g., written assignments, collaborative group work, quizzes). We also hypothesized that an inverse relationship exists between self-reported test anxiety and student performance. Finally, we hypothesized that test anxiety would have a stronger effect on exam performance compared to non-exam assessments.

To address our first and second research questions, we examined the relationship between student gender and (1) comprehensive scores on the American College Test (hereafter ACT), which evaluates high school students’ academic preparation for college coursework; and (2) combined exam scores and scores on non-exam assessments that contribute to students’ course grades. To address our third and fourth questions, we collected affective measures including interest in science course material and test anxiety (constructs generated from the Motivated Strategies for Learning Questionnaire, or MSLQ [28]). Using mediation analyses, we examined whether students’ incoming academic preparation (ACT) influences affective measures (test anxiety and interest in course material), which in turn impact students’ academic performance (Fig 1). We tested whether this mediation effect varies across gender and assessment method.

Fig 1. Contrast partial and full mediation models to test mediation effects on student performance.

The partial model tests the partial mediation effect of science interest or test anxiety on students’ performance. In this model, ACT directly and indirectly via science interest or test anxiety affects students’ performance. The full mediation model tests how incoming preparation (ACT) affects student performance indirectly via science interest or test anxiety of students.

Materials and methods

Biology class preparation and performance

Demographic data were obtained from ten biology course sections (minimum N = 90, maximum N = 239) taken by 1562 students (Table 1). We obtained ACT information for N = 1205 students (Table 2). We compared (1) combined multiple-choice exam grades and (2) combined non-exam grades (e.g., discussion sections, laboratories, online activities, written assignments, low-stakes quizzes, and active learning in-class activities). We considered the raw scores of these two components, and then transformed them into z-scores, which represent the distance between a student’s raw score in a given component and the class mean of that component in units of standard deviation (i.e., z is negative when the raw score is below the mean and positive when it is above). We calculated z-scores using the formula z = (X - μ) / σ, where X is the score of interest, μ is the class mean score, and σ is the standard deviation.
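The z-score transformation described above can be illustrated with a minimal Python sketch. The function name `z_scores` and the example scores are hypothetical and are not data from this study; the sketch also assumes the population standard deviation, which the text does not specify.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Return z = (X - mu) / sigma for each raw score within one class."""
    mu = mean(raw_scores)
    # Assumption: population SD. The text does not say whether the
    # population or the sample standard deviation was used.
    sigma = pstdev(raw_scores)
    if sigma == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in raw_scores]


# Hypothetical exam scores for one course section (illustration only).
exam_scores = [62.0, 70.0, 78.0, 86.0, 94.0]
z = z_scores(exam_scores)
```

Scores below the class mean of 78 map to negative z values and scores above it to positive ones, so grades from sections with different means and spreads can be placed on a common scale.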

Table 1. Descriptive summary statistics from ten introductory biology courses from fall 2016.

Table 2. The sample of students across ten introductory biology courses who took exams and either had a measure of prior demonstrated academic ability (ACT) that we could obtain from their records or did not have an ACT score.

Interest in course content and test anxiety.

Before the final exam, we used a validated affective survey to measure aspects of student motivation [28] in three sections of an introductory biology course. Of the 372 students enrolled in these three sections of BIOL 1003, 286 (77%) completed the post-course survey. These data represent 20% of the total students for whom we obtained performance information. Students reported responses using the following scale: 1 = Not at all true of me to 7 = Very true of me. We performed an exploratory factor analysis that resulted in two constructs designed to measure student anxiety during high stakes assessments and interest or perceived usefulness of course content.

For each of these constructs, we had adequate sampling to produce reliable results according to the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (KMO > 0.8). We used Bartlett’s test of sphericity to test for the presence of relationships among variables, which were significant for both factors (P < 0.001). Each was highly reliable according to a test for internal consistency (Cronbach’s alpha > 0.7; Table 3). For each construct, we generated a response variable for each student by combining their answers to the loaded questions in that construct using an additive scale. For all Likert scale analyses we treated the dependent variables as continuous [29].
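As a rough illustration of the additive scale and the internal-consistency check described above, the following Python sketch computes a summed construct score per student and Cronbach's alpha from a students-by-items table. The function names and the response values are hypothetical, and the sketch uses the standard textbook formula for alpha rather than the authors' actual analysis code.

```python
from statistics import variance


def additive_scale(responses):
    """Sum each student's item responses into a single construct score."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a students-by-items table of Likert responses."""
    k = len(responses[0])                 # number of items in the construct
    items = list(zip(*responses))         # one tuple of responses per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance(additive_scale(responses))
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses on the 1-7 scale (rows = students, columns = items).
test_anxiety_items = [
    [6, 7, 6, 5],
    [2, 3, 2, 2],
    [5, 5, 6, 6],
    [3, 2, 3, 4],
    [7, 6, 7, 7],
]
scores = additive_scale(test_anxiety_items)
alpha = cronbach_alpha(test_anxiety_items)
```

With these made-up responses the items move together closely, so alpha lands well above the 0.7 threshold used in the text.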

Table 3. Data collected from a subset of students in the fall 2016 administration of affective surveys to three different undergraduate biology sections of BIOL 1003 at the University of Minnesota (N = 286).

Statistical analyses

What is the extent of the gender gap in academic preparation and performance?

We conducted a mixed-effect regression to examine the partial correlation between exam and non-exam grades, while controlling for the effect of gender. We then used a mixed-effects regression to predict the effects of gender on ACT, and to analyze predictors of students’ exam grade and non-exam grade (such as laboratory grade, homework assignments, low stakes quizzes, etc.). The data in this study are hierarchically nested, so we use multilevel modeling to account for this non-independence of data in nested-data structures [30, 31] such as lecture sections within course number (e.g., lecture section 10, 20, and 30 within BIOL 1003). We ran the analysis with and without students’ incoming composite ACT scores as a fixed effect, as a proxy for academic preparation. By reporting actual performance rather than model-based estimates that control for pre-scores, we show the actual achievement gaps and that male and female students are earning different grades.

Our research questions in this study mainly focus on the effect of students’ gender and incoming preparation on their performance. To address our research questions, we started with the basic regression model that predicted students’ performance by student gender identity (a factor with two levels; SGender) and their incoming preparation approximated by ACT score. To this basic model, we added the following fixed variables that may contribute to student performance: (1) race/ethnicity/nationality (analyzed as a two-level factor, based on whether a student is from an underrepresented minority [in STEM] group; URM.status); (2) an interaction between student gender identity and URM.status (SGender*URM.status); (3) class size (ClassSize); and (4) student academic level (i.e., year in school). To determine the most appropriate model, we then used Akaike's information criterion (AIC) as a multi-model inference technique [32]. Only students with a complete set of all variables were included in analyses. We ultimately chose the most parsimonious model that best fit the data in accordance with AIC model-selection statistics; this model includes composite ACT score and SGender (model 1 in Table 4).

Table 4. Best models for predicting composite exam grade using AIC model selection.

For non-exam grade, the model that best fit the data also included ACT and SGender, with the next best model including URM.status and ΔAIC = 1.722.
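The AIC comparison described above can be sketched in a few lines of Python. The log-likelihoods and parameter counts below are invented for illustration and do not reproduce Table 4; the function names `aic` and `delta_aic` are likewise hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(models):
    """Map each model name to its AIC difference from the best model.

    `models` maps a model name to (log_likelihood, n_params).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical candidate models (values are NOT from the study).
candidates = {
    "ACT": (-1504.0, 4),
    "ACT + SGender": (-1500.0, 5),
    "ACT + SGender + URM.status": (-1499.9, 6),
}
deltas = delta_aic(candidates)
best_model = min(deltas, key=deltas.get)
```

In this toy comparison the extra URM.status parameter improves the likelihood too little to offset its penalty, so the smaller model has the lowest AIC, which mirrors the parsimony reasoning in the text.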

Do women and men report different levels of test anxiety and interest in science?

Using the subset of students who completed the MSLQ survey (N = 286), we performed statistical analyses on affective measures of interest in course science content (‘science interest’) and test anxiety using linear mixed-effects models with gender and ACT score as fixed effects and lecture section (BIOL 1003 section 1, 2, and 3) as a random effect. In these analyses, we normalized the affective measures so that the regression coefficients are easier to interpret as effect sizes.

Do affective factors influence performance outcomes?

The mediation analyses were conducted using the lavaan R package [33]. In mediation analyses, students’ ACT score affected academic performance through three different paths: one direct path and two indirect paths mediated by science interest and test anxiety (Fig 1). We examined which of these three paths were significant (S1 Appendix). The mediation analysis was conducted separately for exam performance and non-exam mixed assessment performance. To test whether the mediation effects of science interest and test anxiety were different across genders, we used the group analysis option in lavaan, which allows the coefficients of mediation analysis to be different across gender. For the mediation analysis of both exam and non-exam performance, we compared the fit of partial and full mediation models (Fig 1). In the full mediation model, the effect of ACT score on performance is fully mediated by science interest and test anxiety, meaning that ACT score affects performance only indirectly by changing students’ science interest and/or test anxiety. In the partial mediation model, the effect of ACT score on performance is only partially mediated by science interest and/or test anxiety, implying that ACT score affects performance directly, as well as indirectly by influencing students’ science interest and test anxiety. We found that for both exam performance and non-exam performance, the full mediation model did not fit the data well: the estimated co-variances of this model were significantly different from the actual co-variances in the data (Exam: χ2 (4, N = 221) = 72.253, P < 0.0001; Non-exam: χ2 (4, N = 221) = 16.918, P = 0.002). In addition, none of the other fit indices of the full mediation model fell within the acceptable range [root mean square error of approximation (RMSEA): Exam = 0.402, Non-exam = 0.171 (acceptable range: less than 0.08); comparative fit index (CFI): Exam = 0.204, Non-exam = 0.474 (acceptable range: above 0.95); standardized root mean square residual (SRMR): Exam = 0.158, Non-exam = 0.082 (acceptable range: less than 0.08)] [34]. However, the partial mediation model fit the data well for both exam and non-exam performance. The estimated co-variances of the partial mediation model were not significantly different from the actual co-variances in the data [for both exam and non-exam: χ2 (2) = 1.681, P = 0.431]. The other fit indices of the model were also within the acceptable range [for both exam and non-exam: RMSEA = 0.000 (acceptable range: less than 0.08); CFI = 1.000 (acceptable range: above 0.95); SRMR = 0.027 (acceptable range: less than 0.08)] (S1 Appendix). This partial model tests the direct effect of students’ ACT on their performance as well as its indirect effect mediated by the affective factors of science interest and/or test anxiety (Fig 1).
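To make the fit-index comparison above easier to follow, the sketch below checks the reported RMSEA, CFI, and SRMR values against the acceptable ranges stated in the text. The function name `acceptable_fit` is hypothetical; the thresholds and index values are taken from the paragraph above, and the code only restates that comparison rather than re-estimating any model.

```python
def acceptable_fit(rmsea, cfi, srmr):
    """Compare fit indices with the acceptable ranges given in the text."""
    return {
        "RMSEA": rmsea < 0.08,  # acceptable range: less than 0.08
        "CFI": cfi > 0.95,      # acceptable range: above 0.95
        "SRMR": srmr < 0.08,    # acceptable range: less than 0.08
    }


# Fit indices as reported in the text for each model.
full_exam = acceptable_fit(rmsea=0.402, cfi=0.204, srmr=0.158)
full_non_exam = acceptable_fit(rmsea=0.171, cfi=0.474, srmr=0.082)
partial_both = acceptable_fit(rmsea=0.000, cfi=1.000, srmr=0.027)
```

Every index fails for the full mediation model, for exam and non-exam performance alike, while every index passes for the partial mediation model.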


What is the extent of the gender gap in incoming academic preparation among introductory biology students?

We compared incoming ACT scores of female and male students using a mixed-effect regression model. This analysis revealed a significant difference between genders: ACT scores for women were, on average, 0.28 standard deviation lower than those for men (B = -0.283, t (df = 1284) = 5.178, P < 0.0001, SE = 0.055).

What is the extent of the gender gap in exam grades and non-exam grades?

Across course sections, exam and non-exam grades of students were significantly and positively correlated (B = 0.387, t(1444) = 12.384, P < 0.0001, SE = 0.031), and this correlation was not significantly different across gender (B = 0.068, t(1444) = 1.420, P = 0.156, SE = 0.048). We also found that women underperformed on biology exams compared to men (B = -0.146, t(1446) = -2.773, P = 0.006, SE = 0.053), but received higher non-exam grades than men (B = 0.296, t(1446) = 5.673, P < 0.0001, SE = 0.052). These results indicate that women’s exam scores were on average 0.15 standard deviation lower than men’s, and their non-exam scores were on average 0.3 standard deviation higher than men’s. When we included incoming ACT score in the model as a fixed effect, the gender gap in exam performance disappeared (B = -0.042, t(1200) = -0.867, P = 0.386, SE = 0.049), but women still received significantly higher non-exam grades than men (B = 0.297, t(1125) = 5.251, P < 0.0001, SE = 0.056). This means that after controlling for differences in students’ academic preparation, there was no difference between women’s and men’s exam performance; however, women still achieved 0.3 standard deviation higher grades on non-exam assessments. These results suggest that the performance gap on exams in introductory biology can be explained by ACT performance. However, ACT performance does not explain the gender gap in non-exam grades, on which women outperformed men.

Do women and men report different levels of test anxiety and interest in science?

Across course sections, interest in course-specific science content and test anxiety were not significantly correlated (B = 0.124, t(272.9) = 1.580, P = 0.115, SE = 0.079), and this was not significantly different across gender (B = -0.088, t(273.83) = -0.73, P = 0.466, SE = 0.121). Furthermore, across course sections and after controlling for students’ ACT score, women reported on average 0.38 standard deviation higher interest in course-specific science content (B = 0.375, t(222) = 2.774, P = 0.006, SE = 0.135) and 0.43 standard deviation higher level of test anxiety (B = 0.425, t(220) = 3.092, P = 0.002, SE = 0.137) compared to men.

Do affective factors influence performance outcomes?

We showed that the observed difference in exam scores between women and men is due to women’s lower incoming academic preparation. To explore the possibility that other variables mediate the effect of incoming preparation on exam performance, we used mediation analyses. Mediating variables transmit effects of an independent variable on a dependent variable, illustrating their structural relationships [35, 36]. We were interested in the mediating effect of affective measures such as science interest and test anxiety as they transmit the effect of incoming preparation on exam performance for women and men (Fig 1).
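The idea of a mediating variable can be illustrated with a simplified single-mediator sketch in Python. It is not the authors' lavaan model, which has two mediators and is estimated separately by gender; it only shows how a total effect splits into a direct path and an indirect path (the product of the two path coefficients) when all variables are standardized. The function names and the data are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def partial_mediation(x, m, y):
    """Standardized paths for x -> m -> y plus a direct x -> y path."""
    r_xm, r_xy, r_my = pearson(x, m), pearson(x, y), pearson(m, y)
    a = r_xm                                         # x -> m
    b = (r_my - r_xm * r_xy) / (1 - r_xm ** 2)       # m -> y, controlling x
    direct = (r_xy - r_xm * r_my) / (1 - r_xm ** 2)  # x -> y, controlling m
    indirect = a * b
    return {"a": a, "b": b, "direct": direct,
            "indirect": indirect, "total": direct + indirect}


# Hypothetical values (NOT study data): ACT score, test anxiety, exam grade.
act = [20, 22, 24, 26, 28, 30, 32, 34]
anxiety = [5, 6, 3, 5, 4, 2, 4, 3]
exam = [60, 58, 72, 66, 75, 85, 80, 88]
paths = partial_mediation(act, anxiety, exam)
```

In this standardized setting the direct and indirect effects sum to the simple correlation between the predictor and the outcome, which is what makes the decomposition interpretable.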

Exam grades.

A partial mediation model revealed a correlation between ACT score and academic performance for all students, confirming previous research that demonstrates the same trend [37]. The direct effect of ACT score was stronger on students’ exam grades than non-exam grades. This observation is reasonable because exam performance (and the associated gender gap) mirrors students’ performance on the ACT, which is itself a high-stakes assessment similar to exams. For women, one standard deviation increase in ACT score increased exam grade by 0.55 standard deviation (P < 0.0001), and 0.41 standard deviation for men (P < 0.0001).

We found non-significant indirect effects of ACT on exam grades for both female and male students, though for different reasons (Table 5). For women, ACT score did not correlate with interest in science or test anxiety (science interest P = 0.59; test anxiety P = 0.15). However, science interest and test anxiety both significantly correlated with exam grades: one standard deviation increase in science interest increased women’s exam grade by 0.16 standard deviation (P = 0.02), and one standard deviation increase in test anxiety decreased women’s exam grade by 0.22 standard deviation (P = 0.001; Fig 2). For men, ACT score was correlated with test anxiety (P = 0.011), with one standard deviation increase in ACT decreasing men’s test anxiety by 0.3 standard deviation. However, this decrease in test anxiety did not affect exam performance (P = 0.82; Fig 2).

Fig 2.

Partial mediation analyses show differences in the significant effects of incoming preparation (ACT) on exam grade and non-exam grade for (A) female and (B) male students. Red arrows depict negative effects and blue arrows show positive effects. ACT has direct, positive effects on exam (left) and non-exam (right) grades for all students. For female students, ACT does not influence affective measures such as science interest and test anxiety, but these affective measures influence exam and non-exam grades. For male students, ACT negatively affects test anxiety, but test anxiety does not in turn influence exam and non-exam grades. * p < 0.05, ** p < 0.01, *** p < 0.0001.

Table 5. Summary of partial mediation analysis for exam performance.

The numbers in parentheses represent standard errors. Interest, test anxiety, ACT, and exam scores are normalized for ease of interpretation.

Non-exam grades.

The partial mediation model shows that the direct effect of ACT scores on students’ non-exam grades was significant for both women and men. One standard deviation increase in ACT score directly increased non-exam grade by 0.16 standard deviation for women (P = 0.004) and by 0.31 standard deviation for men (P = 0.005).

Similar to exam grades, the indirect effects of ACT score on non-exam grades were not significant for female or male students (Table 6). For women, test anxiety significantly correlated with non-exam grades; one standard deviation increase in test anxiety decreased the non-exam grade by 0.13 standard deviation (P < 0.0001). Science interest was not a significant predictor of non-exam grade (P = 0.63). For men, the non-exam grade was not correlated with either test anxiety (P = 0.96) or science interest (P = 0.43), and thus not correlated with ACT score through either affective measure (Fig 2).

Table 6. Summary of partial mediation analysis for non-exam performance.

The numbers in parentheses represent standard errors. Interest, test anxiety, ACT, and non-exam scores are normalized.


Using student data from ten introductory biology course sections in fall 2016, we demonstrate that women underperformed on the ACT and on exams compared to their male counterparts, but outperformed men on combined non-exam methods of assessment. Mediation analyses revealed two further findings. First, for men, ACT score was not correlated with science interest, and science interest did not influence exam grade. For women, ACT score was likewise not correlated with science interest, but science interest significantly influenced exam performance. Second, for men, though ACT score was correlated with test anxiety, test anxiety did not influence exam grade. For women, ACT scores did not correlate with test anxiety, but test anxiety significantly influenced exam grade (Fig 2).

Our results suggest that instructor efforts to design curricula that promote students’ interest can positively impact exam performance, particularly for women. Furthermore, these efforts will benefit female students regardless of their incoming preparation. Previous research shows that encouraging students to connect course material to their lives increases interest and performance in science courses early in high school [38]. Gender differences in attitudes towards and interest in science [39] mean that making course content personally relevant for both women and men might be a challenging task. However, these efforts are particularly important in male-dominated academic areas (e.g., math, physics, or engineering) where women are underrepresented and more likely to consider changing their major [40].

Our results also show that one measure of academic preparation, ACT score, predicts test anxiety for men in college. However, for women, this prior demonstrated competency does not predict test anxiety. In addition, for women only, increasing test anxiety has a significant and sizeable negative impact on exam performance: one standard deviation increase in test anxiety decreases the exam grade by 0.28 standard deviation. This effect is almost half the size of that for incoming preparation: one standard deviation increase in ACT score increases the exam grade by 0.55 standard deviation. Our findings underscore the likelihood that performance during high-pressure testing may not reflect actual content knowledge for some underrepresented groups [41, 42]. For women, test anxiety may stem from social psychological barriers such as stereotype threat [40, 43, 44], whereby in high-stakes testing situations (i.e., high-value course exams) females experience a self-evaluative apprehension of conforming to the perceived stereotype of female inferiority in STEM subjects.

If test anxiety, coupled with stereotype threat, is culpable in the underperformance of women on high-stakes exams, efforts to minimize threat during exams should reduce the gender differences we, and others, have documented in STEM disciplines. This hypothesis and associated predictions are testable and, if our predictions are correct, the actionable items are simple to implement. Instructors could minimize the impact of high-stakes tests by offering a diversity of assessment types in their courses. For example, active learning is defined in part by its use of formative and summative assessment methods, and evidence for performance gains in active-learning environments is compelling and broad [45–48]. Techniques vary, but active learning can include group work, case studies, modeling exercises, and a diversity of in-class assessment techniques (e.g., classroom response systems, Immediate Feedback Assessment Technique forms, worksheets, and one-minute papers). Incentivizing students to participate through mixed methods of assessment rewards consistent, ongoing preparation rather than performance on a few high-stakes examinations. We hypothesize that mixed assessment methods in active learning classrooms serve as relatively nonthreatening opportunities for females and others to demonstrate knowledge under minimized susceptibility to test anxiety, thus increasing females’ overall performance. In this study, the negative effect of test anxiety on females’ performance was twice as high for exams as it was for non-exam grades (Table 6). Thus, incorporation of mixed assessment methods may be particularly beneficial in male-stereotyped STEM fields where women are a minority in the classroom and suffer the largest susceptibility to stereotype threat in test environments [43, 49].

One limitation of this study that may influence the interpretation of our findings is the possibility that our survey instruments functioned differently for the different groups of students we sampled. While we examined exclusively non-majors’ introductory biology lecture courses, we still observed course-specific differences in classroom demographic composition and student preparation (Table 1). As future research broadens in scope to examine student populations across STEM fields, it will become increasingly important to compare different groups of students’ responses to survey instruments. We may also expect that the performance impacts of test anxiety and course interest will change based on discipline.

Although our emphasis is on differential performance as a function of gender, we anticipate similar phenomena may characterize the experiences of underrepresented minority students, first-generation college students, and any student more susceptible to test anxiety in high-stakes exam environments. The traditional learning environment is not designed for a diverse student body and does not recognize student variation on many dimensions of learning. Evaluating students based primarily on high-stakes exams does not nurture individual potential, and its use to assess our increasingly diverse talent pool will perpetuate existing disparities. Although many uncertainties remain, recent work is beginning to fill in some of the major gaps in our understanding of the effects of tests on underrepresented groups in STEM (e.g., see [50]). We now have plausible hypotheses about the forces responsible, not only with respect to underlying mechanisms [51], but also ways to develop curricula that promote performance of at-risk students [18, 45–48]. The challenging task of robustly testing our hypotheses is still in its infancy, but recent progress is encouraging. Techniques to experimentally manipulate critical parameters (such as writing exercises or teaching with multiple low-stakes assessments) are feasible and should provide increasingly powerful methods to clarify the consequences of different types of assessments for all STEM students. Ultimately, fundamental changes in how we assess mastery in STEM courses may be critical for making the STEM disciplines accessible to all.

Supporting information


We thank Daniel Baltz for help with data organization and interpretation; Deena Wassenberg for help with data collection from students; J.D. Walker and Lauren Sullivan for statistical support; and Carl Wieman, Dan Schwartz, Seth Thompson, and Robin Wright for helpful advice and comments on the manuscript. We obtained human subjects approval from the University of Minnesota Institutional Review Board (protocol number 1405E50826). Subjects were informed that a research study was taking place, and researchers complied with all relevant institutional guidelines. Students provided consent in an electronic survey, and data were accessed by the authors anonymously.


  1. 1. Beede DN, Julian TA, Langdon D, McKittrick G, Khan B, Doms ME. Women in STEM: A gender gap to innovation. Economics and Statistics Administration Issue Brief. 2011(04–11).
  2. 2. UNESCO. Gender and EFA 2000–2015: achievements and challenges. In: UNESCO, editor. Paris, France: United Nations Educational, Scientific and Cultural Organization; 2015.
  3. 3. Hale K, Burke A. Women, Minorities, and Persons with Disabilities in Science and Engineering: 2017 Arlington, VA: National Science Foundation; 2017. Available from:
  4. 4. Greenwald AG, Krieger LH. Implicit bias: Scientific foundations. California Law Review. 2006;94(4):945–67.
  5. 5. Moss-Racusin CA, Dovidio JF, Brescoll VL, Graham MJ, Handelsman J. Science faculty’s subtle gender biases favor male students. Proc Natl Acad Sci U S A. 2012;109(41):16474–9. pmid:22988126
  6. 6. West TV, Heilman ME, Gullett L, Moss-Racusin CA, Magee JC. Building blocks of bias: Gender composition predicts male and female group members’ evaluations of each other and the group. J Exp Soc Psychol. 2012;48(5):1209–12.
  7. 7. Hall RM, Sandler BR. The classroom climate: a chilly one for women? 1982. Washington, DC: Association of American Colleges and Universities.
  8. 8. Eagly AH, Mladinic A. Are people prejudiced against women? Some answers from research on attitudes, gender stereotypes, and judgments of competence. Eur Rev Soc Psychol. 1994;5(1):1–35.
  9. 9. Sheltzer JM, Smith JC. Elite male faculty in the life sciences employ fewer women. Proc Natl Acad Sci U S A. 2014;111(28):10107–12. pmid:24982167
  10. 10. Reuben E, Sapienza P, Zingales L. How stereotypes impair women’s careers in science. Proc Natl Acad Sci U S A. 2014;111(12):4403–8. pmid:24616490
  11. 11. Sugimoto CR, Lariviere V, Ni C, Gingras Y, Cronin B. Global gender disparities in science. Nature. 2013;504(7479):211–3. pmid:24350369
  12. 12. Lester J. Women in male-dominated career and technical education programs at community colleges: Barriers to participation and success. J Women Minor Sci Eng. 2010;16(1).
  13. 13. Clance PR. The impostor phenomenon: Overcoming the fear that haunts your success: Peachtree Pub Ltd; 1985.
  14. 14. Cotner S, Ballen C, Brooks DC, Moore R. Instructor gender and student confidence in the sciences: a need for more role models. J Coll Sci Teach. 2011;40(5):96–101.
  15. 15. Young DM, Rudman LA, Buettner HM, McLean MC. The influence of female role models on women’s implicit science cognitions. Psychol Women Q. 2013;37(3):283–92.
  16. 16. Gillen A, Tanenbaum C. Exploring gender imbalance among STEM doctoral degree recipients. Washington, DC: American Institutes for Research. 2014.
  17. Eddy SL, Brownell SE, Wenderoth MP. Gender gaps in achievement and participation in multiple introductory biology classrooms. CBE Life Sci Educ. 2014;13(3):478–92. pmid:25185231
  18. Lorenzo M, Crouch CH, Mazur E. Reducing the gender gap in the physics classroom. Am J Phys. 2006;74(2):118–22.
  19. Miyake A, Kost-Smith LE, Finkelstein ND, Pollock SJ, Cohen GL, Ito TA. Reducing the gender achievement gap in college science: A classroom study of values affirmation. Science. 2010;330(6008):1234–7. pmid:21109670
  20. Brewe E, Sawtelle V, Kramer LH, O’Brien GE, Rodriguez I, Pamelá P. Toward equity through participation in Modeling Instruction in introductory university physics. Phys Rev Phys Educ Res. 2010;6(1):010106.
  21. Pollock SJ, Finkelstein ND, Kost LE. Reducing the gender gap in the physics classroom: How sufficient is interactive engagement? Phys Rev Phys Educ Res. 2007;3(1):010107.
  22. Bell AE, Spencer SJ, Iserman E, Logel CE. Stereotype threat and women's performance in engineering. J Eng Educ. 2003;92(4):307–12.
  23. Cheema J, Galluzzo G. Analyzing the gender gap in math achievement: Evidence from a large-scale US sample. Res Ed. 2013;90(1):98–112.
  24. Hyde JS, Mertz JE. Gender, culture, and mathematics performance. Proc Natl Acad Sci U S A. 2009;106(22):8801–7. pmid:19487665
  25. Nguyen H-HD, Ryan AM. Does stereotype threat affect test performance of minorities and women? A meta-analysis of experimental evidence. J Appl Psychol. 2008;93(6):1314–34.
  26. Rask K, Tiefenthaler J. The role of grade sensitivity in explaining the gender imbalance in undergraduate economics. Econ Educ Rev. 2008;27(6):676–87.
  27. Barr DA, Gonzalez ME, Wanat SF. The leaky pipeline: Factors associated with early decline in interest in premedical studies among underrepresented minority undergraduate students. Acad Med. 2008;83(5):503–11. pmid:18448909
  28. Pintrich PR, Smith DAF, Garcia T, McKeachie WJ. A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). Technical report no. 91-B-004. The Regents of the University of Michigan. 1991.
  29. Norman G. Likert scales, levels of measurement and the “laws” of statistics. Adv Health Sci Educ Theory Pract. 2010;15(5):625–32. pmid:20146096
  30. Paterson L, Goldstein H. New statistical methods for analysing social structures: an introduction to multilevel models. Br Educ Res J. 1991;17(4):387–93.
  31. Kreft IG, de Leeuw J. Introducing multilevel modeling. Newbury Park, CA: Sage Publications; 1998.
  32. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19(6):716–23.
  33. Rosseel Y. Lavaan: An R package for structural equation modeling and more. Version 0.5–12 (BETA). Ghent, Belgium: Ghent University; 2012.
  34. Schreiber JB, Nora A, Stage FK, Barlow EA, King J. Reporting structural equation modeling and confirmatory factor analysis results: A review. J Educ Res. 2006;99(6):323–38.
  35. Hayes AF. Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Commun Monogr. 2009;76(4):408–20.
  36. Kline RB. Principles and practice of structural equation modeling. Guilford Publications; 2015.
  37. Westrick PA, Le H, Robbins SB, Radunzel JM, Schmidt FL. College performance and retention: a meta-analysis of the predictive validities of ACT® scores, high school grades, and SES. Educ Assess. 2015;20(1):23–45.
  38. Hulleman CS, Harackiewicz JM. Promoting interest and performance in high school science classes. Science. 2009;326(5958):1410–2. pmid:19965759
  39. Jones MG, Howe A, Rua MJ. Gender differences in students' experiences, interests, and attitudes toward science and scientists. Sci Educ. 2000;84(2):180–92.
  40. Steele J, James JB, Barnett RC. Learning in a man's world: Examining the perceptions of undergraduate women in male-dominated academic areas. Psychol Women Q. 2002;26(1):46–50.
  41. Croizet J-C, Després G, Gauzins M-E, Huguet P, Leyens J-P, Méot A. Stereotype threat undermines intellectual performance by triggering a disruptive mental load. Pers Soc Psychol Bull. 2004;30(6):721–31. pmid:15155036
  42. Martens A, Johns M, Greenberg J, Schimel J. Combating stereotype threat: The effect of self-affirmation on women’s intellectual performance. J Exp Soc Psychol. 2006;42(2):236–43.
  43. Schmader T. Gender identification moderates stereotype threat effects on women's math performance. J Exp Soc Psychol. 2002;38(2):194–201.
  44. Steele CM. A threat in the air: How stereotypes shape intellectual identity and performance. Am Psychol. 1997;52(6):613–29.
  45. Freeman S, Eddy SL, McDonough M, Smith MK, Okoroafor N, Jordt H, et al. Active learning increases student performance in science, engineering, and mathematics. Proc Natl Acad Sci U S A. 2014;111(23):8410–5. pmid:24821756
  46. Beichner RJ, Saul JM, Abbott DS, Morse JJ, Deardorff D, Allain RJ, et al. The student-centered activities for large enrollment undergraduate programs (SCALE-UP) project. Research-based reform of university physics. 2007;1(1):2–39.
  47. Eddy SL, Hogan KA. Getting under the hood: how and for whom does increasing course structure work? CBE Life Sci Educ. 2014;13(3):453–68. pmid:25185229
  48. Haak DC, HilleRisLambers J, Pitre E, Freeman S. Increased structure and active learning reduce the achievement gap in introductory biology. Science. 2011;332(6034):1213–6. pmid:21636776
  49. Spencer SJ, Steele CM, Quinn DM. Stereotype threat and women's math performance. J Exp Soc Psychol. 1999;35(1):4–28.
  50. Shapiro JR, Williams AM, Hambarchyan M. Are all interventions created equal? A multi-threat approach to tailoring stereotype threat interventions. J Pers Soc Psychol. 2013;104(2):277. pmid:23088232
  51. Owens M, Stevenson J, Hadwin JA, Norgate R. When does anxiety help or hinder cognitive test performance? The role of working memory capacity. Br J Psychol. 2014;105(1):92–101. pmid:24387098