Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit

The substantial gender gap in the science, technology, engineering, and mathematics (STEM) workforce can be traced back to the underrepresentation of women at various milestones in the career pathway. Calculus is a necessary step in this pathway and has been shown to often dissuade people from pursuing STEM fields. We examine the characteristics of students who begin college interested in STEM and either persist or switch out of the calculus sequence after taking Calculus I, and hence either continue to pursue a STEM major or are dissuaded from STEM disciplines. The data come from a unique, national survey focused on mainstream college calculus. Our analyses show that, while controlling for academic preparedness, career intentions, and instruction, the odds of a woman being dissuaded from continuing in calculus are 1.5 times greater than those for a man. Furthermore, women report significantly more often than men that they do not understand the course material well enough to continue. When comparing women and men with above-average mathematical abilities and preparedness, we find women start and end the term with significantly lower mathematical confidence than men. This suggests a lack of mathematical confidence, rather than a lack of mathematical ability, may be responsible for the high departure rate of women. While it would be ideal to increase interest and participation of women in STEM at all stages of their careers, our findings indicate that if women persisted in STEM at the same rate as men starting in Calculus I, the number of women entering the STEM workforce would increase by 75%.


Fig 1 derivation
The data used in Fig 1 come from a collection of national reports on STEM participation at various milestones, as shown in Table 1 in the manuscript. We began with the number of 4th grade boys and girls in 1999 [1], which was 2,182,000 and 2,025,000 respectively. The National Science Foundation reports that in 2005, 68% of 4th grade boys and 66% of 4th grade girls were interested in science, leaving 1,483,760 boys and 1,336,500 girls [2]. This was the closest projection found for such numbers.
Individuals who completed 4th grade in 1999 would have been entering their senior year in high school in 2007. The census reports that there were 2,149,000 men and 1,998,000 women enrolled in 12th grade in 2007 [3], and the National Center for Education Statistics (NCES) reports 70% of 12th grade men (1,504,300) and 59% of 12th grade women (1,178,820) were interested in science in 2008 (again, the closest projection found) [4].
The census reports that in 2008, there were 2,132,000 men and 2,457,000 women enrolled as first-time freshmen in college [5]. The NCES reports that in 2009, 58% of male college students and 56% of female college students were enrolled in a 4-year college [6]. Thus, 58% of the 2,132,000 freshmen men (1,236,560) and 56% of the 2,457,000 freshmen women (1,375,920) were at a 4-year college. The American Freshman report indicates that in 2008, 32.3% of freshmen men and 17.2% of freshmen women enrolled in a STEM major [7]. This number includes all biological science, engineering, physical science, and technical majors. This leads to 399,409 men and 236,658 women planning to study a STEM field in 2012.
The NCES reports that 196,763 men and 106,005 women earned Bachelor's degrees in a STEM field in 2012 [8]. This number includes biology, computer science, engineering, mathematics, statistics, and physics degrees. The census reports that only 26% of STEM graduates go on to jobs in the STEM workforce [9], and that only 25% of these jobs are held by women [10]. Applying 26% to the 302,768 total graduates gives a projection of 78,720 people entering the STEM workforce, of whom 25% (19,680) are women and the remaining 59,040 are men.
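The arithmetic behind the Fig 1 projections can be retraced with a short script. The following Python sketch is illustrative only: the function and variable names are ours, and the inputs are the counts and percentages quoted in this section.

```python
# Retrace the Fig 1 pipeline projections from the counts and rates quoted
# in the text. All names are illustrative, not taken from the authors' code.

def project(count, rate):
    """Apply a reported rate to a cohort count and round to whole people."""
    return round(count * rate)

# 4th grade: 1999 enrollment with the 2005 science-interest rates
boys_4th = project(2_182_000, 0.68)
girls_4th = project(2_025_000, 0.66)

# 12th grade: 2007 enrollment with the 2008 science-interest rates
men_12th = project(2_149_000, 0.70)
women_12th = project(1_998_000, 0.59)

# First-time freshmen (2008) at 4-year colleges, then intended STEM majors
men_4yr = project(2_132_000, 0.58)
women_4yr = project(2_457_000, 0.56)
men_stem_intent = project(men_4yr, 0.323)
women_stem_intent = project(women_4yr, 0.172)

# STEM bachelor's degrees (2012), then entry into the STEM workforce
stem_graduates = 196_763 + 106_005
stem_workforce = project(stem_graduates, 0.26)
women_workforce = project(stem_workforce, 0.25)
men_workforce = stem_workforce - women_workforce

print(men_stem_intent, women_stem_intent, men_workforce, women_workforce)
```

Each stage reuses the same multiply-and-round step, so a changed input rate propagates through the later stages automatically.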

Data preparation
Switcher coding. Students were coded as a Switcher, Persister, or neither based on their responses to four questions. On the beginning-of-term survey, students were asked if they intended to take Calculus II, with options "yes", "no", or "I don't know." On the end-of-term survey, students were asked if, at the beginning of the term, they had intended to take Calculus II, with options "yes", "no", or "I don't remember" (referred to as "End of term; reflect"). They were also asked if they currently intended to take Calculus II, with options "yes", "no", or "I don't know." On the follow-up survey one year later, students were asked if they had already taken or were currently enrolled in Calculus II, with options "yes" or "no." Because not all students answered all surveys, there were multiple ways to identify students as Switchers, Persisters, or neither. Switchers were identified as any student who provided evidence of decreasing their intention to take Calculus II. There were 11 unique ways that students could be identified as a Switcher, as shown in S1 Table. Student responses to each of the four questions are filled in as "Y" for "yes", "N" for "no", "M" for "I don't know" or "I don't remember", and "NA" for not answered; a blank cell stands for any response. For example, students in Switcher group 1 answered "yes" to the beginning-of-term question, gave any response to the end-of-term questions, and answered "no" a year later. This group therefore entered Calculus I intending to take Calculus II and a year later had not taken it. Students in Switcher group 8 were initially unsure whether they would take Calculus II, marking "I don't know". However, this uncertainty suggests they were at least interested in or open to taking more calculus. By the end of the term they said that they did not intend to take Calculus II, and thus they decreased their STEM interest and/or intention.
Students whose responses are not captured in S1 Table were determined to not be initially interested or intending to take Calculus II.
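The table-driven coding described above can be sketched as a wildcard pattern match. The Python below is a minimal illustration under stated assumptions: the names are ours, only Switcher group 1 is transcribed (from its description in the text), and the other ten rows of S1 Table are deliberately not reproduced here.

```python
# Classify a student from four survey responses using wildcard patterns.
# Response codes follow the text: "Y", "N", "M" (don't know / don't remember),
# and "NA" (not answered). None in a pattern means "any response".
# Field order: (start of term, end of term reflect, end of term current, follow-up).

SWITCHER_PATTERNS = {
    # Group 1 as described in the text; the remaining groups in S1 Table
    # would be added here in the same form.
    1: ("Y", None, None, "N"),
}

def matches(pattern, responses):
    """True if every non-wildcard field of the pattern equals the response."""
    return all(p is None or p == r for p, r in zip(pattern, responses))

def switcher_group(responses):
    """Return the first matching Switcher group number, or None if no row matches."""
    for group, pattern in SWITCHER_PATTERNS.items():
        if matches(pattern, responses):
            return group
    return None

print(switcher_group(("Y", "NA", "NA", "N")))  # matches group 1
print(switcher_group(("Y", "Y", "Y", "Y")))    # no Switcher row matches
```

Keeping the patterns in a single table mirrors the layout of S1 Table, so each row can be checked against the published coding by eye.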
Career choice grouping. On the beginning-of-term survey, students were asked to indicate their intended career, choosing one of 16 options, shown in S2 Table. These intended careers were combined into five groups. The first group comprises traditional STEM fields excluding engineering, and engineering forms the second group. We separated engineering from the first group because there were a disproportionate number of engineering students compared to the other STEM fields. The third group is made up of medical and other health professions. The fourth group is made up of traditionally non-STEM careers, including non-STEM education, social science, business, law, humanities, and other non-science-related careers. The final group is Undecided.
Rather than restrict our analysis to students who indicated that they are intending to pursue a career in STEM, we included all students who indicated that they were at least open to taking more calculus, and thus were either STEM-intending or STEM-interested.

Reports of instruction. On the end of term survey, students were asked two multi-part questions related to instructional practices (see S1 Fig). Both questions were on a 6-point scale. For the first set of questions, 1 indicated strongly disagree and 6 indicated strongly agree. For the second set of questions, 1 indicated not at all, and 6 indicated very often.
Informed by principal components analyses (PCA), the student reports of the 16 practices were combined to create two aggregate variables called Instructor Quality and Student-Centered Practices. Instructor Quality is intended to represent aspects of instruction related to good teaching, regardless of instructional approach, while Student-Centered Practices quantifies the instructional approach, focusing on the extent of instruction beyond traditional, lecture-based instruction.
The eight practices from question 18 (see S1 Fig) were averaged based on the PCA loadings to create a new variable called Instructor Quality, with the last prompt (my instructor discouraged me from wanting to continue taking Calculus) reverse coded. Thus, for this new variable a 1 indicates low levels of Instructor Quality and 6 indicates high levels. The loadings from the PCA are shown in S3 Table. To create an aggregate variable ranging from 1 to 6 like the original questions, the PCA loadings were rescaled to sum to one. Rescaling the loadings in this way allows the coefficient in the regression analysis to quantify the change in the propensity for a student to switch out of calculus for a one-unit increase in the reported Instructor Quality score. Since the loadings are relatively similar across questions, the new variable can be viewed as approximately the average response of each student to the eight questions.
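The construction of such an aggregate score can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the loadings and responses are made-up placeholders (the actual loadings are in S3 Table), the function names are ours, and reverse coding on a 1 to 6 scale is taken to mean mapping a response r to 7 - r.

```python
# Build an aggregate score on the original 1-6 scale from eight item
# responses and PCA loadings rescaled to sum to one.
# Loadings and responses are hypothetical placeholders, not S3 Table values.

def reverse_code(response, low=1, high=6):
    """Flip a response on a low..high scale (1 <-> 6, 2 <-> 5, 3 <-> 4)."""
    return low + high - response

def rescale(loadings):
    """Rescale loadings so that they sum to one."""
    total = sum(loadings)
    return [w / total for w in loadings]

def aggregate(responses, loadings):
    """Weighted average of the responses using the rescaled loadings."""
    weights = rescale(loadings)
    return sum(w * r for w, r in zip(weights, responses))

loadings = [0.36, 0.35, 0.37, 0.34, 0.36, 0.35, 0.36, 0.33]  # hypothetical
responses = [5, 6, 5, 4, 5, 6, 5, 2]  # last item is the "discouraged me" prompt
responses[-1] = reverse_code(responses[-1])  # 2 becomes 5

score = aggregate(responses, loadings)
print(round(score, 3))
```

Because the rescaled weights are positive and sum to one, the score always stays between the smallest and largest item response, and therefore within the 1 to 6 range.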
The eight practices from question 19 (see S1 Fig) were similarly averaged according to their PCA loadings to create a new variable called Student-Centered Practices. Again, for this new variable a 1 indicates low levels of Student-Centered Practices and 6 indicates high levels. S4 Table shows the loadings from the PCA, as well as the rescaled loadings. Since the loading on the lecture question was originally negative, that question was recoded so that 1 represents frequent lecture and 6 represents very little lecture. Most of the loadings are similar, except those for the two questions on how often instructors showed how to work specific problems and how often they lectured. Thus, the aggregate variable is effectively an average of the remaining six questions related to instructional approach.
S8 Table shows a summary of student responses on these aggregate variables.
Mathematical preparation. To measure students' mathematical preparation, we used their previous calculus experience and their standardized math test percentile. We grouped previous calculus experience into three bins: high school (non-AP, AP AB, or AP BC), college, and none. For the standardized test score, students were asked to report their SAT math score and/or their ACT math score. Using the percentiles reported on the College Board and ACT websites, we converted these scores to an aggregate "standardized math test percentile". For students who reported both scores, we used the average of the two percentiles. See S6 and S7 Tables for summaries of these variables.
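The conversion to a single percentile can be sketched as a table lookup followed by an average. In the Python below, the lookup tables hold a few made-up values purely for illustration (the actual percentiles come from the College Board and ACT sources), and the function name is ours.

```python
# Combine self-reported SAT and ACT math scores into one percentile.
# The two lookup tables are hypothetical placeholders, not published values.

SAT_MATH_PERCENTILE = {600: 75, 650: 85, 700: 93}  # hypothetical
ACT_MATH_PERCENTILE = {26: 84, 28: 91, 30: 95}     # hypothetical

def math_percentile(sat=None, act=None):
    """Return the standardized math test percentile, or None if no score was reported."""
    percentiles = []
    if sat is not None:
        percentiles.append(SAT_MATH_PERCENTILE[sat])
    if act is not None:
        percentiles.append(ACT_MATH_PERCENTILE[act])
    if not percentiles:
        return None
    # Students who reported both tests get the average of the two percentiles.
    return sum(percentiles) / len(percentiles)

print(math_percentile(sat=650))          # one test reported
print(math_percentile(sat=650, act=28))  # both tests reported
```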

Descriptive analysis of switchers
S5-S8 Tables summarize the relationships between student covariates, switcher code, and gender for the 2,266 students for whom we had complete data and who were included in the logistic mixed-effects model analysis.

Statistical analysis of switchers
A mixed-effects logistic regression model was used to quantify the association between student characteristics and the propensity for students to switch out of the mainstream calculus sequence. Let $Y_i$ be an indicator of whether student $i$ is coded as a switcher (i.e., 1 = Switcher, 0 = Persister). The model can be expressed as follows:
\[ \operatorname{logit} P(Y_i = 1) = \beta_0 + \beta_1 \mathbf{1}[\text{CollegeCalc}_i] + \cdots + \alpha_{\text{Institution}_i}, \]
where $\beta_0$ is the intercept and the omitted terms are the remaining fixed effects with coefficients $\beta_2, \ldots, \beta_{10}$. Student standardized test percentile, previous calculus experience, career goals, course teaching perceptions, and gender are treated as fixed effects and institution is treated as a random effect. The last term in the model expression, $\alpha_{\text{Institution}_i}$, represents the institution effect, capturing correlation among students at the same institution, and is assumed to have been generated from a normal distribution. Career goals, previous calculus experience, gender, and institution are categorical variables, while standardized test percentile, Instructor Quality, and Student-Centered Practices are continuous. The notation $\mathbf{1}[A]$ in the model equation denotes an indicator function which equals one if the argument $A$ is true and zero otherwise. For example, $\mathbf{1}[\text{CollegeCalc}_i]$ is equal to one if student $i$ has previous college calculus experience and zero otherwise. For the categorical variables previous calculus experience, career choice, and gender, one category had to be selected as the baseline category. The categories high school calculus experience, STM career, and male were specified as the baseline categories. The $\beta$ coefficients corresponding to the categorical variables are interpreted as the change in the log odds of switching for each category compared to the respective baseline category. For example, $\beta_1$ is the change in the log odds of a student switching out of calculus if they have previous college calculus experience compared to previous high school calculus experience; equivalently, $\exp(\beta_1)$ is the multiplicative change in the odds.
The non-categorical variables, which included standardized test percentile and the aggregate variables Instructor Quality and Student-Centered Practices, were centered to have mean zero, so that the baseline student entered Calculus I with an average standardized test percentile and experienced average levels of Instructor Quality and Student-Centered Practices.
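To make the link between a logistic regression coefficient, an odds ratio, and a probability concrete, the Python sketch below works through one example. The odds ratio of 1.5 is the value reported in the abstract; the 10% baseline switching probability is a hypothetical number chosen only for illustration, and the function names are ours.

```python
import math

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Probability corresponding to log odds x."""
    return 1.0 / (1.0 + math.exp(-x))

odds_ratio_women = 1.5                   # reported in the abstract
beta_women = math.log(odds_ratio_women)  # the same effect on the log-odds scale

# Hypothetical baseline: a student with all indicators at their baseline
# category and all centered covariates at zero switches with probability 0.10.
baseline_prob = 0.10

# A coefficient adds to the log odds, which multiplies the odds by exp(beta).
women_prob = inv_logit(logit(baseline_prob) + beta_women)

print(round(beta_women, 3), round(women_prob, 4))
```

Under this assumed baseline, odds of 1/9 become odds of 1/6, so the switching probability rises from 1/10 to 1/7. An odds ratio of 1.5 therefore does not mean the probability itself is multiplied by 1.5.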
A Bayesian estimation procedure was employed to estimate the parameters in the mixed-effects logistic regression model. All parameters are assumed to be independent in the prior distribution with the following parametric forms: The parameterization and specification of the hyperparameters of the gamma distribution were adopted from that described in [11]. Parameter estimates were obtained using Markov chain Monte Carlo (MCMC) via a Metropolis-Hastings sampling algorithm which samples from the posterior distribution of the parameters given the data (e.g., see [11,12]). This algorithm iteratively samples the vector of regression coefficients $\{\beta_1, \ldots, \beta_{10}\}$, the vector of random effects $\{\alpha_1, \ldots, \alpha_K\}$, and the random effects variance $\sigma^2$, each time holding all other parameters fixed. The Metropolis-Hastings proposal distribution for the regression coefficients is composed of independent normal distributions centered at the current coefficient estimates with a variance of 0.0003. The proposal distribution for the random effects is likewise composed of independent normal distributions centered at the current values, with a variance of 0.008. These proposal variances result in acceptance rates of about 25% for the coefficients and 30% for the random effects (see [13,14] for discussion of optimal acceptance rates). Finally, $\sigma^{-2}$ is updated from its full conditional (gamma) distribution. R code is provided at the author's webpage. The Markov chain was run for five million iterations, with an additional five hundred thousand burn-in iterations. Samples from the posterior were collected every 1000th iteration to reduce dependence in the samples. This resulted in 5,000 samples from the posterior distribution of the parameters given the data, and the effective sample size for every parameter was greater than 987.
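The random-walk Metropolis-Hastings update with burn-in and thinning can be illustrated on a toy problem. The Python sketch below is not the authors' sampler: it fits an intercept and one slope to simulated data, omits the random effects and the gamma update of $\sigma^{-2}$, assumes normal priors with standard deviation 10, and uses a proposal standard deviation and chain length chosen for speed. All names and settings are ours.

```python
import math
import random

def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Unnormalized log posterior for a logistic regression with
    independent N(0, prior_sd^2) priors (an assumption of this sketch)."""
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        lp += y * eta - math.log1p(math.exp(eta))  # Bernoulli log likelihood
    return lp

def random_walk_metropolis(xs, ys, n_keep, burn_in, thin, proposal_sd, rng):
    """Return n_keep thinned draws and the overall acceptance rate."""
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    kept, accepted = [], 0
    total = burn_in + n_keep * thin
    for it in range(total):
        # Independent normal proposals centered at the current values
        proposal = [b + rng.gauss(0.0, proposal_sd) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Symmetric proposal: accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
            accepted += 1
        # Discard the burn-in, then keep every thin-th draw
        if it >= burn_in and (it - burn_in + 1) % thin == 0:
            kept.append(list(beta))
    return kept, accepted / total

# Simulated data with a known intercept and slope (toy example)
rng = random.Random(2016)
true_beta = (-0.5, 1.0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(true_beta[0] + true_beta[1] * x))) else 0
      for x in xs]

samples, acceptance_rate = random_walk_metropolis(
    xs, ys, n_keep=500, burn_in=500, thin=5, proposal_sd=0.2, rng=rng)
slope_mean = sum(s[1] for s in samples) / len(samples)
print(len(samples), round(acceptance_rate, 2), round(slope_mean, 2))
```

The same bookkeeping explains the sample count in the text: five million post-burn-in iterations thinned to every 1000th draw leave 5,000 retained samples.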
The posterior mean parameter estimates resulting from the MCMC procedure were compared with the maximum likelihood estimates given by the 'glmer' function in the lme4 package in R [15], based on an adaptive Gauss-Hermite quadrature procedure with ten points per axis, and those from GLIMMIX in SAS. The estimates from all three estimation procedures were similar. The MCMC procedure was initialized with the estimates from 'glmer'.
To assess convergence of the Markov chain, we examined trace plots of the parameter samples (see S2 Fig), which suggest reasonable mixing of the chain. Summaries of the posterior distribution of the parameters are given in Fig 2 in the manuscript and S9 Table. The point estimates are the means of the posterior distribution and the 95% credible intervals were created from the 2.5th and 97.5th quantiles. The posterior mean estimate of the intercept (on the logit scale) and 95% credible interval are 0.09 and (0.06, 0.12). Also, the estimate and 95% credible interval for the variance of the institution random effects are 0.49 and (0.25, 0.86).
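The posterior summaries described above (a posterior mean and an equal-tailed 95% credible interval) can be computed from stored draws as in the following Python sketch. The draws are an artificial placeholder sequence rather than output from the model, the function name is ours, and the interpolation rule used for the quantiles is an assumption of this sketch that may differ slightly from the one the authors used.

```python
import statistics

def summarize(draws):
    """Posterior mean and equal-tailed 95% interval from a list of draws.

    With n=40 the cut points fall at 2.5%, 5%, ..., 97.5%, so the first
    and last cut points are the 2.5th and 97.5th percentiles.
    """
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.mean(draws), (cuts[0], cuts[-1])

# Placeholder "draws": the integers 0, 1, ..., 1000 (not model output)
draws = list(range(1001))
mean, (lower, upper) = summarize(draws)
print(mean, lower, upper)
```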