Gender gaps in Mathematics and Language: The bias of competitive achievement tests

This research paper examines the extent to which high-stakes competitive tests affect gender gaps in standardized tests of Mathematics and Language. To this end, we estimate models that predict students’ results in two national standardized tests: a test that does not affect students’ educational trajectory, and a second test that determines access to the most selective universities in Chile. We use data from twins of different genders who took both tests. This strategy allows us to control, through household fixed effects, for observed and unobserved household characteristics. Our results show that competitive tests negatively affect women. In Mathematics, both tests show a gender gap in favor of men, and this gap widens in the university entrance exam, especially for high-performing students. As the literature review shows, women are negatively stereotyped in Mathematics, so this stereotype threat could penalize high-achieving women, that is, those who go against the stereotype. In Language, women outperform men in the standardized test taken in high school, but the situation is reversed in the university entrance exam. From our analysis of Chilean national data, we find no evidence that the gender effect observed in the competitive test depends on the students’ achievement level. Following the literature, this gender gap may be linked to women’s risk aversion, lower self-confidence, and lower preference for competition, as well as to the effect of answering a test under time pressure.
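To make the identification strategy concrete, the following is a minimal sketch of the simplest case: when every household contributes exactly one woman and one man, and the only regressor is a gender indicator, a household fixed-effects estimate of the gender coefficient reduces to the mean within-pair score difference. The function name and the example scores are hypothetical, and the models in the paper include further covariates that this sketch omits.

```python
from math import sqrt
from statistics import mean, stdev


def within_pair_gender_gap(pairs):
    """Estimate the gender gap from twin pairs of different genders.

    pairs: list of (score_woman, score_man) tuples, one per household,
    with scores already standardized. Returns (gap, standard_error),
    where gap is the mean of (woman - man) within households.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two twin pairs")
    # Differencing within a household removes anything the twins share,
    # which is what a household fixed effect absorbs.
    diffs = [woman - man for woman, man in pairs]
    gap = mean(diffs)
    # Paired-difference standard error (sample standard deviation).
    se = stdev(diffs) / sqrt(len(diffs))
    return gap, se


# Hypothetical standardized scores, invented for illustration only.
example_pairs = [(0.2, 0.5), (-0.1, 0.3), (1.0, 0.9), (0.4, 0.8)]
gap, se = within_pair_gender_gap(example_pairs)
print(f"gap (women - men): {gap:.3f}, s.e.: {se:.3f}")
```

A negative gap in this sketch corresponds to a gap in favor of men. With additional covariates, the within-pair difference would be regressed on the within-pair differences of those covariates instead of simply averaged.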

Students take the analyzed tests at different educational levels: the competitive test is taken in 12th grade and the non-competitive test in 10th grade. To corroborate that the observed differences in performance are due to the characteristics of the two tests (competitive and non-competitive) and not to a temporal trend, we perform a more detailed analysis. In particular, we estimate the same three models from the main text for Language and Mathematics, but instead of using PSU and 10th grade SIMCE scores as outcome variables, we use students' school grades in 10th and 12th grade. If the differences observed between the PSU and 10th grade SIMCE tests were due to a temporal trend, we should observe the same trend when comparing men's and women's school grades in 10th and 12th grade. Table S11 shows the estimations for the models that predict Mathematics school grades in 10th and 12th grade. Note that the coefficient on the variable for women is not statistically significant in either grade. Additionally, the coefficients are always positive and larger in 12th grade. Therefore, we find no evidence that men are improving their performance faster than women.
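The logic of this robustness check can be sketched as a comparison of the within-household gender gap in school grades at the two grade levels. This is an illustrative simplification rather than the estimation reported in Table S11: it assumes one woman and one man per household, ignores the other covariates, and uses a hypothetical function name and invented data.

```python
from statistics import mean


def gap_by_grade(pairs_10th, pairs_12th):
    """Compare the gender gap in school grades across grade levels.

    Each argument is a list of (grade_woman, grade_man) tuples of
    standardized school grades, one tuple per household.
    Returns (gap_10th, gap_12th, change), with gaps as women minus men.
    """
    gap_10 = mean([w - m for w, m in pairs_10th])
    gap_12 = mean([w - m for w, m in pairs_12th])
    return gap_10, gap_12, gap_12 - gap_10


# Hypothetical standardized grades, invented for illustration only.
grades_10th = [(0.1, 0.0), (0.3, 0.2), (-0.2, -0.1)]
grades_12th = [(0.3, 0.1), (0.2, 0.2), (0.0, -0.1)]
gap_10, gap_12, change = gap_by_grade(grades_10th, grades_12th)
print(f"10th: {gap_10:.3f}, 12th: {gap_12:.3f}, change: {change:.3f}")
```

If men were improving faster than women between the two grade levels, the change would be negative. A non-negative change is in line with the reading given above.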

Gender Gap in Mathematics
Moreover, we find limited evidence of an interaction effect between gender and previous performance. In both estimations of Model (2), men and women alike have higher grades when they had higher previous SIMCE test scores. Furthermore, the differences between men and women in this effect are not statistically significant in either 10th or 12th grade (p-values = 0.24 and 0.30, respectively). For Model (3), when comparing the results for men and women by achievement group (Table S12), the only statistically significant gap is in 10th grade school grades for the low-medium achievement group, where women have higher grades than men. Therefore, the differences between the PSU and 10th grade SIMCE tests are not caused by a learning trend in which men improve at a faster rate than women.
1 SIMCE and PSU variables were standardized to a distribution with mean equal to zero and standard deviation equal to 1; standardization was done by cohort.
2 School grades were standardized to a distribution with mean equal to zero and standard deviation equal to 1; standardization was done by cohort and school.
3 University entrance expectation is a dummy variable. Its value equals 1 when the student expected, in 10th grade, to attend university, and 0 otherwise.
* p<0.05, ** p<0.01, *** p<0.001. Estimations include household fixed effects.
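The standardization described in these notes can be sketched as a z-score computed within groups. The sketch below assumes records stored as dictionaries with hypothetical field names ("cohort", "school", "grade"), and it uses the population standard deviation, which is an assumption, since the notes do not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within(records, value_key, group_keys):
    """Add a within-group z-score for value_key to each record.

    records: list of dicts; group_keys: the fields that define a group,
    e.g. ("cohort", "school") for school grades or ("cohort",) for
    test scores. Returns new dicts with an extra "<value_key>_z" field.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in group_keys)].append(rec[value_key])
    # Assumption: population standard deviation within each group.
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}
    result = []
    for rec in records:
        mu, sd = stats[tuple(rec[k] for k in group_keys)]
        # A group with no variation gets z = 0 to avoid dividing by zero.
        z = (rec[value_key] - mu) / sd if sd > 0 else 0.0
        result.append({**rec, value_key + "_z": z})
    return result


# Hypothetical records, invented for illustration only.
students = [
    {"cohort": 2010, "school": "A", "grade": 5.0},
    {"cohort": 2010, "school": "A", "grade": 6.0},
    {"cohort": 2010, "school": "A", "grade": 7.0},
    {"cohort": 2010, "school": "B", "grade": 4.0},
    {"cohort": 2010, "school": "B", "grade": 6.0},
]
for row in standardize_within(students, "grade", ("cohort", "school")):
    print(row)
```

Standardizing school grades within cohort and school makes them comparable across schools with different grading standards, whereas the national tests only need to be standardized within cohort.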

Gender Gap in Language
Additionally, Table S13 shows the results of the models that explain 10th and 12th grade school grades in Language. In both cases, the variable for women has a positive and statistically significant effect. Moreover, the results for Model (2) indicate that the effect of the previous SIMCE score on students' school grades in 10th and 12th grade is not statistically different between men and women (p-values = 0.53 and 0.31, respectively).
Finally, Table S14 uses Model (3) to compare the coefficients for men and women in each achievement group. We find that, at a 95% confidence level, there are no gender gaps in school grades in either grade for the medium-low performance group. For the medium-high achievement group, the gender gap in school grades is not statistically significant in 10th grade, but it is significant in 12th grade. In particular, women in the medium-high group have higher 12th grade school grades than men in the same performance group. In contrast, in the high and low performance groups there is a gap in favor of women in 10th grade, which is not significant in 12th grade. Thus, it is possible that in the lowest and highest performing groups, men improve their school grades between 10th and 12th grade at a higher rate than women. We found the opposite effect for the high-performing group at the aggregate level, where women have higher grades in Language in 12th grade compared to men (Model (1) in Table S13).
In aggregate terms, it seems unlikely that the observed differences between 10th grade SIMCE and PSU scores are explained by a temporal trend in which men or women improve their performance at a different rate.
2 School grades were standardized to a distribution with mean equal to zero and standard deviation equal to 1; standardization was done by cohort and school.
3 University entrance expectation is a dummy variable. Its value equals 1 when the student expected, in 10th grade, to attend university, and 0 otherwise.
* p<0.05, ** p<0.01, *** p<0.001. Estimations include household fixed effects.