Underrepresentation of women in computer systems research

The gender gap in computer science (CS) research is a well-studied problem, with an estimated ratio of 15%–30% women researchers. However, far less is known about gender representation in specific fields within CS. Here, we investigate the gender gap in one large field, computer systems. To this end, we collected data from 72 leading peer-reviewed CS conferences, totalling 6,949 accepted papers and 19,829 unique authors (2,946 women, 16,307 men, the rest unknown). We combined these data with external demographic and bibliometric data to evaluate the ratio of women authors and the factors that might affect this ratio. Our main findings are that women represent only about 10% of systems researchers, and that this ratio is not associated with various conference factors such as size, prestige, double-blind reviewing, and inclusivity policies. Author research experience also does not significantly affect this ratio, although author country and work sector do. The 10% ratio of women authors is significantly lower than the 16% in the rest of CS. Our findings suggest that focusing on inclusivity policies alone cannot address this large gap. Increasing women’s participation in systems research will require addressing the systemic causes of their exclusion, which are even more pronounced in systems than in the rest of CS.


Introduction
Women comprise a minority of the science and technology workforce, and the gender gap persists despite years of research and efforts to close it [1,2]. In computer science (CS) in particular, this gap carries significant societal effects, such as inequality in economic opportunities for women and an undersupply of researchers and engineers in the rapidly growing discipline [3,4]. The gender gap among researchers is particularly severe: the people who participate in research, publish about it, and have their research acknowledged for its value are predominantly men [5]. Numerous studies estimate that only about 15%–30% of the CS research community are women [1,6-9]. Although some recent indications show these numbers could be growing, they remain low, and the rate of growth remains slow [2].
CS is an expansive and diverse discipline with different characteristics in each of its constituent fields [10]. Treating CS as one homogeneous area risks missing some of the gender disparity phenomena that show up more acutely in specific fields. In this paper, we focus on one such field, computer systems (or "systems" for short). Systems is a large research field with journals, sometimes years later [20]. The conferences we selected include some of the most prestigious systems conferences (based on indirect measurements such as Google Scholar's metrics), as well as several smaller or less-competitive conferences for contrast, shown in Table 1. To reduce time-related variance, we chose to focus on a large cross-sectional set of conferences from a single publication year. Our choice of which conferences belong to "systems" is necessarily subjective. Not all systems papers from 2017 are included in our set, and some papers that are in our set may not be universally considered part of systems (for example, if they lean more towards algorithms or theory). Nevertheless, we believe that our cross-sectional set is both wide enough to represent the field well and focused enough to distinguish it from the rest of CS. In total, our sample includes 2,225 peer-reviewed systems conference papers.
Because our metric for the gender gap counts the percentage of women among authors, we collected the names and author positions of all 9,906 authors (7,495 unique). Papers in our dataset average 4.45 coauthors per paper, and of the 1,871 papers with three or more coauthors, only 12.29% ordered the author list alphabetically. Papers in systems tend to list the primary contributor in the leading (first) position and senior authors last, so we examined the gender of first and last authors as well.
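The alphabetical-ordering check described above can be sketched as follows. This is a minimal illustration, assuming each author is given as a "Forename Surname" string and that ordering is judged on the case-folded last token of the name. The function names and sample papers are hypothetical and are not the study's actual code or data.

```python
def is_alphabetical(authors):
    """Return True if the author list is sorted by surname (assumed: last token)."""
    surnames = [name.split()[-1].casefold() for name in authors]
    return surnames == sorted(surnames)


def alphabetical_share(papers, min_authors=3):
    """Share of papers with at least `min_authors` authors listed alphabetically."""
    eligible = [p for p in papers if len(p) >= min_authors]
    if not eligible:
        return 0.0
    return sum(is_alphabetical(p) for p in eligible) / len(eligible)


# Hypothetical author lists, for illustration only.
papers = [
    ["Ada Adams", "Bo Chen", "Cy Diaz"],     # alphabetical by surname
    ["Zoe Young", "Al Brown", "Mia Novak"],  # ordered by contribution
    ["Li Wei", "Ana Silva"],                 # fewer than three authors: ignored
]
share = alphabetical_share(papers)
```

Restricting the check to papers with three or more coauthors matters because shorter author lists are often in alphabetical order by chance.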
In addition to paper authors, we collected information on researchers in the following conference roles:
• Program committee (PC) chairs, who coordinate the review activities (112 total, 18 women, 94 men).
• PC members, who conduct most of the paper reviews and therefore have a direct influence on which papers get accepted (2,472 total, 412 women, 2,056 men).
• Keynote speakers (96 total, 16 women, 80 men), panelists (179 total, 33 women, 146 men), and session chairs (619 total, 105 women, 514 men), who have no direct influence on the population of authors, but represent the "face" of the conference to attendees. The visibility of women as role models in these positions may have an indirect impact or appeal for women practitioners [12,21].
For this study, the most critical piece of information on these researchers is their perceived gender at time of publication [11]. Gender is a complex, multifaceted identity [22], but most bibliometric studies still rely on binary genders (either collected by the journal or inferred from forename) because that is the only designator available to them [1, 2, 6-9, 11, 23]. In the absence of self-identified gender information for our authors, we also necessarily compromised on using binary gender designations. We therefore use the gender terms "women" and "men" interchangeably with the sex terms "female" and "male". The conferences in our dataset did not collect or share specific gender information, so we had to collect this information from other public sources. Similar studies have typically used automated gender-inference services based on forename and sometimes country of origin [24,25]. These statistical approaches can be reasonably accurate for names of Western origin, and especially for male names [6,14,26].
We opted instead to rely primarily on a manual approach that can overcome the limitations of name-based inference. Using web lookup, we assigned a gender to 95.44% of the researchers, namely those for whom we could identify an unambiguous web page with a recognizable gendered pronoun or, absent that, a photo. (For example, many LinkedIn profiles may lack a photo, but include a gendered pronoun in the recommendations section.) For another 2.1%, we used genderize.io's automated gender designations if it was at least 70% confident about them [26]. The remaining 225 persons were not assigned a gender and were excluded from most analyses. This method provided more gender data and higher accuracy than automated approaches based on forename and country, especially for women [2,14,16,25,27]. This labor-intensive approach does introduce the prospect of human bias and error. For example, a gender assigned by an outdated biography paragraph with pronouns may no longer agree with the self-identification of the researcher. To verify the validity of our approach, we compared our manually assigned genders to self-assigned binary genders in a separate survey we conducted among 918 of the authors [28]. We found no disagreements for these authors, which suggests that the likelihood of disagreements among the remaining authors is low.
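The two-stage assignment rule described above (manual lookup first, automated inference only above a confidence threshold) can be sketched as a small decision function. This is an illustrative sketch under stated assumptions: the function name, the "F"/"M" labels, and the argument names are hypothetical, and the inferred label and probability are taken as inputs rather than fetched from any service.

```python
def assign_gender(manual_label=None, inferred_label=None, inferred_prob=0.0,
                  threshold=0.70):
    """Prefer a manually verified label; otherwise accept a confident inference.

    Returns a (label, source) pair; label is None when no gender is assigned.
    """
    if manual_label in ("F", "M"):
        return manual_label, "manual"
    if inferred_label in ("F", "M") and inferred_prob >= threshold:
        return inferred_label, "inferred"
    return None, "unassigned"  # such researchers are excluded from most analyses


# Hypothetical inputs, for illustration only.
examples = [
    assign_gender(manual_label="F"),
    assign_gender(inferred_label="M", inferred_prob=0.93),
    assign_gender(inferred_label="F", inferred_prob=0.55),
]
```

Raising `threshold` to 0.90 would mirror the stricter cutoff used for the non-systems comparison data, at the cost of leaving more researchers for manual lookup.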
Conferences also do not generally offer information on authors' demographics, but we were able to unambiguously link approximately two thirds of researchers in our dataset to a Google Scholar (GS) profile (5,833 researchers, 64%). For each author and PC member, we collected all metrics in their GS profile, such as total previous publications (ca. 2017), h-index, etc. Note that we found no GS profile for 2,759 authors (36.75%), and these researchers appear to be less experienced than researchers with a GS profile. We therefore collected another proxy metric for author experience (total number of past publications) from another source, the Semantic Scholar database.
We also looked up each author's affiliation institute on GS to find their country of residence and work sector whenever they could be unambiguously inferred using hand-coded regular expressions. Many authors also included an email address in the full text of the paper, from which we inferred more timely affiliation and country information when available.
From authors' affiliations, we broadly categorized their work sector as either "COM" for industry (14% of all unique authors and PC members), "EDU" for academia (79%), or "GOV" for government and national labs (7%).
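A sector classifier based on hand-coded regular expressions might look like the sketch below. The patterns here are assumptions chosen for illustration; the study's actual expressions are not given in the text, and a real rule set would need many more cases.

```python
import re

# Illustrative patterns only (assumed, not the study's actual expressions).
# Order matters: government labs are tested before academia and industry.
SECTOR_PATTERNS = [
    ("GOV", re.compile(r"\.gov$|\bnational lab", re.IGNORECASE)),
    ("EDU", re.compile(r"\.edu$|\.ac\.[a-z]{2}$|\buniversit", re.IGNORECASE)),
    ("COM", re.compile(r"\.com$|\binc\b|\bcorp", re.IGNORECASE)),
]


def classify_sector(affiliation):
    """Map an affiliation string or email domain to COM, EDU, or GOV.

    Returns None when no pattern matches, so that ambiguous cases are left
    unclassified rather than guessed.
    """
    text = affiliation.strip()
    for sector, pattern in SECTOR_PATTERNS:
        if pattern.search(text):
            return sector
    return None
```

For example, `classify_sector("cs.example.edu")` yields "EDU", whereas an affiliation that matches no pattern stays unclassified.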
In addition to researcher information, we gathered various statistics on each conference, either from its web page, proceedings, or directly from its chairs [29]. We collected data about review policies, important dates, the composition of its technical PC, and the number of submitted papers, among others. We also collected historical metrics from the Institute of Electrical and Electronics Engineers (IEEE), Association for Computing Machinery (ACM), and Google Scholar (GS) websites, including past citations, conference age in years, and total publications, and downloaded all 2,225 papers. Finally, from each conference's website and proceedings we collected information on any explicit policies the conference made to increase attendance diversity (Table 4), so that we could measure their effects, if any, on the gender gap.
The focus of this study is computer systems researchers, but to provide a more accurate picture of where this field stands in comparison to others in CS, we needed to collect additional information on non-systems conferences. We selected conferences in other CS fields from the same year, primarily based on their ranking on Google Scholar metrics as leaders in their respective fields (Table 2). These conferences accepted papers from 12,202 unique authors. Because of the large manual effort involved in our approach for systems papers, we limited this data collection to genders and author positions for all non-systems authors. The gender collection methodology followed Chatterjee and Werner [30], first assigning genders to 8,709 authors using genderize.io's inference service when its probability of accuracy was at least 90%. For the remaining 3,331 authors, we looked up genders manually on the web as we did for systems conferences, leaving only 162 people for whom we could not assign a gender manually or automatically. The overall gender statistics for these conferences are shown in Table 2, and the full details on this auxiliary dataset are available in the original study of that data [31].

Statistics
For statistical testing, group means were compared pairwise using Welch's two-sample t-test and group medians using the Wilcoxon signed-rank test; differences between distributions of two categorical variables were tested with the χ² test; and correlations between two numerical variables were evaluated with Pearson's product-moment correlation coefficient. All statistical tests are reported with their p-values. Mixed-effects logistic regression models were assessed with Satterthwaite's degrees of freedom method for hypothesis testing on model coefficients.
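Two of these statistics are simple enough to sketch with the Python standard library alone. The code below is a minimal illustration, not the study's analysis code: it computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom, and Pearson's r. Computing p-values would additionally require the t distribution's CDF, which is omitted here. The function names and sample data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical samples, for illustration only.
t, df = welch_t([1, 2, 3], [1, 2, 3])
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

Unlike Student's t-test, Welch's variant does not assume equal variances in the two groups, which is why its degrees of freedom are generally not an integer.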

Ethics statement
The data collected for this study was sourced from public-use datasets such as conference and academic web pages. This study was exempted from the informed consent requirement by Reed College's Institutional Review Board (No. 2021-S26) under Exempt Category 4: the use of secondary data.

Limitations
Our study uses the female author ratio (FAR) as a proxy metric to estimate women's participation in systems research, as do comparable studies estimating the gender gap in other fields [14-16]. FAR has been found to correlate tightly with gender ratios across disciplines [1]. Nevertheless, it is important to keep in mind that FAR may undercount women if men are more likely to submit papers or have them accepted.
We believe and demonstrate that the magnitude of this undercounting is small and insufficient on its own to explain the large gap with the overall CS statistics from past publications (which also use the same metric, with the same limitations).
Table 2. Sampled set of non-systems CS conferences, categorized broadly into six fields, including number of accepted papers, total authors (nonunique), authors by gender, and ratio of female authors (sorting order). Gender data comes from genderize.io when its prediction accuracy was at least 90%, or from manual web search otherwise. The ratio of women among authors (FAR) excludes unassigned genders.
In the literature, we found few controlled experiments that evaluate the peer-review process on both accepted and rejected papers, and they are typically limited in scope to a single conference or journal [32-34]. We chose an observational approach that allowed us to examine an entire field of study and produce metrics that are comparable with those in other fields. The main limitation of this approach is that it may miscount women if there is significant gender bias in the publication or review processes. Nevertheless, the resulting statistics are directly comparable to other studies employing the same approach. Moreover, our survey results indicate that such peer-review bias may be limited [28].

Our methodology is also constrained by the manual collection of data. The effort involved in compiling all the necessary data limits the scalability of our approach to additional conferences or years. Furthermore, the manual assignment of genders is a laborious process, prone to human error. Nevertheless, such errors appear to be smaller in quantity and bias than those of automated approaches, as discussed previously.
Even with manual gender assignment, 2.16% of researchers still have unassigned gender. Although this ratio is small, and smaller than that of most other studies we reviewed, we nevertheless performed a sensitivity analysis to examine its effect. We artificially set the gender of all 225 unassigned researchers first to women, and then to men, and recomputed all statistical analyses. None of our findings were subsequently changed in either direction or statistical significance, which justified our decision to omit these missing data points from the analysis.
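The sensitivity analysis amounts to recomputing the ratio of women under the two extreme assumptions about the unassigned researchers. A minimal sketch follows; the function name is hypothetical and the counts are round numbers chosen for illustration, not figures from the dataset.

```python
def far_bounds(women, men, unassigned):
    """Ratio of women with unassigned researchers excluded, all women, or all men."""
    total = women + men + unassigned
    return {
        "excluded": women / (women + men),
        "all_women": (women + unassigned) / total,
        "all_men": women / total,
    }


# Hypothetical counts, for illustration only.
bounds = far_bounds(women=100, men=880, unassigned=20)
```

The "all_men" and "all_women" values bracket the "excluded" estimate, so when a finding holds at both extremes it cannot depend on how the missing genders would have been assigned.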

Women are underrepresented in author roles
We start with our first research question: estimating the actual ratio of women among computer systems researchers. With the data we collected on conference participants, we can compute the ratio of women in different conference roles: peer-reviewed authors, reviewers, and invited presenters (Table 3). We found that approximately 10.26% of published authors were women. Across the various other (invited) roles, women represent a weighted average of 17.83% of researchers.
Since 20.62% of authors are named in more than one paper, we compared counting each person exactly once to counting repeated occurrences of each person. With both counts, the gender ratios remain within a percentage point or so of each other. We also examined authorship outliers, because these can be linked with gender [24]. In our dataset, all authors with more than seven papers are men, and only 5 of the 97 authors with more than four papers are women. But removing all authors with more than four papers from our dataset would change women's underrepresentation by less than a percentage point. The effect of outliers on PC female representation is similarly small. We therefore decided to use the complete dataset of persons for the rest of this study, counting with repeats, as do comparable studies.
Table 3. Researcher count and ratio of women by role for systems conferences. Researchers are either aggregated by total appearances or identified uniquely, once per role. Lead authors in systems are typically the primary contributor and last authors are typically the senior member of the team.
The second-largest group of researchers, and the largest invited group, is that of program committee (PC) members. This group can also indirectly affect the representation of women among published authors, because PC members, through their reviews, decide which papers get published. The ratio of female PC members (FPR) is significantly higher than the ratio of female authors [18.28% vs. 10.26%, χ² = 276.587, degrees of freedom (df) = 1, p < 10⁻⁶]. The large difference in ratios raises the question: which of the two is more representative of women's true participation rate in systems research?
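A comparison of two ratios such as FPR and FAR reduces to a χ² test on a 2×2 table of women and men per role. The sketch below uses only the standard library and is an illustration rather than the study's code: it assumes no continuity correction (the text does not say whether one was applied), and the counts are hypothetical, since the per-role counts of Table 3 are not reproduced here.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, the upper tail of the chi-square distribution equals
    # the two-sided tail of a standard normal at sqrt(chi2).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: (women, men) among PC members, then among authors.
chi2, p = chi2_2x2(180, 820, 100, 900)
```

With 18% women in one group of 1,000 and 10% in the other, the statistic is already large; larger samples with the same ratios would yield a proportionally larger χ².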

We chose the typical bibliometric approach to estimate participation by gender, namely to look at published authors, or FAR [6,14]. This metric is not always accurate: it ignores researchers with limited access to publishing, and potentially undercounts female scientists because they tend to publish less than men in many fields [16,35-38], possibly owing to a higher service load [39-41]. Confirming this past finding, women published only 1.27 papers in our dataset on average, compared to men's 1.34 (t = −2.74, df = 1124, p < 0.01). However, this ≈5.7% difference is insufficient to explain the large discrepancy with gender representation in invited roles.
Unlike PC members, authors underwent blind and competitive peer review, averaging an acceptance rate of 25.5% in our dataset. This selection process is presumably more objective and less biased than one based on invitation [42]. If a biased review process allowed for a disproportionate number of women-authored papers to be published, it would mean that the gender gap in the author sample is not reflective of the researcher population as a whole, but that is not what we found. Mirroring studies from other fields that found no evidence of gender bias in the peer-review process [6,27,43], we found that women's papers were actually accepted at slightly higher rates when their identity was visible to reviewers (in 24 single-blind conferences) or when it was prominent in the first author position (11.1% of papers). An author survey also found that the reviews women received in the single-blind conferences in our dataset showed similar or higher grades than men's [28].
Contrariwise, our data suggests that it is the selection-by-invitation process that exhibits gender bias. Unlike women's underrepresentation in the editorial boards of many journals [44-47], in our dataset, women PC roles outnumber women author roles by some 75%. We hypothesize that this difference stems from an affirmative effort by conference chairs to bring gender closer to parity. This hypothesis, and our consequent reliance on FAR instead of FPR, are supported by three observations. First, if chairs are indeed oversampling women for PC roles, we would expect to see differences in experience statistics across genders. For example, chairs may have to search deeper in the researcher pool to recruit women to the PC, leading to lower research experience among women PC members, compared to their counterparts among men. Our data corroborates this prediction (Fig 1). For example, the mean (median) h-index of women PC members, 21.54, Second, if women are asked to serve on more PCs than men in relative terms, we would expect to find fewer unique women as PC members because of their repeated service [13], as Table 3 indeed confirms. This prediction is also corroborated by computing reviewer load, with 1.57 mean PC assignments (member and chair) per woman, compared to 1.41 per man (t = 3.28, df = 547, p < 0.01). Conceivably, the additional time committed to PC service explains some of the reduced publication rates we observed among women. However, authors who serve as PC members also tend to publish more papers (Pearson's r = 0.34, p < 10⁻⁹), suggesting that a relative overrepresentation of women in PCs is not commensurate with underrepresentation among authors.
Finally, the smaller population size of PC members (n = 2,555) compared to that of authors (n = 7,507) magnifies statistical outliers. Therefore, conferences with uncharacteristic gender gaps introduce more variance to PC gender ratios than to those of authors. As shown in Fig 2, the gender gap for PCs exhibits a much higher variance and longer tail across conferences than for authors. Only two conferences show FPR values near parity, OOPSLA and ISPASS. Excluding this pair changes the mean FPR across the remaining conferences by −1.5 percentage points. Conversely, removing the two conferences with the lowest FAR values (HotI and VEE) only bumps up the mean FAR by 0.04 percentage points. Skewness in distribution therefore pulls the mean ratio of women higher among PCs than it pulls it lower among authors, reaffirming our assertion that FAR is more reliable than FPR as an indicator of the overall gender gap.

Most CS fields have higher FAR than systems
The ratio of women among authors represents only a fraction of the ratio in the rest of CS, based on previous authorship studies that spanned the entire field. This gap surfaces the question of whether it stems from differences across CS fields or from differences in measurement.
To answer this question, we collected more gender data on non-systems conferences from the same year. Although our comparison data is necessarily constrained by the scalability of our manual collection approach, it still includes 16,971 nonunique authors from 19 of the top-cited non-systems CS conferences, based on GS metrics. Despite the breadth limitations of this additional dataset (not all conferences in all fields are represented), it should be directly comparable to the systems dataset, and large enough to produce statistically significant results. The data is also limited in depth, including only one year, but there is evidence that the underrepresentation of women in systems did not vary much across a five-year period including 2017, at least for the subfield of high-performance computing [48].
The results across fields are mixed, as expected (Table 2). The fields of CS education and human-computer interaction exhibit the highest FARs, with the SIGCSE'17 conference approaching gender parity (43.98% FAR). The theoretical areas of CS exhibit the highest inequality, with the STOC'17 conference including only 13 women (4.47%) among its authors. The remaining three broad fields we evaluated show moderately higher FAR values than systems.
The overall FAR in the non-systems conferences we sampled was 16.46%, which is significantly higher than the systems-only FAR (χ² = 143.88, p < 10⁻⁹). The ratio of women in CS across all systems and non-systems authors in our dataset is 14.14%. This ratio is lower than most estimates for women in CS in previous studies, and we look at some possible explanations for this difference in the related work section. But it is still significantly higher than the FAR we found with comparable methodology in systems conferences alone (χ² = 69.18, p < 10⁻⁹).
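The combined ratio is a size-weighted average of the two groups' ratios, which can be checked approximately from figures reported in the text. The sketch below is only a rough consistency check: it assumes the nonunique author counts (9,906 and 16,971) as weights, even though the reported ratios exclude authors with unassigned gender, so the result comes out near 14.2% rather than exactly 14.14%. The function name is hypothetical.

```python
def pooled_ratio(groups):
    """Size-weighted mean of per-group ratios; `groups` holds (ratio, size) pairs."""
    total = sum(size for _, size in groups)
    return sum(ratio * size for ratio, size in groups) / total


# Reported FAR values, weighted by nonunique author counts (an approximation:
# the exact denominators exclude authors with unassigned gender).
systems = (0.1026, 9906)
non_systems = (0.1646, 16971)
combined = pooled_ratio([systems, non_systems])
```

Because non-systems authors outnumber systems authors in the sample, the combined ratio sits closer to the non-systems value than to the midpoint of the two.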

Conference factors do not explain low FAR
The next step in understanding the gender gap is to look at the explanatory variables that may be associated with it, starting with conference-specific factors, and continuing to author-specific factors. FAR varies considerably from one conference to the next (minimum: 2.04%, maximum: 18.52%, mean: 10.26%, SD: 3.11%). Examining the differences between conferences could offer clues as to which factors might affect the gender gap. We first examine four major factors: the size of the conference, its double-blind review policy, its gender diversity among reviewers, and its specific diversity and inclusivity policies. We then explore the association (or lack thereof) between a conference's FAR and myriad other conference factors.
Conference size. Averaging the ratio of women by conferences, as opposed to by authors or papers (both computed in Table 3), could produce different results because smaller conferences receive the same weight as conferences with many more authors and papers. This choice does not appear to affect the gender gap in our dataset, as all three means are within 0.53% of each other, with the conference mean at the center of the other two. As shown in Fig 2, the ratio of women among authors appears to be independent of the size of the conference (papers published), as well as its double-blind review policy, and its ratio of female PC members. Statistically, there appears to be no correlation between a conference's size and its FAR (r = 0.03, p = 0.82).
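The difference between averaging by author and averaging by conference can be illustrated with a short sketch. The counts below are hypothetical and deliberately unbalanced to make the two averages diverge; in the study's data, they differ by only about half a percentage point.

```python
def far_by_author(conferences):
    """Pooled FAR: every author counts equally."""
    women = sum(w for w, _ in conferences)
    total = sum(w + m for w, m in conferences)
    return women / total


def far_by_conference(conferences):
    """Mean of per-conference FARs: every conference counts equally."""
    ratios = [w / (w + m) for w, m in conferences]
    return sum(ratios) / len(ratios)


# Hypothetical (women, men) author counts for three conferences.
confs = [(5, 45), (30, 270), (4, 16)]
pooled = far_by_author(confs)
per_conf = far_by_conference(confs)
```

Here the smallest conference has the highest ratio of women, so weighting conferences equally raises the mean relative to the pooled figure.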
Diversity across conference roles. One review policy often employed to increase participant diversity is to invite a more diverse reviewer body. For example, some studies have demonstrated gender homophily between reviewers and authors, leading to higher FAR values when more of the reviewers are women [51,52]. Women are again far from parity in the composition of most PCs in our dataset, but with higher variance than in the author body. Nevertheless, we found no correlation between higher FPR and higher FAR values (r = 0.04, p = 0.8).
We also looked at other visible conference roles: keynote speakers, session chairs, and panelists. However, the correlations between FAR and these roles reveal no such relationships here (r = 0.01, p = 0.97; r = 0.01, p = 0.93; and r = 0.03, p = 0.91, respectively).
In summary, inviting more women to visible conference roles and implementing diversity-focused policies likely contributes to more inclusive conferences [53,54], but is insufficient on its own to spontaneously add women authors to the field.
Diversity initiatives. Some specific policies that have been proposed to increase diversity in conferences include: a designated inclusivity chair; a code of conduct or anti-harassment policy; special events and meetings to promote diversity; assistance with childcare during the conference; travel grants for underrepresented populations; and the collection and dissemination of diversity data [55-57]. Of our 53 conferences, 17 implemented at least one of these proposals (Table 4), but that did not ostensibly lead to higher FAR values (9.86% mean FAR vs. 10.45% for the other conferences, t = −0.73, df = 44, p = 0.47).
As a prominent example, the only two conferences with an inclusivity chair, SC and ISC, ranked among the lowest conferences for FAR. It is possible that these policies were in fact more reactive than proactive, in an attempt to improve previous statistics. It is also possible that their effects can only be measured over several years. Regrettably, none of the conferences have been consistently sharing author demographics to evaluate changes over time, although a few release some data. The SC conference, for example, has been sharing demographic data since 2016. Throughout this period, women's attendance rate remained near constant at around 13%-14% (FAR was only shared for 2018 at 12%). ISC is another large conference that also employs various inclusivity initiatives, including naming a dedicated diversity chair and reporting attendee demographics. It does not report FAR, but we have manually computed FAR for the four years since 2017 in the range 5%-9%, lower than the average conference in our dataset.
It is plausible that inclusivity initiatives are only one of the selection criteria when choosing a conference to publish in, and that other criteria such as conference date, location, and subfield take precedence. For example, among the four computer architecture conferences in our set (ASPLOS, HPCA, ISCA, MICRO), all with similar acceptance rates, only ISCA offered any diversity initiative, but all four show similar FAR.
A venue's prestige has also been previously linked to the gender gap in publication. Examples include prestigious Mathematics journals that underrepresent women [58], novel research published by women that is less likely to be impactful [59], and men's tendency to self-cite more than women [60]. However, we found no direct correlations between a conference's prestige metrics and its ratio of women authors in computer systems.
Additional conference factors. In an attempt to uncover any nonobvious factors, we also collected various descriptive metrics on the different conferences and evaluated whether any of these metrics is associated with variations in FAR. These metrics could potentially uncover hidden relationships with gender representation, such as: the competitiveness of a conference, the number of authors it attracts, the composition of its PC, its history, and organizational factors.
As Table 5 shows, none of these associations appears to be significant. This finding was confirmed by building a combined linear model of a conference's FAR based on all of the factors we presented, where no coefficients turned out to be significant. It should be noted that many of these factors are correlated, collinear, or connected by a confounding variable, but eliminating some factors with stepwise model selection still yielded no significant coefficients. The per-conference FAR metric appears to be mostly independent of the factors we collected. The largest correlation we did observe, between FAR and the ratio of authors from the PC, is still nonsignificant and small. This correlation is unlikely to reveal a causal relationship, i.e., that inviting more women to the PC necessarily leads to increased FAR. As we have seen, there is no real correlation between the two, but since conferences generally exhibit higher FPR than FAR, it makes sense that conferences with higher PC participation in the authorship would also exhibit higher relative FAR.
Table 4. Conferences with inclusivity initiatives, including diversity chair, code of conduct, special diversity events or workshops, assistance with childcare, travel grants for underrepresented minorities, and diversity data collection and publication. Conferences are ordered by increasing female author ratio (FAR). The last row summarizes the remaining conferences.

Representation of women is partially associated with demographic factors
In addition to conference-related factors, we also analyzed the effects on FAR of three author-related factors: research experience, work sector, and country of affiliation.
Research experience. As a proxy metric for research experience, we collected the h-index [61] of each researcher with an identifiable GS profile and gender (4,700 unique authors and 2,034 unique PC members). As Fig 1 shows, female PC members exhibit a significantly lower mean and median h-index than males, but for authors, the differences across gender are not so large. Comparing authors' total past publication count as another proxy metric for experience also reveals nonsignificant differences in means, medians, and first and third quartiles. The only significant gender difference shown in Fig 1 for authors is in the tail of the distribution, with men composing the majority of the top percentile (91.49%). No woman in our dataset had an h-index above 94, but 19 men did, with a maximum of 141. This is only a minuscule percentage of the sample population (0.3%), so it is hard to draw any conclusions from this gender difference. It is nevertheless consistent with data in Table 3, where women in last author position (typically representing the senior member of the team) appear at a lower rate overall than women authors, and especially lower than lead authors (typically representing a junior member of the team). These findings agree with past observations that women continue to senior academic ranks at a lower rate than men [4,35,62-64].
Work sector. Compared to experience, the gender gap across work sectors is more pronounced. Most unique authors in this dataset are affiliated with academic institutes (79.3%), followed by industry (14%) and government (6.7%). The respective FAR values for each sector (11%, 8.5%, and 10.5%) show women to be significantly underrepresented in industry compared to academia (χ² = 4.8, df = 1, p = 0.03). Other studies have also found relatively fewer women engineers in industry research positions [36,62].
The distribution of work sectors among unique PC members appears similar, with 78.2% affiliated with academia, 14.1% with industry, and 7.7% with government. This similarity suggests that no sector is disproportionately favored in program committees. FPR values continue to be higher than FAR values, but notably, not by the same magnitude across sectors. For example, the FPR for academics (15.9%) is higher than their FAR by some 45%, but for industry and government, FPR values are higher than FAR values by 71% and 71%, respectively. Conceivably, conference chairs may be more intentional about balancing gender diversity in the two sectors that already show low representation. It is unclear, however, whether this practice actually hurts women's retention in the field: the evaluation of job performance in industry may be less favorable toward academic service tasks, so overburdening women in industry without proper recognition could hurt their future representation further.
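The "higher by some 45%" figure is a relative difference, not a difference in percentage points. A short sketch makes the distinction explicit, using the rounded academia values reported in the text; the function name is hypothetical.

```python
def relative_excess(fpr, far):
    """How much higher FPR is than FAR, expressed as a fraction of FAR."""
    return fpr / far - 1


# Academia: FPR of 15.9% against FAR of 11% (rounded values from the text).
academia = relative_excess(15.9, 11.0)
points = 15.9 - 11.0  # the same gap in percentage points
```

The gap for academia is thus about 4.9 percentage points, which corresponds to a relative excess of roughly 45% because the baseline FAR is low.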
Geographical factors. When it comes to geography, gender differences are much larger than experience or sector differences. Researchers in our dataset hail from 6663 different countries that show distinct differences in researcher count and female representation (Table 6). Most of the top countries by author count appear to be more economically developed than the rest, perhaps because systems research can be capital-intensive, requiring state-of-the-art computing equipment. Female author ratio, however, does not show the same association with a country's economic development, as exemplified by the low FAR of the UK, Singapore, South Korea, Netherlands, and Japan. This result is consistent with larger gender studies as well [1,16,35]. Similarly, FAR does not appear to be strongly associated with a country's gender gap index [65-67].
FAR is also not strongly correlated with a country's number of authors (r = 0.2, p = 0.39). The correlation is even weaker if we omit the US, which accounts for most authors (55.01%) and PC members (55.67%) for whom we have country and gender information. US-based authors also exhibit higher FAR compared to the rest of the world (11.45% vs. 8.75%, χ² = 14.44, df = 1, p < 10⁻³). About half of all US-based CS researchers, in general and in our data, are likely foreign-born [7,28], but this distinction does not appear to explain differences in the gender gap [28,68-70].
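The correlation analysis above can be sketched in a few lines of Python. The per-country values below are hypothetical and serve only to show how a single dominant country can influence r; the function names are illustrative. The last line computes the t statistic for r = 0.2 under the assumption (ours, not stated in the text) that the correlation spans n = 20 countries, as in Table 6.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical per-country data: number of authors and FAR (in percent).
authors = [9000, 1200, 900, 700, 400, 300]
far = [11.5, 8.0, 12.0, 6.5, 9.0, 10.0]

r_all = pearson_r(authors, far)
r_without_largest = pearson_r(authors[1:], far[1:])
print(f"r (all) = {r_all:.2f}, r (largest omitted) = {r_without_largest:.2f}")
print(f"t for r = 0.2, n = 20: {t_statistic(0.2, 20):.2f}")  # prints 0.87
```

A t statistic well below 2 is in line with a nonsignificant correlation such as the one reported.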
One hypothesis for the higher FAR in the US is that as the host of most systems conferences, the US might be more appealing to researchers who prefer domestic travel, such as parents of young children. In conferences in all countries except South Korea and Italy, we found a significantly higher representation of local-affiliated authors. However, we found no evidence of a gender difference in this preference: not in the US, where there are actually fewer women in US-hosted conferences, and not more generally, where the correlation between a country's FAR by affiliation and by hosted conference is not significant (r = −0.24, p = 0.53).
The number of authors affiliated with a country is highly correlated with the number of local PC members (r = 1, p < 10⁻⁹), which also implies that most PC members hail from the West. Note, however, that Western reviewers are not significantly overrepresented compared to authors, as has been observed in journals in other fields [71].
For PC members, the gender-gap differences across countries are even larger than for authors, with women representing 20.53% of US-based PC members, compared to 14.14% in the rest of the world (χ² = 18.2, df = 1, p < 10⁻⁴). Again, the fact that the US attracts many foreign scientists does not appear to explain the higher FPR in the US, since most of the foreign-born authors appear to be students [28], who are less likely to serve on PCs. With few exceptions, most countries exhibit significantly higher FPR than FAR, as in the overall statistics. Moreover, except for the US and Spain, all countries exhibit an even higher FPR for hosted conferences, unlike FAR. It is also worth noting that for researchers with unknown country affiliation, both FAR and FPR are very similar to the overall statistics, which suggests that any selection bias based on the availability of country and gender information is limited.

Linear model of gender
To round up our exploratory data analysis, we computed a logistic-regression mixed-effects model to surface the factors most strongly associated with gender. The model combines the 27 conference-related factors and 3 author factors (work sector, h-index, and the number of papers in this set) as predictor variables. Each data point comprises one author and accepted paper pair, with the author's gender as the outcome variable. All of the predictors were treated as fixed effects, and each numeric predictor was scaled to the range 0-1. Because many of these factors may be correlated or confounded by conference, the model also included the conference name for each paper as a random effect. This model, like the one predicting FAR from conference factors alone, is not very predictive (AIC: 3188.6; BIC: 3365.1; theoretical conditional R²: 0.03). Most of the factors have negligible impact or significance on the author's gender. This null result reaffirms that the underrepresentation of women does not appear to stem from a particular conference, policy, or author demographic.

Table 6. Representation of women in the top 20 countries by author count. Shown for each country are: the number of conferences it hosted; total authors affiliated with the country; ratio of these authors that are women (FAR affiliated); ratio of female authors in local conferences (FAR hosted); total number of affiliated PC members; ratio of these that are women (FPR affiliated); and FPR in all locally hosted conferences. All counts include only persons whose email is unambiguously affiliated with that country (with repeats). Women's ratios are compared to all other countries with a χ² test (* p < 0.05; ** p < 0.01; *** p < 0.001).

The most significant predictive factor for an author being male turns out to be how many overall papers they have published in this set of conferences during 2017 (p = 0.01). This observation is not particularly insightful because the distribution of published papers skews heavily male on the right tail. In other words, since most of the prolific outliers were men, they produced an outsize effect on the linear model.
The ratio of papers with a PC member author in a conference is also linked with a higher likelihood of an author being female (p = 0.03). Since conference FPR values are higher than FAR values, it follows that more papers from the PC would be associated with more female authors. The only other factor with p < 0.05 is for conferences organized by USENIX, where men published at a slightly higher rate than at other conferences, but this correlation is not likely to be causal.
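To make the modeling setup more concrete, the following Python sketch shows two of its ingredients on hypothetical data: scaling a numeric predictor to the range 0-1, and fitting a logistic regression with the author's gender as the outcome. It is deliberately simplified and is not the model used in this study: it has a single predictor, fits only fixed effects by plain gradient ascent, and omits the per-conference random effect. All names and data values are illustrative assumptions.

```python
import math

def min_max_scale(values):
    """Rescale numeric values linearly to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.5, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood (fixed effects only)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Hypothetical (author, paper) rows: the author's paper count in the set,
# and the outcome variable (1 = woman, 0 = man).
papers = [1, 1, 1, 1, 2, 2, 3, 5, 8, 12]
is_woman = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

scaled = min_max_scale(papers)
intercept, slope = fit_logistic(scaled, is_woman)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

In this toy data, the prolific authors are all men, so the fitted slope is negative. This mirrors how a few prolific outliers can dominate the paper-count coefficient.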

Related work
A number of prior studies have analyzed the representation of women in various academic fields, including CS. Fewer studies have looked at specific fields of CS, and in particular, the large and influential field of computer systems. Here, we review recent studies and compare their data sources, metrics, methodologies, and findings to our own. We also briefly discuss some possible explanations of this gender gap that have been proposed in the literature for CS and for science as a whole, framing them in the context of computer systems.
One of the most expansive studies of gender representation in CS authorship was recently published by Wang et al. [2]. It examined Semantic Scholar authorship data from the 1940s to 2019 and looked at 151M publications, including 11.8M in CS alone. This study used the Gender API tool to infer genders from given names, omitting any rare or initialed names. Instead of assigning binary genders, however, the authors derived a gender probability distribution for each name from the accuracy estimates returned by Gender API. In the 2017 timeframe, FAR in overall CS was around 25%, significantly higher than FAR for systems alone.
A similarly large study looked at all CS submissions on arXiv as of 2016 [1]. For gender assignment, it also used a name-inference service (genderize.io), simply omitting all names where the predicted accuracy was less than 95%. It computed overall FAR as ≈17%, and slightly higher for first authors, agreeing with our observation. It should be noted, however, that arXiv is a preprint server and these documents do not match exactly the peer-reviewed papers analyzed in most studies, including ours.
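The two name-based approaches described above differ in how they treat uncertain names: one keeps a probability for every name, the other discards names below a confidence threshold and assigns hard labels to the rest. The following Python sketch contrasts the two on hypothetical per-name probabilities. The function names and data are illustrative assumptions, and the sketch does not reproduce the exact procedures of either study.

```python
def expected_female_ratio(p_female):
    """Probabilistic estimate: average the per-name probabilities
    of being female instead of assigning hard labels."""
    return sum(p_female) / len(p_female)

def thresholded_female_ratio(p_female, min_confidence=0.95):
    """Thresholded estimate: drop names whose predicted gender is less
    certain than min_confidence, then count hard labels."""
    kept = [p for p in p_female if max(p, 1 - p) >= min_confidence]
    if not kept:
        return None, 0
    women = sum(1 for p in kept if p >= 0.5)
    return women / len(kept), len(kept)

# Hypothetical probabilities that each author name belongs to a woman.
probabilities = [0.99, 0.02, 0.60, 0.01, 0.97, 0.40]

soft = expected_female_ratio(probabilities)
hard, names_kept = thresholded_female_ratio(probabilities)
print(f"probabilistic FAR = {soft:.3f}")
print(f"thresholded FAR = {hard:.3f} over {names_kept} of {len(probabilities)} names")
```

The thresholded estimate is computed over fewer names, which is one reason coverage matters when comparing FAR values across studies.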
A more sophisticated gender inference approach was taken by Mattauch et al., who aimed for higher accuracy by using machine learning algorithms to also infer the cultural context of each name. As with the other inference methods, gender could not be accurately inferred for Asian names, so over 20% of the author names were omitted in this study. Using this approach, the study estimated FAR for 18 CS conferences in the preceding six years, including six of our conferences: ASPLOS, EuroPar, EuroSys, SOSP, ATC, and VEE. For all but one of these conferences (VEE), the estimated FAR values were within 2 percentage points of the ones we found, which suggests that these values have been fairly stable in recent years.
Another study exploring some of our conferences, but earlier in time, was conducted by Cohoon et al. [6]. Generally, the FAR values they computed, even for the same conferences, tend to be higher than those we computed, with an overall CS number of ≈25% by 2007. The discrepancy could be partially explained by the different periods under observation, although we doubt that a decade would lead to significantly decreased representation of women, based on the trends exhibited in the other studies. We do note, however, that Cohoon's study used a very different gender-assignment methodology, which could explain most of the difference. For 70% of papers, it used the same name-inference technique as the previous two studies, using genderize.io. For the others, it used a statistical approach that assigned a gender of female to authors with ambiguous genders with a probability of 40%-45%. Based on our experience with inferred and looked-up genders for both systems and nonsystems papers, we believe this probability tends to overestimate the actual ratio of women.
In contrast, Way et al. used a hand-curated dataset in their study of tenure-track faculty [8]. Their analysis used a list of 5032 tenure-track faculty from 205 CS academic institutes in the US and Canada and found only about 15% of CS faculty were women. Note, however, that the study was limited to North America and excluded students, which in our dataset comprised over one-third of the authors [28].
A good source of data on students in our timeframe comes from the Taulbee report [9], which found the ratio of women among fresh CS Ph.D. awardees in 2017 to be about 18%. Notably, in the discipline of computer engineering, which is perhaps closer in research topics to computer systems, the ratio was only about 11%.
Another complementary statistic comes from the US-based National Science Board, which recently found women to represent just under 30% of the overall CS workforce [7]. Unlike most of the studies above, this estimate is not limited to CS researchers, let alone to authors.
Most of these sources point to a significantly worse gap in systems than the rest of CS. From the FAR statistics alone it is not immediately clear why this should be the case, but we can look at some of the expansive literature on the gender gap for clues. Many causes for women's underrepresentation in science and technology have been posited, and we briefly describe a few of these next, in the specific context of our data for systems.
One important factor that has been associated with gender differences in publication rate and citations is resource requirements [72]. Many of the subfields of computer systems, such as high-performance computing, do indeed require expensive experimental platforms, which may partially explain their gender gap [48,63]. But high resource requirements cannot fully explain lower FAR metrics, as evident in the data on CS theory conferences we collected. The lack of association between a country's FAR and its economic development also weakens this explanation for systems as a whole. High resource requirements have also been associated with a gender gap in productivity [73]. Although we found no significant differences in productivity across genders for systems authors (as measured by h-index), the high resource requirements of some systems subfields could explain some of the larger gender gap we found in productivity for PC members, or in the long tail of the author distribution. An interesting open question is whether there are productivity differences across genders for authors in other CS fields with lower resource requirements.
An important source of women's recruitment and retention in a field is the availability of female role models [74-76]. The relative dearth of women in the last author position that we observed in systems conferences may therefore be a contributing factor to lower FAR as well. Recall that our collection of systems papers averages 4.45 coauthors per paper, which is some 50% higher than the mean of ≈3.0 authors per paper that Wang et al. found in contemporary CS publications [2]. We hypothesize that this difference stems from the large emphasis on systems implementation in this field, requiring larger team efforts.
The difference in collaboration may also offer clues to the larger gender gap in computer systems. Some past studies found that women's collaborative research networks were smaller than men's [62,77]. The overall lack of female peers and mentors in systems can make collaboration even harder for women [78], leading to fewer or smaller collaborations, which would consequently lower their research output in systems.
Finally, we must take into account that different fields attract or retain women at different rates. For example, a number of studies posited that women are more likely to work in human-centered fields [79-81]. The higher FARs we observed in human-computer interaction and CS education appear to confirm this observation for CS fields. Systems in particular is perhaps most related to the field of electrical engineering. This field has also historically fared poorly in terms of women's representation, and exhibits FAR values hovering around 10%, similar to the one we observed for systems [62,82].
Another factor in the choice of fields is pay and prestige. For example, it is well known that higher-paying occupations still average higher ratios of men, both because of employers' preferences for men in these occupations and their devaluation of women's work in other occupations [83,84]. The large economic impact of systems research on the technology sector, and subsequently its influence on workers' pay, could also explain some of the gender gap we observed. Even within well-paid occupations, there are gender gaps that can be partially explained by the prestige and gendered social expectations of each subfield. For example, despite the increase in the number of female doctors overall, relatively few women still practice surgery, especially complex surgery [85].
Women are also underrepresented in fields where success is believed to require brilliance [86], such as pure mathematics, or in our dataset, theoretical computer science and algorithms. This effect may be purely one of perception and prestige, and not necessarily grounded in statistical observations. Nevertheless, in a field such as CS education, which society may not perceive as particularly brilliant or prestigious, we find a higher representation of women in our data.
A thorough analysis of the factors that contribute to the larger gender gap in computer systems research is outside the scope of this paper, which focuses on quantifying and isolating this specific gap. But the cursory exploration presented in this section suggests that such an analysis needs to account for the multifarious social, economic, and historical factors that affect the gender gap. Many of these systemic factors have been investigated in the larger context of the gender gap in CS and the sciences in general [4,63,73,87-90]. Several of these works also make concrete recommendations for closing the gender gap [79,91].

Conclusion
This study presents a methodology and dataset to estimate the current percentage of women in systems research. Unlike most comparable studies that use gender-inference based on names with limited accuracy and coverage, our hand-curated dataset includes genders for nearly all the researchers participating in these conferences, leading to more precise estimates.
Our main finding is that only ≈10% of systems authors are women, a ratio that is significantly lower than the ≈16% we found for non-systems fields. The percentage of women who serve on PCs is almost twice as high, but the evidence suggests that it is relatively inflated, and not representative of systems as a whole.
The large gender gap is associated with almost none of the explanatory factors evaluated. Importantly, variations in female author ratio cannot be explained by multiple conference factors, including policies that are explicitly designed to improve diversity. These variations are also not fully explained by demographic differences such as research experience or work sector. The data show larger gender-gap variations by country of affiliation, but these appear unrelated to geographical region, economic development, or gender gap index. The lack of significant correlations or strongly predictive factors in the linear models suggests that the low representation of women in computer systems is endemic to the field, rather than an effect of conference factors or author demographics.
Inviting more women to visible conference roles and implementing diversity-focused policies likely contribute to more inclusive conferences, but are insufficient on their own to add women authors to the field. Increasing women's participation in systems research will require addressing the systemic causes of their exclusion, which are even more pronounced in this field than in the rest of CS. The underrepresentation of women in the field may be related to factors such as high resource requirements, fewer female role models and collaboration opportunities, and different gender preferences. But these factors alone do not completely explain this complex, multifaceted phenomenon. Identifying the specific, endemic causes for this larger gender gap remains an open research question, which we plan to address in a future publication.
Supporting information
S1 Dataset. Complete data and source code. (ZIP)
S1 Appendix. Detailed conference list. (PDF)