Trends of research productivity across author gender and research fields: A multidisciplinary and multi-country observational study

Bibliographic properties of more than 75 million scholarly articles are examined, and trends in overall research productivity are analysed as a function of research field (over the period 1970–2020) and author gender (over the period 2006–2020). Potential disruptive effects of the Covid-19 pandemic are also investigated. Over the last decade (2010–2020), the annual number of publications has increased every year, with the largest relative increase in a single year occurring in 2019 (more than 6% relative growth), but this momentum was interrupted in 2020. Trends show that Environmental Sciences and Engineering Environmental have been the fastest growing research fields. The disruption in patterns of scholarly publication due to the Covid-19 pandemic was unevenly distributed across fields, with Computer Science, Engineering and Social Science enduring the most notable declines. The overall trends of male and female productivity indicate that, in terms of the absolute number of publications, the gender gap does not seem to be closing in any country. The absolute gap between male and female authors is either holding steady, with parallel trends (e.g., Canada, Australia, England, USA), or widening (the majority of countries, particularly Middle Eastern countries). In terms of the ratio of female to male productivity, however, the gap is narrowing almost invariably, though at markedly different rates across countries. While some countries are nearing a ratio of 0.7 and are well on track for a 0.9 female to male productivity ratio, our estimates show that certain countries (particularly across the Middle East) will not reach such targets within the next 100 years. Without interventional policies, a significant gap will continue to exist in such countries. The decrease or increase in research productivity during the first year of the pandemic, in contrast to trends established before 2020, was generally parallel for male and female authors. There has been no substantial gender difference in the disruption due to the pandemic. However, opposite trends were found in a few cases. In some countries (e.g., The Netherlands, The United States and Germany), male productivity has been more negatively affected by the pandemic. Overall, female research productivity seems to have been more resilient to the disruptive effect of the Covid-19 pandemic, although the momentum of female researchers has been negatively affected in a manner comparable to that of males.


Introduction
The growth of global scholarly research activity and its dynamics have for years been a major focus of fields such as science of science and information science [1][2][3][4]. Previous studies have looked at the expansion of scholarly literature and have attempted to infer current patterns and predict future directions [5,6]. Such information is essential for informing scientific communities and policy makers of how the landscape of research around the world is changing. Such information can help governments allocate research funding, research institutions guide hiring and promotion decisions, and individual researchers maximise the impact of their work.
While the progression of scholarly research can be measured through a variety of indicators [7], a primary indicator of the growth of research has traditionally been the number of scholarly publications [2,4,8]. Despite the imperfections of this metric for measuring research productivity, both at the level of individual scientists [9][10][11] and at aggregate levels [12][13][14], it has remained the primary metric for quantifying research production [8]. As Riviera [15] has put it, "Communication in science is realized through publications. Thus, scientific explanations, and in general scientific knowledge, are contained in written documents constituting scientific literature" (p. 1446).
It has been established that the growth of science does not take place at an equal rate across fields [2]. The phenomenon of "field variation" or "field-dependence" [16,17] has been documented in several studies [18]. One component of our analyses is dedicated to identifying differential patterns of growth in different fields of science while also detecting potential disruptive impacts of the Covid-19 pandemic on such patterns. Studies that have investigated or predicted the potential disruptive effects of the Covid-19 pandemic on research productivity have mostly produced signs of negative impact [19][20][21]. These are predominantly based on the analysis of pre-prints of publications (or working papers). Negative impact has been documented in relation to the knowledge production of social scientists [22] and economists [23]. However, some early analyses predicted an increase in productivity in certain fields such as economics and finance [24]. It has also been argued that the pandemic will affect research fields and groups of researchers differently. Termini and Traver [19], for example, point out that "In response to the pandemic, research institutions have enacted strict changes to permitted research operations, requiring scientists to abide by social distancing guidelines in the laboratory, facility closures, and ramped down laboratory activities. While scientists at all stages in their careers have been impacted by these changes to the research environment, early career scientists such as postdoctoral fellows and junior faculty are particularly vulnerable during these unconventional times" (p. 1). Of particular interest in the present study is to determine which research fields experienced the largest impact of the pandemic disruption, based on publication output.
This study is also focused on determining potential discrepancies in the disruption of productivity trends across male and female scientists. The issue of gender representation in scientific publishing is perhaps one of the most documented and most heavily discussed aspects in the science of science [25][26][27]. Several case studies have documented the gender inequality problem across different countries such as Italy [28], Canada [29], Sweden [30], and Norway [31], to name a few. The origins of the gender gap problem have been much debated in the literature and several explanations have been offered by previous studies [32][33][34][35][36]. Clearly, when considering the aggregate representation of male and female researchers in publications, one main factor that explains the gap is the under-representation of female scientists in academic positions [37]. Other factors have also been investigated, the most important being motherhood and childcare responsibilities [38][39][40][41], and the disparities in the extent of opportunities for collaboration across male and female scientists [27,38,42].
In view of these contributors to the gender gap in research, how might the Covid-19 pandemic have had a differential impact on female and male researchers? The dynamics of family responsibilities and childcare underwent considerable changes in 2020 [43][44][45][46][47][48] as a result of work-from-home arrangements and/or lack of access to external childcare services in many countries. This raises the question of whether this factor has played a role in the overall (aggregate) productivity of female scientists. Covid-related loss of access to facilities, mentorship and in-person meetings with peers may have exacerbated the gender gap in research productivity by reducing opportunities for collaboration. Conversely, as online meetings and webinars have become more prevalent since the onset of the pandemic, more opportunities may have arisen for collaborations, especially at an international level. Although there is insufficient research evidence to date, some writers have argued that the pandemic's negative impacts will fall disproportionately on women. It has been argued that "The coronavirus disease 2019 (COVID-19) pandemic has upended almost every facet of academia (1). Almost overnight the system faced a sudden transition to remote teaching and learning, changes in grading systems, and the loss of access to research resources. Additionally, shifts in household labour, childcare, eldercare, and physical confinement have increased students' and faculty's mental health needs and reduced the time available to perform academic work. . . . Many women academics will likely bear a greater burden during the coronavirus disease 2019 (COVID-19) pandemic" (p. 27) [49]. Similarly, the findings of a preliminary survey of American and European scientists in April 2020 predicted that "female scientists and those with young dependents were to be affected disproportionately" [50]. Nearly two years after the onset of the pandemic, there is now an opportunity to quantify these effects objectively.
This study aims to examine trends in overall research productivity as well as the possible differential productivity across research fields (1970-2020) and author genders (2006-2020). While the trends in representation of male and female scholarly publication are revisited across different cultures (2006-2020), the potential disruptive impacts of Covid-19 are also quantified based on research production of male and female scientists during 2020, as compared with their respective trends. The data also provides insight into the trends in the gap between male and female representation in scholarly publications and the expected time frame for the gap to close in different world regions should existing trends continue.

Methods and data collection
Three sets of publication metadata were collected from the Web of Science (WoS), each using a combination of search query strings. The query strings are all formulated for the "Advanced Search" engine of the WoS Core Collection. All analyses are based on metadata of publication counts and reflect "all document types".

Research fields
The first set of metadata contains records of publication counts of the top 100 WoS Categories, in terms of the volume of publications attributed to each category (out of the 254 categories recognised by the WoS). The list was obtained by acquiring all publication records since 2000 and listing the WoS Categories in descending order of publication count. Subsequently, 100 different search queries were formulated and entered into WoS Advanced Search. Each query reads as "WC = [the name of the category as specified by the WoS]", where WC is the Field Tag for WoS Category. From the outcome of each query, the metadata of publication count (of all document types) from January 1970 to December 2020 were exported and stored. This time period was applied to all disciplines. While the data of each category are analysed separately, for presentation purposes these categories were also grouped into "broad discipline areas". These areas include (in alphabetical order) "Agricultural Sciences", "Arts & Humanities, Interdisciplinary", "Biology & Biochemistry", "Chemistry", "Clinical Medicine", "Computer Science", "Economics & Business", "Engineering", "Environment/Ecology", "Geosciences", "History & Archaeology", "Literature & Language", "Materials Science", "Mathematics", "Multidisciplinary", "Philosophy & Religion", "Physics", "Plant & Animal Science", "Psychiatry/Psychology", "Social Sciences, General" and "Visual & Performing Arts". Only broad areas that have at least one category in the list of top 100 are mentioned in our analysis. Some WoS Categories are attributed to more than one broad area. In such cases, and for presentation purposes only, we allocated the category randomly to one broad discipline area.

Author gender
Two further sets of metadata were exported for the gender analyses. Author gender has traditionally been determined from authors' first names. This, however, is typically done on a pre-sampled set of publications, whereas here we sought to establish a reverse approach in which a search query is used to generate samples of papers with male and female authors. To do so, we formulated our queries based on the Author(s) search function of the WoS. According to the WoS guidelines for searching names of authors, there is no way to search only for the first name of authors. The acceptable format is AU = ["surname" SPACE "first name"] (where AU is the Field Tag), and if only one entry is specified within the search, it is regarded as the surname by the search engine. The question, then, is whether one can generate all publications with authors of a specific first name (in any position of authorship, first, middle or last, noting that the WoS search engine does not differentiate between authorship positions). It was discovered that this can be done through the asterisk (*) wildcard. Consider the query AU = A* Albert. Such a query returns any publication on which an author with the first name "Albert" and a surname beginning with "A" is listed. When the query is extended to AU = (A* Albert OR B* Albert OR C* Albert OR . . . OR Z* Albert) (where OR is a Boolean operator), it generates all documents on which at least one Albert is listed as an author. This method constitutes the essence of our proposed search strategy.
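The wildcard construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' tooling: the function name `first_name_query` and its parameters are hypothetical, and the query syntax simply follows the AU = (A* Albert OR B* Albert OR ...) pattern given in the text.

```python
import string


def first_name_query(first_name: str, initials: str = string.ascii_uppercase) -> str:
    """Build an AU query that pairs one first name with every surname initial.

    Hypothetical helper: each term has the form "<initial>* <first name>",
    and the terms are joined with the Boolean operator OR.
    """
    terms = [f"{initial}* {first_name}" for initial in initials]
    return "AU = (" + " OR ".join(terms) + ")"


# Three surname initials only, to keep the printed query short.
print(first_name_query("Albert", "ABC"))
```

Called without the second argument, the helper covers all 26 surname initials, which corresponds to the full A to Z expansion described in the paragraph above.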
Assume that one develops a query string by including all common male (or female) first names of a certain language, say German. Such a query would (approximately) generate all publications on which at least one author with a German male (or female) first name is listed. Another consideration, however, is the limit on the number of search terms imposed by the WoS. A WoS advanced search query cannot contain more than 16,000 Boolean operators, and if a full list of male or female names of a certain language is to be repeated 26 times, the list cannot contain a large number of names. In consideration of this limitation, and given that several of the languages we sampled from have several hundred male or female names, we adopted two different sampling methods. One is called here the "long list of first names, A-C surname initials" (hereafter, the A-C method) and the other is called the "short list of first names, A-Z surname initials" (hereafter, the A-Z method). In the A-C method, for each given language, we composed a long (as comprehensive as possible) list of male first names and a similar list of female first names, but only considered the first three letters of the alphabet for the surname initials (so that the list of first names only has to be repeated three times). In the A-Z method, by contrast, we composed a list of the 30 most popular names of each given language, one for females and one for males, but repeated that list 26 times to include every letter of the alphabet for the surname initials.
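The trade-off between the two sampling methods follows from simple arithmetic on the operator limit. The sketch below assumes that a query with n name-initial terms needs n - 1 OR operators and that only those operators count towards the 16,000 limit stated in the text; the function names are hypothetical.

```python
MAX_BOOLEAN_OPERATORS = 16_000  # limit stated in the text


def or_operator_count(n_names: int, n_initials: int) -> int:
    """Number of OR operators needed to join every name-initial term."""
    n_terms = n_names * n_initials
    return max(n_terms - 1, 0)


def max_names(n_initials: int, limit: int = MAX_BOOLEAN_OPERATORS) -> int:
    """Largest name list satisfying n_names * n_initials - 1 <= limit."""
    return (limit + 1) // n_initials


print(or_operator_count(30, 26))  # A-Z method: 30 names, 26 initials
print(max_names(26))              # longest list that fits with 26 initials
print(max_names(3))               # longest list that fits with 3 initials
```

Under these assumptions, restricting the surname initials to three letters leaves room for a name list several times longer than the full alphabet allows, which is the motivation for the A-C method.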
Popular first names given to people born between the late 1950s and the 1990s were obtained as the sample for this study, to ensure that it was representative of scholars active in academia from 2000 to 2020 (given that the average age of an academic is around 40 years). Government registries and records (i.e., registries of Births, Deaths and Marriages) were used to obtain the names; however, for certain countries (i.e., Iran, India) this information was not publicly available. For these, a minimum of two non-government databases were used to compile the list of top 30 first names, and an essential criterion was that they had separate lists for the respective decades (1950s-1990s). Where several countries share a common language (for example UK/Australia/USA, Brazil/Portugal), the list of top 30 names was compiled using a combination of the most common names in those countries. Unisex names were excluded from the lists, and if a first name had alternative spellings (e.g., addition of an accent), the name and its alternatives were counted as one. To ensure further accuracy, two- and three-letter names were excluded, as they were often shared across multiple languages. For example, 'Jan' is a common European male name, but in English it is a female name and also an abbreviation of the word January. All lists of names are accessible in the Online Supplementary Material of the paper.
In total, 14 different languages were considered for both the A-C and A-Z methods. These are (in alphabetical order) Arabic, Dutch, English, French, German, Hindi, Italian, Japanese, Korean, Persian, Portuguese, Russian, Spanish and Turkish. These languages were selected based on two criteria: (1) they are amongst the most widely spoken languages around the world, and (2) they allow female and male names to be easily distinguished. In order to increase the specificity of the resultant data, for each method (A-C or A-Z) and each language, the search query was combined with the name of a country where that language is predominantly spoken (e.g., CU = Germany AND AU = (A* Albert OR B* Albert OR C* Albert . . .)) (where CU is the Field Tag for countries and AND is a Boolean operator). Only countries that are included in the list of top 100 in terms of the quantity of their scholarly publications were considered. This resulted in 37 different combinations of country and language. For each query (i.e., associated with the male/female A-C/A-Z list of each country-language combination), the metadata of publication count (for all document types) for the period 2006-2020 were exported and stored for analysis. The reason for going back only to 2006 was that, for some country-language combinations, the data before 2006 were scarce and/or lacked discernible patterns. Therefore, for the sake of consistency in comparisons, only the counts of publications from 2006 onwards were considered for all country-language combinations (i.e., a 15-year history of their publication records). It should also be noted that the resulting counts are based on the number of publications with at least one female author or with at least one male author. The counts therefore do not take into account fractional contributions to papers and cannot be taken to reflect proportional contributions to publications directly.
Most previous studies on the gender gap in authorship are based on determining gender in the individual documents of a sample of publications. Such an approach, while more nuanced than ours, often places a constraint on the number of documents that can be analysed. Typically, studies that use the abovementioned method rely on a sample of several thousand articles/pre-prints [43,51]. By contrast, our method is query-based, a novel approach in this research area. It is better suited to providing a bigger picture and broadening the scope and scale of the investigation, but that comes at a cost: control over certain details, such as the position of authorship or the proportion of male versus female authors on each document, is compromised. The series of formulated queries can be used to directly generate the set of publications in which male/female scientists are listed as authors. This directly examines the records of more than 75 million scholarly publications without the prerequisite of exporting their bibliographic data. It also does not rely on the use of any application programming interfaces. The method works only with the metadata of publications and exempts the analyst from collecting a sample of publications. As such, it can be applied to the entire record of the WoS, without the need to export those records. The method can therefore be used as a benchmark for tracking the overall productivity of male and female academics and their relative degree of representation in published documents. Our query-based approach also provides a foundation for replicating the data of overall gender productivity and observing changes in the trends over time. Additionally, it can be further modified to investigate author gender patterns within specific scholarly disciplines.

Method for analysing the growth and gap
Consistent metrics were produced from each of the abovementioned sets of publication-count metadata. Each metric is used either to quantify trends of productivity over the last 50 years (or 15 years, for the gender analyses), or to focus particularly on 2020 and the impact of the pandemic on such trends. For each dataset, the Actual Growth (AG) of the publication count was calculated for each year t as in Eq 1, where Pub_Cnt(t) denotes the publication count in year t.
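Assuming that AG denotes year-on-year relative growth expressed as a percentage (an assumption, chosen to be consistent with the percentage growth figures reported throughout the study), Eq 1 would take the following form:

```latex
\mathrm{AG}(t) = \frac{\mathrm{Pub}_{\mathrm{Cnt}}(t) - \mathrm{Pub}_{\mathrm{Cnt}}(t-1)}{\mathrm{Pub}_{\mathrm{Cnt}}(t-1)} \times 100\%
```

Under this reading, a count that rises from 1,000 publications in one year to 1,060 in the next corresponds to an AG of 6%.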
The averages of this quantity (AV-G) over the last five, 10 and 15 years (for genders) and five, 10, 20 and 50 years (for disciplines) are reported for each entity of analysis. This quantity is also specifically reported for the year 2020. From here on, when we mention AG without specifying a year, we refer to the 2020 value of this quantity. This is in contrast to the Deviation of Growth (DG) in 2020 from the projected value, which is calculated as follows. Firstly, the data point of 2020 is excluded from the set of publication counts. Then, based on the historical data of 1970-2019 (or 2006-2019, depending on whether the analysis concerns disciplines or genders), a polynomial curve of degree four is fitted to the data (this fitted curve is visualised for each data set in the Results section). Using this regression analysis, we quantify the publication count that was expected in 2020 (based on the record of 15 or 50 years of trend in publication counts). This quantity is referred to as the predicted count of publications in 2020, pred_2020 (as opposed to the actual number of documents in 2020, act_2020). The DG quantity is then calculated as in Eq 2.
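The projection step can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the degree-four least-squares fit is written out with the standard library only, DG is assumed to be the relative deviation of the actual 2020 count from the predicted one in percent (the exact form of Eq 2 is an assumption here), and the counts are synthetic.

```python
def fit_polynomial(xs, ys, degree=4):
    """Least-squares polynomial fit; returns a function predicting y from x."""
    # Centre and scale x to [-1, 1] so the normal equations stay well conditioned.
    mid = (max(xs) + min(xs)) / 2
    half = (max(xs) - min(xs)) / 2 or 1.0
    us = [(x - mid) / half for x in xs]
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(u^(i+j)) and b[i] = sum(y * u^i).
    a = [[sum(u ** (i + j) for u in us) for j in range(n)] for i in range(n)]
    b = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution for the polynomial coefficients.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]

    def predict(x):
        u = (x - mid) / half
        return sum(c * u ** k for k, c in enumerate(coeffs))

    return predict


def deviation_of_growth(years, counts, target_year=2020, degree=4):
    """Return (predicted count, DG in percent) for target_year."""
    history = [(y, c) for y, c in zip(years, counts) if y < target_year]
    actual = dict(zip(years, counts))[target_year]
    predict = fit_polynomial([y for y, _ in history], [c for _, c in history], degree)
    predicted = predict(target_year)
    # Assumed definition of DG: relative deviation of actual from predicted.
    return predicted, 100.0 * (actual - predicted) / predicted


# Synthetic counts (not data from the study): a smooth trend for 2006-2019,
# followed by a 2020 value that falls below the trend.
years = list(range(2006, 2021))
counts = [1000 + 50 * (y - 2006) + 2 * (y - 2006) ** 2 for y in years[:-1]] + [2000]
predicted_2020, dg_2020 = deviation_of_growth(years, counts)
print(f"predicted 2020 = {predicted_2020:.1f}, DG = {dg_2020:.2f}%")
```

A negative DG indicates that the observed 2020 count fell short of the pre-2020 trend even if the count still grew relative to 2019, which is why AG and DG can have different signs.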
The above metrics are common to the discipline and gender analyses. Additional quantities, however, were calculated for the gender analyses. For each country-language data set, the Gap (G) and Relative Gap (RG) between male and female publications, as well as the Ratio (R) of female to male publications, were calculated based on Eqs 3-5 (note that by male/female publications we mean publications on which at least one male/female author has been listed, at any authorship position). In this notation, male_Cnt and female_Cnt respectively represent the counts of male and female publications for the country-language combination of interest. Using the observed ratios from 2006 to 2020, we also conducted an additional regression analysis on each data set and predicted the year in which the female to male ratio of publication counts reaches .3, .5, .7 and .9 for each country-language combination. When no reasonable number could be obtained as a solution to the respective equation (e.g., more than 200 years in the future), a dash sign "-" is reported in the respective tables of outcomes.
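A minimal sketch of these gender metrics is given below. Several choices are assumptions made for illustration only: RG is taken as the gap divided by the male count, the regression on the ratio is taken to be a straight line (the text does not state its form), and the 200-year horizon mirrors the example given for reporting a dash. Function names are hypothetical.

```python
import math


def gap_metrics(male_cnt: float, female_cnt: float):
    """Return (G, RG, R) for one country-language combination and year."""
    gap = male_cnt - female_cnt
    relative_gap = gap / male_cnt  # assumed normalisation, not confirmed by the text
    ratio = female_cnt / male_cnt
    return gap, relative_gap, ratio


def year_ratio_reaches(years, ratios, target, horizon=200):
    """Extrapolate a linear trend in R; return the first year R >= target, or None."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # ratio is not increasing, so the target is never reached
    year = (target - intercept) / slope
    if year > max(years) + horizon:
        return None  # too far in the future to be a reasonable estimate
    return math.ceil(year - 1e-9)


# Synthetic ratios (not data from the study), rising by 0.01 per year from 0.30.
years = list(range(2006, 2021))
ratios = [0.30 + 0.01 * (y - 2006) for y in years]
print(gap_metrics(1000, 400))
print(year_ratio_reaches(years, ratios, 0.505))
```

A return value of `None` plays the role of the dash reported in the tables when no reasonable year can be found.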

Analyses across research fields
Publication counts across all top 100 WoS categories for the years 1970-2020 are presented in the accompanying figures, and the accompanying table provides various statistics related to the WoS categories, including the actual and predicted number of publications in 2020, AG and DG, as well as average growth over the last 5, 10, 20 and 50 years. The cells containing statistics of actual and deviated growth in 2020 have been colour-coded to facilitate comparison, with red indicating a decrease and green indicating an increase. According to WoS records, the biggest categories of contemporary research are Biology, Engineering Electrical Electronic, Biochemistry Molecular Biology, Chemistry Multidisciplinary, Medicine General Internal, and Materials Science Multidisciplinary. The fastest growing areas over the past 5 years are Environmental Sciences and Engineering Environmental. The average annual growth of publication counts in these two categories over that [...]
Within the various domains of science, this study uses the deviation from trends during 2020 as a proxy for the effect of the Covid-19 pandemic on research production. With this in mind, one can observe a mix of growth and decline in productivity across the broader areas. However, two areas that appear to have been particularly affected are Computer Science and Social [...]

Analyses of author genders
The temporal record of publication counts of male- and female-authored publications (i.e., publications on which at least one male/female author has been listed) across language-country combinations has been visualised in the accompanying figures. Tables 2 and 3 summarise the results of the analyses based on the A-C and A-Z methods, respectively. These tables report the actual and projected number of male and female publications in 2020, the actual growth (AG) and the deviation from the predicted growth (DG), and the average of the (actual) growth over the last 5, 10 and 15 years. The last four columns of the tables report the year in which the ratio of female to male publication counts of the language-country combination is expected to reach values of .3, .5, .7 and .9 based on the current trends. When no feasible solution could be found for the equation, a dash sign "-" is reported. We also considered estimating the number of years needed to reach absolute parity, i.e., r = 1.0. However, for many of the countries, no reasonable number can be found as a solution. While the ratio for all countries is "asymptotically approaching 1.0", exact parity (r = 1.0) cannot be achieved by any country that shows a divergence pattern in absolute numbers. Exact parity would require the absolute gap to close as well, and in most countries that will not happen, no matter how close the ratio gets to 1.0. The entries of Tables 2 and 3 in the AG and DG columns have been colour-coded to better demonstrate positive (green) and negative (red) growth during 2020.
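The distinction between ratio parity and closure of the absolute gap can be made explicit with a short identity. Writing M(t) and F(t) for the male and female publication counts in year t (symbols introduced here for illustration only), the ratio and the absolute gap are linked as follows:

```latex
R(t) = \frac{F(t)}{M(t)} = 1 - \frac{G(t)}{M(t)}, \qquad G(t) = M(t) - F(t)
```

Hence R(t) approaches 1 whenever G(t)/M(t) shrinks, which can occur even while G(t) itself grows, provided that M(t) grows faster. Reaching R(t) = 1 exactly, however, requires G(t) = 0.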
Data collected for all language-country combinations (from here on, "cultures" for simplicity) obtained from both methods confirm the existence of a notable gender gap between the scholarly publications of male and female scientists. The gap is observable across all 37 examined country-language combinations. Moreover, and perhaps most strikingly, we do not observe any trend that is indicative of the gap narrowing (in terms of the absolute total number of publications) in any culture. However, the pattern of temporal variation of this gap is distinctly different across cultures. According to the trends presented in Figs 9, 10, 15 and 16 (as well as Figs 11, 12, 17 and 18), three general patterns are differentiable across cultures. These patterns are discussed in the following paragraphs.
The first pattern is a set of cultures in which the gap between total publications of male and female researchers has been exponentially widening over time. This includes almost all Arabic-speaking countries (Egypt, Iraq, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, and United Arab Emirates). An exception is Tunisia, whose gap has shown signs of flattening out (or at least slowing down) over the last three to four years (although there is still no discernible sign of it narrowing in a consistent fashion). These are the countries for which the curve of absolute gap (Figs 11, 12, 17 and 18) has the shape of an increasing convex curve, meaning that not only has the gap been widening every year in these cultures, but also that the gradient of this increase has itself been increasing. If the existing trends of these cultures continue, the gender gap will worsen every year, and at an increasing rate. Note that for some of these Arabic countries, when we look at the ratios of female to male total productivity (Figs 13, 14, 19 and 20), an extremely slow pattern of increase in the ratio is observed. In these cases, the gender gap is very gradually closing in relative terms, but the discrepancy in the number of male- and female-authored publications is increasing due to the rapid overall growth in publications. Take the case of Arabic-Saudi Arabia as an example. When considering the temporal trend in the female to male ratio within this culture, one cannot predict the year when the total research productivity of its female scholars becomes 70% of that of its male scholars (see Tables 2 and 3), even a hundred years into the future. Also, to reach the 50% ratio, the optimistic forecast is more than 60 years from now, while the pessimistic forecast suggests 120 years from now.
PLOS ONE
Table 2 (column headings: AV-G 5y %, AV-G 10y %, AV-G 15y %, Year ratio = .3, Year ratio = .5, Year ratio = .7, Year ratio = .9)
The second pattern concerns cultures in which the disparity between male and female total productivity is also increasing (as in the countries listed above), but at an approximately linear rate rather than at an increasing (i.e., exponential) rate. This includes several European countries (e.g., Belgium, Netherlands, Spain, Germany) whose curves of absolute gap have even started to become slightly concave, with slight signs of decrease or flattening out in recent years. Less clear examples of this pattern are seen in Italy, Japan, Iran, India and Brazil. These countries are not fast-tracking the closing of the gender gap, but the problem for them does not seem to be escalating exponentially either. As a result, their relative gap has been consistently decreasing and their female to male publication ratios show a more discernible upward trend.
For a considerable portion of this cohort of countries, both the A-C and A-Z methods predict that, by the middle of the current century, they will reach a ratio of .7 of female- to male-authored publications (e.g., Italy, Netherlands, Iran, Spain, Brazil). A larger portion of these countries are predicted to reach a ratio of .5 by that date.
The third pattern concerns cultures that have managed to maintain a relatively constant absolute gap between total female and male productivity for a sustained period and have even shown small signs of narrowing the gap in very recent years. This pattern is almost exclusively observable in developed countries. It is noticeable for Australia, Canada, England, New Zealand, and USA. Both methods (A-C and A-Z) suggest that in all of these countries the absolute gap decreased in 2020 compared to 2019, indicating that this could mark the beginning of a downward trend in the gender gap for these countries. However, it is too early to make clear predictions, as no sustained downward trend has yet been observed for any country. According to both sampling methods, the current trends in female to male publication ratios of these countries indicate that by the middle of the current century all of them will have reached a ratio of .9.
Focusing on productivity during 2020 in contrast to the previous trends, striking patterns are observable in relation to male and female productivity. Firstly, both methods suggest that academics of Arabic countries (both genders) have shown the highest degree of actual growth in 2020, compared to their 2019 record of publications, whereas English-speaking and some Western European countries have shown the opposite trend. Both male and female academics of Asian countries such as Iran and India have demonstrated positive growth in terms of total productivity, although to a lesser degree than in Arabic countries. This pattern is also observable in relation to Latin American countries such as Chile, Argentina, Colombia and Mexico.
For most Arabic-speaking countries, both sampling methods suggest that female total productivity has had larger growth than male total productivity. This includes countries such as Saudi Arabia, Egypt, Kuwait, Lebanon, and Jordan. The only Arabic-speaking country for which the growth of total productivity in 2020 (compared to 2019) has been larger for male academics than for female academics is United Arab Emirates. Similarly, when considering the deviation from the projected growth, both methods suggest that female academics of Arabic-speaking countries have shown a smaller negative deviation from their productivity trend (i.e., their projected productivity) than their male counterparts.
When considering English speaking and European countries that have experienced negative impacts on their academic productivity during 2020 compared to the previous year(s), the pattern is slightly different. For most of these countries (e.g., Netherlands, USA, England, Canada, Australia, New Zealand) both sampling methods suggest that, in terms of actual growth in 2020, female total productivity has been more resistant to the disruptive effects of the pandemic than that of their male counterparts. In certain countries such as USA, Canada and Germany, male total productivity is found to have decreased in 2020 compared to 2019, whereas female total productivity has had positive AG in 2020, and this is confirmed by both sampling methods. However, when the deviation from projected growth is considered, this pattern becomes more mixed. This suggests that while the productivity of female academics seems to have been more resistant to the disruptive effects of the pandemic than that of their male counterparts, their momentum has been noticeably affected, in a manner comparable to that of males in those countries.
Italy shows a growth in publications in 2020, and both sampling methods suggest that the growth has been larger in terms of female productivity. A similar pattern is also observable for Iran, Spain, Chile, Ukraine, Russia, and Brazil. The opposite pattern (i.e., male total productivity showing more resistance to the pandemic effects) is not a common observation in our data, at least when we require the pattern to be confirmed by both sampling methods. An exception is India. The productivity of both male and female Indian scientists increased in 2020, but the increase is estimated to have been larger for male than for female scientists, according to both sampling methods.
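The distinction drawn above between actual growth (AG) and deviation from projected growth can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: the counts are hypothetical, a straight-line trend stands in for whatever forecasting model is actually used, and both measures are expressed as percentages of the 2019 count and of the projected 2020 count, respectively. The function names are illustrative only.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def growth_measures(years, counts, target_year):
    """Return (actual growth, deviation from projection), both in percent.

    Assumption: the projection is a straight line fitted only to the
    years before target_year.
    """
    by_year = dict(zip(years, counts))
    past_years = [y for y in years if y < target_year]
    past_counts = [by_year[y] for y in past_years]
    slope, intercept = linear_fit(past_years, past_counts)
    projected = slope * target_year + intercept
    actual = by_year[target_year]
    previous = by_year[target_year - 1]
    actual_growth = 100.0 * (actual - previous) / previous
    deviation = 100.0 * (actual - projected) / projected
    return actual_growth, deviation


# Hypothetical annual publication counts, NOT data from the study.
years = [2015, 2016, 2017, 2018, 2019, 2020]
female = [1000, 1100, 1200, 1300, 1400, 1450]

ag, dev = growth_measures(years, female, 2020)
print(f"actual growth: {ag:.1f}%, deviation from projection: {dev:.1f}%")
```

In this toy series the 2020 count is higher than the 2019 count, so AG is positive, yet it falls short of the trend line, so the deviation from projection is negative. This is the situation described above, in which productivity appears resilient in absolute terms while its momentum is nonetheless affected.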

Discussion
The findings presented above offer a richly detailed picture of trends in research productivity over the last half-century, as these vary by research field and by author gender. They also allow some tentative inferences about the impact of the Covid-19 pandemic on research productivity based on departures in 2020 from projected trends.

Trends by research field
The general trends in annual publication numbers are consistently rising across all research fields, a conclusion that is unsurprising given the well-documented, relentless increase in global research productivity in recent decades. In most fields the rise since 1970 is approximately linear or exponential, with the rate of change varying from gradual to very steep. Rates tend to be steepest in the more technological fields, such as environmental sciences and engineering disciplines, nanoscience, and geosciences, and also in some Clinical Medicine categories (e.g., Oncology and Health Care Sciences Services). Rises tend to be less steep in the social sciences and humanities disciplines. Nonlinear patterns, such as periods of stasis preceding rapid increases or recent plateaus, are also found in particular cases, and may be interpretable in light of local dynamics in these fields. One challenge facing the interpretation of all of these patterns is the degree to which the extent and timing of publication growth reflect endogenous change in the research fields themselves or changes in the publications indexed by WoS. Nevertheless, the overall pattern of rising global productivity is unambiguous.

Trends by author gender
The pattern of changes in productivity as a function of author gender, evaluated over a period of 15 rather than 50 years, reflects a similar combination of broad trends and local (country- rather than field-level) variation. The key broad trend is a gradual increase in the proportion of publications with at least one female author, with the ratio of these publications to those with at least one male author increasing from less than 0.4 to more than 0.5 over the study period. The rate of change is troublingly slow, however, with parity very distant and even a 0.7 ratio being forecast as two decades away. In the context of rapid increases in rates of publication, the disparity between the absolute numbers of female- and male-authored publications is growing globally rather than shrinking. Our findings demonstrate the magnitude of the research publication gender gap and the slow but steady rate at which it is narrowing.
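The way a rising female to male ratio can coexist with a widening absolute gap, and the way a target year for a given ratio might be extrapolated, can be sketched as follows. This is an illustration under stated assumptions rather than the forecasting procedure of the study: the counts are hypothetical, the ratio is assumed to follow a straight-line trend, and the function names are invented for this example.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gap_and_ratio(female, male):
    """Absolute gap (male minus female) and female-to-male ratio per year."""
    gaps = [m - f for f, m in zip(female, male)]
    ratios = [f / m for f, m in zip(female, male)]
    return gaps, ratios


def year_ratio_reaches(years, ratios, target):
    """Year at which a straight-line trend in the ratio equals target.

    Returns None when the fitted trend is flat or falling, since the
    target is then never reached under this (assumed) linear model.
    """
    offsets = [y - years[0] for y in years]  # centre on the first year
    slope, intercept = linear_fit(offsets, ratios)
    if slope <= 0:
        return None
    return years[0] + (target - intercept) / slope


# Hypothetical annual publication counts, NOT data from the study.
years = [2016, 2017, 2018, 2019, 2020]
male = [1000, 1100, 1200, 1300, 1400]
female = [400, 462, 528, 598, 672]

gaps, ratios = gap_and_ratio(female, male)
crossing = year_ratio_reaches(years, ratios, 0.9)
print("absolute gap:", gaps)
print("ratio:", [round(r, 2) for r in ratios])
print("year the linear trend reaches 0.9:", round(crossing))
```

In this toy series the absolute gap widens from 600 to 728 publications while the ratio rises from 0.40 to 0.48, and a straight-line extrapolation of the ratio reaches 0.9 around 2041. A different trend model would give a different target year, so such dates are best read as indicative.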

Covid-19 impacts by research field
Inspection of departures of 2020 publication counts from 2019 figures and from long-term polynomial forecasts reveals a general pattern of negative growth that can be cautiously ascribed to the Covid-19 pandemic. However, where research fields are concerned the pattern is relatively weak, with substantial variability across fields. Of the 94 fields examined, a slender majority (51.1%) experienced a reduction in publications in 2020 relative to 2019, with a larger majority (59.6%) falling below forecast. However, many fields did not follow this pattern: some, such as the study of infectious diseases and public health, showed strong growth for reasons likely to be directly pandemic-related, and others, such as environmental science, probably grew as a result of increasing attention to climate change issues.
In contrast, the humanities and social sciences (e.g., philosophy, religion, history, political science, education research) were especially hard hit in 2020, showing substantial negative growth in publication counts: on average -12.0% relative to 2019 and an astonishing -37.2% compared to forecast. Negative growth was also common in engineering and computer science fields. The reasons for these negative impacts are not obvious. One possibility is that fields whose primary publication forum is conference proceedings papers, such as computer science, will show a reduction in publications when travel is restricted and many conferences are suspended. Another is that the traditionally slower peer review processes in the social sciences and humanities are more impacted by pandemic-related disruptions, resulting in publication delays. These possibilities should be considered in future research, informed by field-specific knowledge. Further studies should also examine whether reduced publication counts in 2020 primarily represent delayed publications that will appear in future years, or whether they represent research that was not conducted due to the pandemic. Whether or not publication counts recover in 2021 and beyond may help to answer this question.

Covid-19 impacts by author gender
The potentially disproportionate impact of the Covid-19 pandemic on female researchers has been a focus of speculation and some preliminary research, but the present study offers the broadest scope of analysis conducted to date. It suggests that there has been no substantial gender difference in the disruption due to the pandemic. Across 37 national cases, each representing one language-country combination, the average growth in the number of female-authored publications from 2019 to 2020 was approximately double the average growth in male-authored publications. Relative to forecast, however, the respective levels of growth are negative to similar degrees. By the A-C method of gender determination, female-authored publications revealed marginally more negative growth than male-authored publications (-1. . By implication, the Covid-19 pandemic may not have disproportionately disrupted the research productivity of female researchers as has been feared, at least insofar as 2020 publication outputs are concerned. It is possible that disproportionate impacts might emerge at a greater lag, or that they are specific to researchers in particular age or other demographic groups. Future studies should investigate these possibilities.

Limitations
These gender-related findings must also be considered tentative for two key reasons. First, our method for counting publications as female- and male-authored does not allow direct comparison of numbers of publications because publications with mixed-gender authorship are counted toward both categories. Our method also does not directly represent the relative magnitude of research contributions by female and male researchers because it does not count their fractional contribution to publications but only whether they are categorically present or absent. Second, our novel method for classifying author gender may be imprecise and requires further validation. Encouragingly, two different implementations of the method (A-C and A-Z) generated very similar patterns of findings. It is also important to recognize that although the method may have some error rate, there is no obvious reason to believe it would be systematically biased to allocate authors to one gender or another, and it has the significant advantage of identifying author gender at much greater scale than other methods.
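The first limitation can be made concrete with a small sketch contrasting presence-based counting, in which a mixed-gender publication counts toward both categories, with fractional counting. The author lists below are hypothetical and the function names are illustrative. The presence-based function only mimics the logic described above and is not the query-based implementation itself.

```python
def presence_counts(publications):
    """Count each publication once per gender present among its authors.

    A mixed-gender publication therefore contributes to both totals.
    """
    female = sum(1 for authors in publications if "F" in authors)
    male = sum(1 for authors in publications if "M" in authors)
    return female, male


def fractional_counts(publications):
    """Split each publication across genders in proportion to author share."""
    female = sum(authors.count("F") / len(authors) for authors in publications)
    male = sum(authors.count("M") / len(authors) for authors in publications)
    return female, male


# Hypothetical author-gender lists, one per publication (NOT study data).
publications = [
    ["F", "M", "M"],
    ["M"],
    ["F"],
    ["M", "M", "F", "M"],
]

presence = presence_counts(publications)
fractional = fractional_counts(publications)
print("presence-based (female, male):", presence)
print("fractional (female, male):", tuple(round(v, 2) for v in fractional))
```

For these four hypothetical publications, presence-based counting yields three female-authored and three male-authored publications, which sum to more than the number of publications, whereas the fractional totals sum to exactly four and give female authors a smaller share. The two schemes can therefore produce different female to male ratios from the same records.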

Directions for future research
A potential dimension that was not explored in this work is the investigation of gender gaps within specific research fields. Such investigation can potentially be undertaken using our proposed query-based method. Also, we note that a more representative metric for the investigation of the gender productivity gap could perhaps be the frequency of first authorships (i.e., counts of male versus female first-authored publications). Because we do not differentiate between ranks of authorship, our results with respect to the effect of the Covid-19 pandemic contrast with, for example, another study [43], which found a 19% reduction in 2020 in female first-authorship representation in a sample of medical journals compared to 2019. The current paper counts any publication with mixed-gender authorship toward both gender groups. The findings with respect to the effect of the pandemic on genders may as a result show some contrasts to the studies that considered pre-prints [22], those that considered first/last/corresponding authorship [51,52], or the proportion of representation of male and female authors based on a sample of articles/pre-prints. This further adds to the mixture of evidence that already existed, particularly on the effect of the pandemic on male and female productivity (see [53] and [54] for examples) and calls for more nuanced investigations of this problem.
Currently, major scholarly reference databases (e.g., WoS or Scopus) do not offer search options that can differentiate between authorship orders. The detection of author names in such search engines is purely based on a binary determination of whether a name exists within the list of authors, regardless of the position of the author in the list. This limits the application of the query-based method proposed by this study. However, should such a feature be implemented by WoS, the same query-based method could be readily employed to examine gender disparity based on patterns of first authorship. Given the benefits that such differentiation could bring to academic inquiries of this nature, we recommend that WoS offer this possibility to its users.
A complementary dimension to the analyses presented here is to explore whether country-specific lockdown measures explain the decline in research productivity (similar to the approach of Hipp and Konrad [44] in the context of impact on professional advancement).
It is also important to note that our observations regarding the gender production gap remain limited to the overall/gross productivity of male and female scholars and not productivity per individual male or female researcher. The observed gaps are, as such, partly a reflection of the larger presence of male researchers in academia [55,56]. While the results do speak to an existing and widening gender gap in many geographical regions, they do not have any bearing on whether male scholars on average have been more (or less) productive than their female counterparts. Matters of individual author productivity [46] were beyond the scope of our work.
Also, further research could investigate how the pandemic impacted scholarly productivity across various career stages. However, we do not see a feasible pathway for conducting such investigation using query-based methods, as proposed in this work. Traditional methods of sampling published articles or pre-prints as well as self-reported questionnaires could be the pathways for such investigation [55]. An existing study based on analysis of papers published by Brain, Behavior, and Immunity identifies clear impacts on female first-author representation during 2020 (compared to 2019) as well as a more pronounced impact on female first authorship than last authorship [57]. These existing sets of evidence provide indications that the impact of the pandemic might have been uneven across scholars of various career stages and that early career researchers might have experienced more pronounced setbacks.
On a final note, the effect of any disruption on research activities is often reflected in publications with a time lag. This paper uses publications as the proxy for productivity, which entails a lag between work input and publication. It is expected that the setback to academic activity, if any, may become more apparent in publication records during 2021 and onwards. The results presented by this work could at best provide some early indications of these disruptions, whereas true effects may only manifest in the coming years. Whether these effects are transient and how long they might last before recovery is to be determined by future research.

Conclusions
The findings of this study provide an overall picture of quantitative trends of publications in a large sample of research fields. They have several practical implications for research institutions as well as individual researchers. Knowing which research fields are relatively bigger or are expanding faster would, for instance, be of great importance to research institutions when evaluating the performance of scholars for matters such as career promotion. This is in consideration of the correlation that exists between the quantity of research articles that are published within a field and the number of citations that researchers of that field receive each year. Similarly, such performance assessments during the pandemic years need to take into account the overall differential impacts that various research fields have endured from the pandemic disruptions. These differential impacts are comprehensively documented in this study.
In relation to gender disparity in overall research production, the findings of this work could guide policy makers who aim to effect changes in long-lasting academic gender gaps. Across different regions of the world, highly differential patterns in the gender-related research production gap were unambiguously observable. These observations are particularly important for informing policy makers in countries where the gender gap in research production is not on a tangible path to closing in the foreseeable future, unless effective interventions in academic education, recruitment, research funding allocation, and mentorship are implemented [33,58]. The findings also exemplify countries that have notably accelerated the closing of their academic gender gap (at least as reflected in the metric of total productivity). This may encourage the exchange of information, experiences, and policy guides between policy makers of these countries and those that seek to intervene in their persistent academic gender gaps. Moreover, while we observed that in countries that endured a larger impact of the pandemic on research productivity, female productivity was often more resilient (e.g., the Netherlands), it should be noted that the effect on the momentum of male and female productivity was closely comparable in nearly every case. There was no evidence of any pattern indicating that one gender endured greater impact on its productivity momentum.