Consolidation in a crisis: Patterns of international collaboration in early COVID-19 research

This paper seeks to understand whether a catastrophic and urgent event, such as the first months of the COVID-19 pandemic, accelerates or reverses trends in international collaboration, especially in and between China and the United States. A review of research articles produced in the first months of the COVID-19 pandemic shows that COVID-19 research had smaller teams and involved fewer nations than pre-COVID-19 coronavirus research. The United States and China were, and continue to be in the pandemic era, at the center of the global network in coronavirus-related research, while developing countries are relatively absent from early research activities in the COVID-19 period. Not only are China and the United States at the center of the global network of coronavirus research, but they also strengthened their bilateral research relationship during COVID-19, together producing more than 4.9% of all global articles, in contrast to 3.6% before the pandemic. In addition, in the COVID-19 period, China and the United States, joined by the United Kingdom, continued their roles as the largest contributors to, and home to the main funders of, coronavirus-related research. These findings suggest that the global COVID-19 pandemic shifted the geographic loci of coronavirus research, as well as the structure of scientific teams, narrowing team membership and favoring elite structures. These findings raise further questions about the decisions that scientists face when forming teams to balance the trade-off between speed and skill. Policy implications are discussed.


Introduction
The global pandemic caused by the widespread coronavirus, COVID-19, stimulated extraordinary amounts of scientific inquiry around the world. The virus first appeared in the scholarly literature on 24 January 2020 [1], and subsequently, virologists and immunologists worked to isolate and identify the virus, determine its etiology, define the vulnerabilities that may allow treatment, and conduct research on drug and vaccine development. While international collaboration and cooperation are critical actions to address global pandemics, the need for rapid and urgent solutions could render cross-border teamwork more difficult, due to the transaction costs of communication and rising political tensions. This article tracks patterns of international collaboration in coronavirus-related research before and in the period immediately after the start of the COVID-19 pandemic in order to understand how scientists leveraged complementary expertise within their own nation's borders and beyond.
International collaboration in scientific research has grown at a spectacular rate since the 1980s, when geopolitical shifts opened up opportunities for formerly restricted researchers to create relationships outside their nation or region [2,3]. The breakup of the Soviet Union, the reunification of Germany, and China's decision to create the "four modernizations" to include science and technology in a process of opening up (改革开放), all served to restructure science. In particular, collaborations between China and the United States have grown rapidly, and collaborations between these two countries are now more numerous than those between any other two countries in the world [2].
While international collaborations can help scientists in one country to access complementary expertise outside their country's borders, there are search and coordination costs associated with such collaborations [4][5][6]. International collaborative research activities operate as a network [7] which takes time to traverse. No international organization oversees or directs these works: researchers find each other based on shared interests and the needs of frontier science. This is particularly true for sciences of immunology and virology, where no central laboratory or common data set is on hand as an organizing force [7].
COVID-19, the coronavirus that emerged in late 2019 and grew to a global pandemic in early 2020, presented this trade-off between novelty and efficiency to the international community of scholars. The importance of global collaboration and coordination to resolve this enormous challenge was summarized by the director of the United States National Institutes of Health (NIH), Dr. Francis S. Collins, in April 2020: "We need to bring the full power of the biomedical research enterprise to bear on this crisis. Now is the time to come together with unassailable objectivity to swiftly advance the development of the most promising vaccine and therapeutic candidates that can help end the COVID-19 global pandemic." At the same time, however, the urgent need for solutions to combat the pandemic increases the cost of the search and coordination needed in internationally collaborative work.
In a time of urgency, we expect that scientists reduce their collaborations, or seek to work with known colleagues to reduce the transaction costs of communication. We hypothesize that the pressures presented by the coronavirus crisis would lead scientists to collaborate internationally at a lower rate than before the pandemic. We expect search to be reduced, and pre-existing relationships to be strengthened, to the exclusion of scientists from less developed institutions, regions, or nations. In addition to testing these hypotheses, in this study we explore whether these shifts during the early months of the pandemic altered the geographic locus of coronavirus research, and whether there are implications for the quality, and type, of work produced.

Methods
In order to achieve the goals of the study, the project team constructed two new datasets: one to capture measures on collaborations in coronavirus research prior to the COVID-19 crisis, and one for the COVID-19 crisis period. The pre-COVID-19 period extends for 24 months prior to December 2019. The COVID-19 period extends from January 1, 2020 to April 23, 2020. Measures of collaboration, among others, are generated from scientific articles.
A complete dataset of scientific articles on coronavirus-related research published between January 1, 2018 and April 23, 2020 is extracted from the Clarivate Web of Science (WoS), Elsevier Scopus, and PMC-sourced materials drawn from CORD-19 (COVID-19 Open Research Dataset). Any overlap between articles found across the different source materials was removed. To complement the data on published articles, we also drew preprint articles posted between January 1 and April 23, 2020 from bioRxiv.org, medRxiv.org, and arXiv.org, extracted through the Dimensions database. The following consistent set of keywords was used in searches in the Title/Abstract/Keywords of each article in the respective databases:
• "COVID-19" OR "2019-nCoV" OR "coronavirus" OR "Corona virus" OR "SARS-CoV" OR "MERS-CoV" OR "Severe Acute Respiratory Syndrome" OR "Middle East Respiratory Syndrome"
Table 1 shows the composition of the datasets in the pre-COVID-19 and COVID-19 periods. Using the search procedure outlined, the data comprise a total of 10,432 coronavirus-related articles and preprints with author identifiable information across the two periods for analysis. Within these data, 5,934 articles are published in peer-reviewed journals; the remainder of the works comprise reviews, conference proceedings, and preprints (considered 'informal' hereafter).
Pre-COVID-19 analysis is limited to published records because the historical data are standardized in indexed databases. This allows others to recreate the dataset and test the validity of this analysis. Moreover, enough time had passed to allow most works from pre-COVID-19 to be peer reviewed and published in recognized scholarly venues (an anonymized dataset for this project is also made available on figshare). The COVID-19 period includes both peer-reviewed and preprint materials. We include preprint materials because the time pressures imposed by the pandemic crisis propelled ready and open sharing of even initial results: researchers put materials into circulation to provide insights for others without waiting for peer review. This process means that, at the time of this writing, many coronavirus-related articles have not had time to be peer reviewed and published in established venues. By necessity, this means that the COVID-19 dataset includes materials that are questionable in their scientific rigor and that may be methodologically unsound. Future work will return to the materials to re-evaluate the published record for those materials that have failed to make it through the peer-review process.
The data are examined for several features: 1) publication patterns and numbers; 2) public funding patterns, to compare the pre-COVID-19 and COVID-19 periods, as available; 3) the structure of scientific teams; 4) quality measures of formal publications; 5) collaborative patterns at the international level; and 6) networked collaborations including measures of egonets at the international level. For each article in the datasets, variables of interest are created based on author institutional affiliation, publication journal impact measures, and funding support.
To test for changes in publication patterns and team structure between articles in the pre-COVID-19 and COVID-19 periods, we run a series of statistical tests. Specifically, we use a combination of one-tailed t-tests and ordinary least squares regressions to ascertain average differences in article features between the two groups. The one-tailed t-tests compare the means of a variable in the two groups and, using the standard deviations of the two samples, allow us to assess whether the observations in the two groups are drawn from the same distribution or from different ones. Statistical significance is assessed at the 0.1, 0.05, and 0.01 levels.
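The removal of overlapping records across sources can be illustrated with a minimal sketch. It assumes, purely for illustration, that each exported record is a dictionary with hypothetical `source`, `doi`, and `title` fields, and that a record is identified by its DOI when present and by a normalized title otherwise; the matching rule actually used to build the datasets is not specified in the text.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace so near-identical copies match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def record_key(record: dict) -> str:
    """Assumed identity rule: prefer the DOI, fall back to the normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    return f"doi:{doi}" if doi else f"title:{normalize_title(record['title'])}"

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each record across all sources."""
    seen: set[str] = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Toy records standing in for exports from three sources (all values are made up).
records = [
    {"source": "WoS", "doi": "10.1000/abc", "title": "A coronavirus study"},
    {"source": "Scopus", "doi": "10.1000/ABC", "title": "A Coronavirus Study."},
    {"source": "CORD-19", "doi": None, "title": "MERS-CoV in camels"},
    {"source": "Scopus", "doi": "", "title": "MERS-CoV in Camels"},
]
unique_records = deduplicate(records)
print(len(unique_records))
```

Under this rule a record that carries a DOI in one source but not in another would not be matched, so a fuller pipeline would likely compare titles as a second pass.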
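As a sketch of the kind of one-tailed comparison described above, the following computes a two-sample t statistic from the means and standard deviations of two groups. It assumes the unequal-variance (Welch) form and approximates the p-value with a normal distribution, which is reasonable only for large samples; neither choice is stated in the text, and the team sizes shown are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_one_tailed(sample_a, sample_b):
    """Test H1: mean(sample_a) < mean(sample_b) with Welch's t statistic.

    Returns (t, degrees_of_freedom, approximate_p). The p-value uses a normal
    approximation to the t distribution (an assumption suited to large samples).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = variance(sample_a) / n_a  # squared standard error of each group mean
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    p_lower = NormalDist().cdf(t)  # lower-tail probability under the null
    return t, dof, p_lower

# Hypothetical team sizes (authors per article); not data from the study.
covid_team_sizes = [3, 4, 5, 4, 6, 3, 5, 4]
pre_covid_team_sizes = [6, 7, 5, 8, 6, 9, 7, 6]
t_stat, dof, p_value = welch_one_tailed(covid_team_sizes, pre_covid_team_sizes)
print(f"t = {t_stat:.2f}, dof = {dof:.1f}, approx. one-tailed p = {p_value:.4f}")
```

A negative t statistic with a small lower-tail p-value would be read as evidence that the first group has the smaller mean; with samples as small as this toy example, an exact t distribution should be used instead of the normal approximation.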
To test for changes in the network structure of researchers between the pre-COVID-19 and COVID-19 periods, we construct global collaboration networks based on international coauthorships. Collaboration links are first established from the addresses listed in articles, based on a full counting method. Then, co-occurrence matrices are created to show which countries coauthor articles together. The coauthorship links are aggregated by country pairs and imported into the software packages VOSviewer [8] and Gephi [9] for network analysis, which allows for a statistical review of the whole sample of collaborating countries and for visualization of the connections. To assess changes in the network positions of nations, we calculate several network metrics for selected nations in both periods, namely: a. Degree, the number of connecting nodes or collaboration partners of a focal country; b. Weighted Degree, a measure of the number of collaboration links a nation has [10]; c. Normalized Betweenness Centrality, a measure of how often a node appears on the shortest path between other nodes in the network [11]; and d. Eigenvector Centrality, a measure of the influence of hubs in a network [12].
Finally, in order to visualize the landscape of COVID-19 research and compare changes in research interest between the pre-COVID-19 and COVID-19 periods from the perspective of topic analysis, we conduct a keyword-based bibliometric analysis to generate a co-term network for each of the two periods. With the aid of VantagePoint, a software platform for bibliometrics-based text analytics owned by Search Technology Inc, we collect COVID-19-related core terms by exploiting a term clumping process [13] and create co-term networks, in which each node represents a core term and each edge reflects the co-occurrence frequency of the terms it connects. VOSviewer [8] is used to visualize the networks in the form of science maps.
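The construction of country-pair links and the two simplest metrics can be sketched as follows. The sketch assumes that each article is reduced to the list of countries in its author addresses and that, under full counting, an article adds one link to every pair of distinct countries it contains; the function names and the toy articles are illustrative and are not taken from VOSviewer or Gephi.

```python
from collections import Counter
from itertools import combinations

def country_pair_counts(articles):
    """Full counting: each article adds 1 to every pair of distinct countries it lists."""
    pairs = Counter()
    for countries in articles:
        for a, b in combinations(sorted(set(countries)), 2):
            pairs[(a, b)] += 1
    return pairs

def degree(pairs, country):
    """Number of distinct collaboration partners of a country."""
    return sum(1 for pair in pairs if country in pair)

def weighted_degree(pairs, country):
    """Total number of collaboration links (summed pair weights) of a country."""
    return sum(weight for pair, weight in pairs.items() if country in pair)

# Hypothetical articles, each given as the countries in its author addresses.
articles = [
    ["China", "United States"],
    ["China", "United States", "United Kingdom"],
    ["China"],                    # domestic-only article: contributes no link
    ["United States", "Italy"],
    ["China", "China", "Italy"],  # repeated addresses from one country count once
]
pairs = country_pair_counts(articles)
for country in ("China", "United States"):
    print(country, degree(pairs, country), weighted_degree(pairs, country))
```

The resulting pair counts are the entries of the country co-occurrence matrix; betweenness and eigenvector centrality require the full graph and are left to dedicated network software.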

Results
The aforementioned propositions are tested using statistical and network tests designed to ascertain differences in publication patterns, the structures of teams and international rates of collaboration in coronavirus-related research before and during the COVID-19 global pandemic.

National contributions to coronavirus articles
Following the COVID-19 outbreak, as expected given the geographic spread of the pandemic [14], coronavirus-related articles are more likely to be authored by scientists based in China and Italy than before the outbreak. Because these two nations experienced the earliest outbreaks of the virus, this suggests that a need for solutions and access to patient populations can stimulate research productivity in a topic. In contrast, fewer articles emanate from other OECD or developing countries in the early COVID-19 period as compared to before the pandemic. Table 2 compares the geographic sources of coronavirus research in the pre-COVID-19 and COVID-19 research datasets. It is clear that China takes the lead on research publications during the COVID-19 period, with the percentage of Chinese articles growing to 39% from 22% prior to the outbreak, while the United States' output drops as a share of total output during the COVID-19 period. Table 3 provides a comparison of the quantity of articles produced by China and the United States in the pre-COVID-19 and COVID-19 periods, as available at the time of writing. The number of articles produced by Chinese authors in the first three months of the COVID-19 period (more than 1,600 articles) surpasses the number of coronavirus articles produced by Chinese-based authors in the entire previous 24 months. As a preliminary exploration into the extent of China's involvement in international collaborative research, Table 3 also reveals that by April 2020, Chinese authors together with international collaborators produced over 12% of articles in the topic of 'coronavirus', again more than the volume that they produced across 2018 and 2019 together. In contrast, United States-based scholars produced just under half of the volume of international collaborative research that they produced in 2018 and 2019 combined. This finding is explored further in subsequent sections of the paper.
We turn our analysis from the national to the institutional level in Fig 1, and assess which institutions are the top producers of coronavirus research in the pre-COVID-19 and COVID-19 periods. Consistent with Tables 2 and 3, we find that Chinese institutions lead the world in terms of volume of coronavirus articles (including both published articles and preprints) in both periods. Moreover, during the COVID-19 period, eight of the ten most prolific institutions are located in China. Wuhan University (which includes Renmin Hospital and Zhongnan Hospital) and Huazhong University of Science and Technology (which includes Tongji Hospital, Tongji Medical College, and Wuhan Union Hospital), both located in Wuhan, China, are the most prolific institutions during COVID-19, followed by the University of Hong Kong and Fudan University. However, the Chinese Centers for Disease Control (Chinese CDC), which leads coronavirus research output in the pre-COVID-19 era, drops down the list in the COVID-19 period. In contrast to the rise of some Chinese institutions in the COVID-19 period, the United States National Institutes of Health drops out of the top ten most prolific institutions during COVID-19 at the time of writing of this study. When we exclude preprints from the analysis, we observe that Fudan University drops considerably in the ranking and that the University of Oxford and Harvard University drop out of the top ten most prolific institutions during the COVID-19 period, leaving the top ten to consist exclusively of Chinese institutions.

Reported funding for coronavirus research
Next, we examine the most commonly acknowledged funding agencies in coronavirus research before and during the COVID-19 period (Table 4). Self-reported funding data is aggregated where possible across published articles found in the Web of Science (WoS) and Elsevier Scopus database. We find that during the COVID-19 period, Chinese agencies are more likely to be acknowledged as the funding source in published work than before the outbreak. In particular, during the COVID-19 period, the most commonly acknowledged funders are National Natural   In contrast, in the early days of the COVID-19 pandemic, United States funders are less likely to be cited as the funding agency than before the pandemic. As an example, the United States Department of Health, which includes the National Institutes of Health and its affiliated funding agencies, drops from the most commonly acknowledged funder in coronavirus research before COVID-19 to the third most commonly cited funder during the COVID-19 period. Table 5 aggregates data on acknowledged funding sources by country-of-origin. During COVID-19 we see Chinese agencies acknowledged as funding the majority of published papers. In this period, at least 46% of articles acknowledge funding from Chinese agencies, while only 18% of publications acknowledge funding from United States based funders. However, prior to the COVID-19 period, Chinese and United States agencies fund about the same number of articles, and this will likely recalibrate as the United States recovers from the initial lockdown. The shift could be due in part to the greater share of Chinese articles during COVID-19, China's longer experience with COVID-19, and support from the Chinese government for coronavirus related research during the COVID-19 pandemic.

Structure of teams
Our primary research questions are related to the structure of teams and international collaboration following the COVID-19 pandemic. Table 6 reveals the first results pertaining to these research questions. The table shows that during the COVID-19 period, research teams are smaller on average in published articles, although not in preprints. We note that preprint teams are slightly larger, which could be due to the recency of these works, since larger teams take longer to produce output. The table also shows that articles in the COVID-19 period are less likely to be internationally coauthored than pre-COVID-19, which is expected given the transaction costs involved in distance collaboration in the early days of the pandemic. The number of countries involved in coauthored articles also drops in the COVID-19 period. This decline in international collaborations is smaller for Chinese authors than for the rest of the world, although this difference is not statistically significant. United States-based authors do not experience a change in international collaborations in published articles (excluding preprints); in preprints, however, we see a decline in international collaboration for United States-based researchers.

Journal impact of peer-reviewed research
One concern that may arise from this rapid explosion of articles and changing team structures in coronavirus research during the global pandemic is that the quality of the work may vary more widely than prior to the crisis. A review of impact factors attached to journals carrying coronavirus publications shows that these works are actually published in higher impact journals than was the case in the pre-COVID-19 period. To test for impact, we weight each published article in the two datasets by the Elsevier Scopus Source Normalized Impact per Paper (SNIP) of the publication journal, which is calculated as the journal's citation count per paper divided by its citation potential in its subject area. We assess whether articles in the COVID-19 period are, on average, published in higher impact journals than pre-COVID-19 articles, and whether the growth in Chinese authored publications is driven by publication in lower impact journals. Results are shown in Table 7.
The positive coefficient on COVID-19 in Table 7 column 1 reveals that articles in our sample are published in journals with higher SNIP values in the COVID-19 period compared to the pre-COVID-19 period. This suggests that journal editors and peer reviewers acted quickly in response to the need for scientific understanding about the novel coronavirus. Chinese-authored publications appear in journals of as high an impact as those of the rest of the world in both the pre-COVID-19 and COVID-19 periods (column 2), and publications with international teams appear in significantly higher impact journals than those with domestic-only teams (column 3). Column 4 reveals that Chinese authors publish in higher-impact journals during the COVID-19 era than the rest of the world, although the difference is not statistically significant.

Table 7. Regression analysis of the relationship between team structure and the impact factor of journals publishing coronavirus research before and during COVID-19.
Dependent variable: Source Normalized Impact per Paper (SNIP).
Estimates stem from ordinary least squares regression specifications in which the dependent variable is the inverse hyperbolic sine transformed SNIP of a publication in the sample, and the independent variables are the period of the publication (COVID-19 or pre-COVID-19) (column 1), whether the authors of the publication are from a Chinese institution (column 2), and whether the publication author team is international (column 3). In columns 4, 5, and 6 we include interaction terms between the COVID-19 period and the team structure to assess whether the relationship between team structure and the SNIP of a publication differs before and during COVID-19. Robust standard errors in parentheses. *, **, *** denote statistical significance at p values of 0.1, 0.05 and 0.01.

Table 7, column 5 shows that although internationally coauthored articles are published in higher impact journals than domestically teamed ones, there is no differential increase in the COVID-19 era for international versus domestic teams. Together with the documented decline in international collaborations, this result suggests that domestic teams during COVID-19 are increasing the impact factor of the journals they publish in as compared to before.
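The role of the interaction term in a specification of this kind can be sketched under simplifying assumptions. The example below regresses the inverse hyperbolic sine of SNIP, asinh(x) = ln(x + sqrt(x^2 + 1)), on a COVID-19 indicator, an international-team indicator, and their product, with no further controls. Because such a model with two binary regressors is saturated, its ordinary least squares coefficients equal differences of the four cell means, which is what the code computes. The rows are hypothetical, robust standard errors are not computed, and the actual specifications behind Table 7 may differ.

```python
from math import asinh
from statistics import mean

def saturated_ols(rows):
    """OLS of asinh(SNIP) on covid, international, and their interaction.

    With two binary regressors plus their interaction the model is saturated,
    so each coefficient is an exact difference of cell means. Every one of the
    four (covid, international) cells must contain at least one observation.
    """
    cell = {}
    for covid in (0, 1):
        for intl in (0, 1):
            values = [asinh(snip) for snip, c, i in rows if (c, i) == (covid, intl)]
            cell[(covid, intl)] = mean(values)
    return {
        "intercept": cell[(0, 0)],
        "covid": cell[(1, 0)] - cell[(0, 0)],
        "international": cell[(0, 1)] - cell[(0, 0)],
        "covid_x_international": cell[(1, 1)] - cell[(1, 0)] - cell[(0, 1)] + cell[(0, 0)],
    }

# Hypothetical (snip, covid, international) rows; not data from the study.
rows = [
    (1.0, 0, 0), (1.2, 0, 0),
    (1.6, 0, 1), (1.8, 0, 1),
    (1.4, 1, 0), (1.6, 1, 0),
    (2.0, 1, 1), (2.2, 1, 1),
]
coefficients = saturated_ols(rows)
for name, value in coefficients.items():
    print(f"{name}: {value:+.3f}")
```

In this framing, an interaction coefficient near zero corresponds to the reading given for column 5: international teams publish in higher impact journals in both periods, but their advantage over domestic teams does not change during COVID-19.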

International networks of collaboration
Network collaborative patterns have shifted in the COVID-19 era, as expected. Table 8 shows network metrics for major actors in the global network in the pre-COVID-19 and COVID-19 periods. The United States is the core player in both the pre-COVID-19 and COVID-19 networks. However, with a decreasing measure of collaboration linkages, or degree, during the COVID-19 era, the United States shows a decreasing role in the network. This is particularly true in comparison to China and Italy, which show increased collaboration linkages during the COVID-19 period, as measured by degree. Among the nations listed, China, the United Kingdom, and Italy show an increase in their normalized betweenness centrality during the COVID-19 pandemic, as compared to the 24 months leading up to the pandemic, indicating an increase in their bridging roles in the network. In terms of eigenvector centrality, the top four nations (the United States, China, the United Kingdom, and Italy) retain their centrality into the COVID-19 period, but Germany drops considerably in the early days of COVID-19, perhaps due to a lag in research output. As expected, given the shorter period available for production of research, the COVID-19 network is sparse compared to the pre-COVID-19 network. However, we can see that the United States-China relationship has intensified compared to the pre-COVID-19 period. Moreover, these two nations maintain their status as the most central players in the collaborative network. Scientifically advanced nations like the United Kingdom, Germany, France, Italy, and Canada remain active in the network, although their strongest connections are still with the United States. Germany's role in the network declines considerably, as do the roles of many other nations. Table 9 columns 1 and 2 confirm that the increase in the rate of China-United States collaborations in the COVID-19 period is statistically significant, both among all coronavirus articles and among all internationally collaborative coronavirus articles.
That said, because of China's increased quantity of publications in the COVID-19 period, the relative share of collaboration with the United States as a function of overall Chinese publications drops, as shown in Table 9 columns 3 and 4. In an absolute sense, China is producing more and higher quality work on its own. In addition to increasing their domestic outputs, China also strengthened links with Canada, Japan, the Netherlands, Italy, and India in an effort to advance COVID-19 research worldwide.
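The distinction between the two shares can be made concrete with a small arithmetic sketch. The counts below are hypothetical and chosen only to show the mechanism: the number of joint China-United States articles rises as a share of all articles, yet falls as a share of Chinese articles, because total Chinese output grows faster than the joint output.

```python
def collaboration_shares(joint, partner_total, world_total):
    """Return joint articles as a share of all articles and of one partner's articles."""
    return joint / world_total, joint / partner_total

# Hypothetical article counts chosen only to show the mechanism; not the study's data.
pre = collaboration_shares(joint=30, partner_total=200, world_total=1000)
during = collaboration_shares(joint=50, partner_total=400, world_total=1000)

print(f"share of all articles:     {pre[0]:.1%} -> {during[0]:.1%}")
print(f"share of Chinese articles: {pre[1]:.1%} -> {during[1]:.1%}")
```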
Similarly, Fig 5A and 5B show the egonet diagram of the United States as the central player, and further support the observation that during COVID-19, the United States has solidified its relationships with a handful of specific countries, particularly China. The consolidation of the United States-China relationship is closely related to the dominant role of China in articles published during the COVID-19 pandemic. The United States' collaboration with China remains its strongest link; this can also be seen in Table 9 columns 5 and 6. In contrast, the United States' relative share of collaborative articles with many other nations, such as the United

Research topics in coronavirus articles
Research topics identified from coronavirus articles in the pre-COVID-19 and COVID-19 periods provide clues to the potential changes of research emphases during the crisis. Fig 6A reveals five main clusters of research topics that appear in coronavirus articles in the pre-COVID-19 period, including viral replication (red nodes), viral infection (green nodes), respiratory infections (blue nodes), public health topics (yellow nodes and epidemiology-related blue nodes), and molecular epidemiology (purple nodes). We interpret the relatively clear boundaries between these clusters as evidence for the existence of an established research area with well-defined concepts and explicit topics. By comparison, Fig 6B illustrates the focus on a more diverse and 'chaotic' set of research topics during the early COVID-19 period. The largest node representing a research topic is "Wuhan", reflecting the location of the first occurrence of the virus. The network reveals a diverse range of topics pursued by researchers during COVID-19, including epidemiological characteristics, symptom descriptions, geographical features, and public health concerns (blue and orange nodes). This indicates the predominant concerns amongst researchers, but also reveals that researchers lack a clear focus and coordination. We expect this to change as the features of the virus, the public health concerns, and patient care practices become clearer.

Discussion
Science is increasingly a team activity [15], with scientists self-organizing into collaborations, including international collaborations, as needed by the research questions [6,16]. In particular, the involvement of many more countries coming into the global network of scientists over the past 20 years has led to dramatic shifts in the structure of scientific activity around the world [7]. On the one hand, international collaboration can allow scientists to access expertise, funding, and resources outside of their own nation. On the other hand, the search and coordination costs of this type of collaboration are high. For scientists, the decision to engage in international collaboration represents this inherent trade-off. During a global pandemic these trade-offs intensify. The need for broad expertise and pooled resources is greater than ever, but an urgent need for scientific input into public health and economic decisions magnifies the transaction costs associated with long-distance and cross-cultural communications. Based on this logic, in this study we hypothesize that a global pandemic would result in a reduction in the usual search and outreach, since scientists need to limit the coordination costs of research. Specifically, we expect to see that researchers return to known collaborators and smaller teams to speed the research process during a global pandemic. We test these hypotheses by comparing the patterns found in coronavirus-related research prior to and during the COVID-19 global pandemic, and find that the pandemic is inducing changes in the global organization of science, at least as related to coronavirus.
A review of early publication and cooperation patterns of scientific publications at the global level highlights that scientists rapidly reorganized to address the crisis posed by COVID-19 along the lines of greater efficiency and narrower focus. As expected, the dynamics of collaboration and teaming appear to shift to rely on fewer team members, which reduces the transaction costs of communicating among the group, and can, in theory, speed the research and writing processes. The challenge of the novel virus strengthens the research relationship between China and other scientifically advanced countries, especially the United States. At the same time, Chinese researchers become more independent, increasing the volume and quality of domestic output in the COVID-19 era. Moreover, the Chinese research funding agencies played a vital role in the earliest days in supporting high quality research and development work in China. These findings are in contrast to some popular accounts that Chinese scientists are withholding valuable information and reducing cooperation in the early stages of the global pandemic [17].
Although we interpret the findings as providing insight into the theoretical expectations, this study has three main limitations. First, the periods used for analysis before and during COVID-19 differ in length. Due to the nature of data collection, we had a much shorter period available to us for the COVID-19 period. We account for this in the statistical analysis; however, we anticipate that some of our trends may change in the longer term. In particular, the geographic nature of scientific production and funding may reflect the geographic spread of the coronavirus. For example, the virus spread through China before it was observed in the United States, and so it is not a surprise that the Chinese government and funding agencies allocated resources earlier than the United States. Future research should explore longer term dynamics in scientific funding and productivity around the world.
Second, we use the Source Normalized Impact per Paper (SNIP) to reflect the impact of research. There are many possible measures of research impact, including raw citations, normalized citation scores, and practical measures such as policy influence, news take-up and discussions in online forums. Despite the limitations with the use of SNIP to assess impact, including the influence of publication language, open access journals, fast track publications and the fact that publications from zero impact journals are excluded [18], we chose this measure due to the real time data collection precluding the use of citation measures and alternative measures of influence. Future research should use a variety of impact measures to identify the most important and impactful research during the COVID-19 period.
Finally, our study is limited by the data available on scientific production and funding. In particular, our measure of funding source for coronavirus research is limited. We exploit the text in funding acknowledgments in articles, many of which do not have funding acknowledgments as it is not a requirement by the journal or database. Future work could use data directly from funders themselves as it becomes available or surveys of researchers to ascertain any trends.
That said, we interpret the findings from this study as providing insight into the theoretical framework, and this paper suggests that global scientists face a trade-off in decisions on international collaborative activities around time and efficiency, and that the trade-off changes dramatically during a time of urgency, such as the COVID-19 pandemic. The observed reduction in the rate of international collaborations and consolidation of the strongest existing bilateral relationships during the COVID-19 pandemic could have consequences for the organization of science and direction of research. One of the most important of these findings is a reduction in participation of researchers from developing countries in coronavirus related research. Future work will examine the nature of teaming, preferential attachment, the role of influential individuals in the global networks as well as the consequences of the organization of science on the evolution of research topics during the COVID-19 pandemic.
Policymakers who are tracking and guiding research into coronavirus topics and vaccines may wish to be aware of the changing dynamics of international teams. While it is important to increase efficiency, smaller teams could mean that knowledge diffusion and wide-ranging expertise and novelty are reduced [3,15]. This fact is particularly true for those research institutions which are not among the most elite institutions or within those teams gathered around leading scientists who have ample funding. The results of the narrowing and focusing of research may mean that results arrive more quickly, but it also means the results and capacities are diffused more slowly. Validation may be compromised. Policy actions to address these inequities may be needed in the very near term.