
The evolution of scientific literature as metastable knowledge states

  • Sai Dileep Koneru ,

    Roles Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    sdk96@psu.edu

    Affiliation The Pennsylvania State University, University Park, PA, United States of America

  • David Rench McCauley,

    Roles Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Ernst & Young, McLean, VA, United States of America

  • Michael C. Smith,

    Roles Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Ernst & Young, McLean, VA, United States of America

  • David Guarrera,

    Roles Conceptualization, Formal analysis, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    Affiliation Ernst & Young, McLean, VA, United States of America

  • Jenn Robinson,

    Roles Investigation, Visualization, Writing – review & editing

    Affiliation Ernst & Young, McLean, VA, United States of America

  • Sarah Rajtmajer

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    Affiliation The Pennsylvania State University, University Park, PA, United States of America

Abstract

The problem of identifying common concepts in the sciences and deciding when new ideas have emerged is an open one. Metascience researchers have sought to formalize principles underlying stages in the life cycle of scientific research, understand how knowledge is transferred between scientists and stakeholders, and explain how new ideas are generated and take hold. Here, we model the state of scientific knowledge immediately preceding new directions of research as a metastable state and the creation of new concepts as combinatorial innovation. Through a novel approach combining natural language clustering and citation graph analysis, we predict the evolution of ideas over time and thus connect a single scientific article to past and future concepts in a way that goes beyond traditional citation and reference connections.

Introduction

Early work in metascience can be traced back at least half a century [1], although it has been only in the last decade or so that a robust literature has been seeded exploring co-authorship networks, citation networks, topical networks and similar static and one-dimensional representations of complex interactions amongst researchers and their work. Much of this has been powered by the increased availability of digital data about scientific processes, improvements in information retrieval, network science, machine learning, and computational power. A substantial subset of this literature has focused on quantifying and predicting success in publishing—how we should measure success, who will have it, and what factors contribute to having it. Seminal work has focused on modeling citation patterns for papers [2] and researchers [3], with more recent work setting out to explain hot streaks in researchers’ career trajectories [4], unique patterns of productivity and collaboration amongst the scientific elite [5], and even the role of luck in driving scientific success [6, 7]. We are also seeing the emergence of metascience as a social movement [8], catalyzed by the last decade’s reproducibility crisis [9], aiming to describe and evaluate science at a macro scale in order to diagnose biases in research practice [10, 11], highlight flaws in publication processes [12], understand how researchers select new work to pursue [13, 14], identify opportunities for increased efficiency (e.g., automated hypothesis generation [15]), and forecast the emergence of research topics [16, 17].

The modern philosophy of the evolution of science is rooted in Kuhn’s structure of scientific revolutions [18]. According to Kuhn, the majority of science progresses in a phase of Normal Science, in which literature is built on existing paradigms: the scientific theories, methods, and techniques widely accepted in a research community. This phase is disrupted by paradigm shifts caused by new scientific theories. This revolutionary characterization of the evolution of science was challenged by parsonianism [19]. According to this school of thought, scientific theories are composed of autonomous and distinctive parts, such as concepts, topics, methods, and methodological assumptions. Consequently, shifts in science are piecemeal and complex rather than the linear change proposed by Kuhn. Inductive analysis of literature from interdisciplinary fields has shown that evolution in science is primarily combinatorial, with splitting and merging of knowledge groups [20]. A knowledge group can split when sub-groups within it emerge through specialization or maturity; merging events occur through the convergence of applied and basic sciences. This further solidifies the view among researchers in scientometric communities that science is an ever-expanding combination of ideas [11, 21].

Prior work on the evolution of research can be broadly viewed in three categories based on method: network-based; language-based; and hybrid methods using both networks and language. Language-based methods include topic modeling, e.g., Latent Dirichlet Allocation (LDA) [22, 23], sequential topic modeling [17], keyword tracking [24], and analyzing linguistic context [16]. Studies using network-based methods typically use citation networks and clustering algorithms to model the literature [25, 26]. Others have used temporal [27] or multiplex [28] networks, or projections of citation networks, e.g., co-authorship graphs [29, 30]. Due to the nature of citations and citation practices, citation analyses have various limitations, ranging from practical issues, such as the completeness of citations captured by databases [31], to more fundamental issues, such as the accuracy of the citations themselves [32]. Citation patterns also differ among disciplines [33]. More critically, while citations are valuable, one cannot rely solely on them for analysis [34]. More recent work using hybrid approaches to explain bibliometric dynamics has relied on network analysis, with post-hoc application of linguistic analysis to generate explanatory labels [35]. Others have used LDA to generate topic co-occurrence networks [36]. However, to the best of our knowledge, there are no existing hybrid methods which systematically incorporate insights from both language models and citation networks for the purposes of explaining and predicting the evolution of scientific literature.

We suggest that integrating citation-based network information with semantic information from deep learning-based language embeddings offers a novel opportunity to capture the trajectory of ideas within and between disciplines over time. Specifically, we show that citation-driven and language-driven models capture overlapping but distinct and complementary dynamics in scientific research. Furthermore, we believe this approach avoids reliance on the pre-defined discipline categories used by scientific publishing databases. We use pre-trained neural network models [37] to generate vectorized representations of the literature while separately leveraging citation network measures (e.g., betweenness centrality), and combine these two inputs to build predictive models of topical evolution. The intuition behind the mechanisms explored herein is that scientific disciplines can be described at a high level as aggregations of related ideas. When a discipline begins to show signs of fracture or change via the emergence or synthesis of new ideas, we model this moment by borrowing from physics the concept of metastability: a state easily perturbed into a new state. We suggest that integration of knowledge from different fields is a driver of this change, and hence that measures of interdisciplinarity may be indicators of metastability and thus useful predictors of change.

Recent work has highlighted the role of interdisciplinarity in scientific practice [38–41]. Interdisciplinarity has been shown to be linked to innovation and impact [11, 42]. Calls for collaboration across disciplines are prominent throughout research institutions and funding agencies (See, e.g., the U.S. National Science Foundation’s Growing Convergence Research program: https://www.nsf.gov/od/oia/growing-convergence-research/index.jsp) but some have argued that the promises of interdisciplinarity are overstated and misplaced [43]. The bibliometric community has offered a data-driven framing for interdisciplinary studies, e.g., defining interdisciplinarity as a process of integrating different bodies of knowledge [44, 45].

Definitions of interdisciplinarity vary in the literature [46]. Most fall within one of two types: subject-based and network-based definitions [46]. Subject-based metrics rely on multi-classification systems to calculate interdisciplinarity, leaning on pre-defined subject categories, e.g., handed down from journals or from the Web of Science (WoS) [47]. Definitions of interdisciplinarity are operationalized by way of these categories, e.g., the percentage of references cited from outside a journal’s subject categories [48, 49]. In some cases, interdisciplinarity metrics are borrowed from other fields, such as the Gini index from economics or Shannon entropy from information theory [50]; these are also based on subject categories. Network-based interdisciplinarity metrics are typically assessed based on the location of a publication in a citation network [51], with centrality measures frequently being the focus. For example, betweenness centrality, which is independent of third-party categorization, was one of the first metrics used in this way [51, 52] and has likewise been used to predict future network trends [53, 54].

To study the evolution of knowledge in the scientific literature, we: (1) develop methods that utilize transformer-based language models and unsupervised clustering to track the evolution of ideas over time; (2) quantify interdisciplinarity using complementary text- and citation-based metrics; and (3) explore the utility of metastability, measured through interdisciplinarity, as a predictor of scientific evolutionary events.

Materials and methods

Dataset

Our dataset contains detailed records of 19,177 scientific papers published in the years 2011 through 2018, with 2300 to 2500 papers for each year, representing a substantial stratified random sample of papers published in 62 prominent journals from the following social and behavioral science disciplines as strata: Criminology; Economics and Finance; Education; Health; Management; Marketing and Organizational Behavior; Political Science; Psychology; Public Administration; and Sociology. Sampling was done in conjunction with DARPA’s SCORE program. The sample contains approximately equal representation of papers per year and per journal. For a complete listing of journals and sampling methodology see [55]. The semantic analysis of this work is entirely dependent on studying these papers, which will henceforth be referred to as the core publications. Metadata for these papers was collected using the Web of Science as a primary source. Digital Object Identifiers (DOIs) were used to merge WoS records with Semantic Scholar (S2) records [56, 57] for completeness of metadata coverage and author name disambiguation. When DOIs were not available from WoS, we used Crossref [58] to fill in missing DOIs for more complete record linking between WoS and S2. For citation network analyses, we also included all papers referenced by these core papers. In total, the citation network includes records of 19,177 papers and their references (819,919 papers) and about 1.45 million citations. Citations earned by the core papers were not included in the analysis, for two reasons. First, this work uses historical information to forecast the state of a knowledge group in the following year; because citations take time to accumulate, we assume a paper’s incoming citations would not yet be available at the time of forecasting. Second, including citation counts would skew the findings, as publications from early in a year tend to garner more citations than those published late in the year.

Methods

We use parallel workflows to model dynamics in bibliometric data—one based on text and one based on citation networks (Fig 1). For each, we derive a measure of interdisciplinarity useful for prediction of knowledge evolution. We describe our explanatory and predictive experiments to evaluate our measures.

Fig 1. Data analysis workflows.

(Top) Text-based analysis. Title and abstract are concatenated and input to a language embedding model, then dimensionally reduced and fed into a clustering algorithm; clusters of embedded papers are then used for event modeling and interdisciplinarity scoring. (Bottom) Citation-based analysis. Citation information is used to create undirected citation graphs; the Louvain algorithm is used to identify network communities and betweenness centrality is used for interdisciplinarity scoring. Interdisciplinary metrics are jointly used to predict disciplinary evolution.

https://doi.org/10.1371/journal.pone.0287226.g001

SPECTER-based topic modeling

We use language-embedding-based topic modeling to identify topics within our corpus by year. To do so, we extract embeddings for each publication in our dataset using the concatenated title and abstract as input to SPECTER (Scientific Paper Embeddings using Citation-informed TransformERs) [37], a model based on BERT (Bidirectional Encoder Representations from Transformers) [59] that generates document-level embeddings of scientific documents via pre-training on scientific papers and their citation graphs. Specifically, we use the huggingface implementation [60] of the pre-trained SPECTER model. SPECTER embeddings have been shown to be useful in downstream document-level tasks (citation prediction, document classification, and recommendation) without any task-specific fine-tuning [37].

To identify disciplines and subdisciplines, we use an unsupervised, non-parametric, hierarchical clustering algorithm, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [61]. Specifically, we soft-cluster SPECTER embeddings to reflect that papers may belong to multiple (sub)disciplines with different probabilities. Because the performance of HDBSCAN generally degrades as the dimensionality of the input data increases, we use UMAP [62] to reduce the dimensionality of SPECTER embeddings prior to clustering with HDBSCAN. We use multi-objective Bayesian hyperparameter tuning [63] for the UMAP-HDBSCAN pipeline to balance five evaluative criteria related to balancing inter- vs. intra-cluster density, number of clusters, and persistence of clusters over multiple runs of the algorithm.

In particular, UMAP requires us to specify: (1) the dimensionality of reduced embeddings; (2) a distance metric; (3) the size of the local neighborhood; and (4) the minimum separation distance for dimensionally-reduced points. For HDBSCAN, we need to provide: (5) minimum cluster size; (6) a cluster selection method; and (7) minimum number of neighboring points required to identify the “core points” of a cluster. To do so, we use the optuna framework for multi-objective Bayesian hyperparameter tuning [64].

For each year in the dataset, we run 1000 hyperparameter permutations for the UMAP-HDBSCAN modeling pipeline and choose hyperparameter combinations that rank highest. As UMAP is a stochastic algorithm, we perform repeated runs for each hyperparameter combination to select solutions that are robust to randomness based on the following five evaluative criteria:

  • Mean Density-Based Clustering Validation. Density-Based Clustering Validation (DBCV) is a relative validity index commonly used to assess the quality of clusters obtained by density-based clustering algorithms. DBCV is computed using inter- and intra-cluster density connectedness [65]. High-quality clustering solutions have high DBCV scores, as they have low inter-cluster vs. intra-cluster connectivity. DBCV is a commonly-used method to evaluate cluster solutions with HDBSCAN;
  • Standard deviation of DBCV. To complement mean DBCV, we consider variation in DBCV scores over multiple runs. Robust hyperparameter combinations should exhibit low standard deviation of DBCV scores;
  • Mean number of clusters. Evaluating cluster solutions solely on DBCV-based metrics is insufficient as solutions with few clusters or containing clusters with low persistence scores can lead to results with high mean DBCV score but negligible practical value for many domains. We therefore consider additional metrics, the first being mean number of clusters, with this formulation preferring greater cluster resolution. For consistency across metrics and to avoid the effect of different scales on the solution quality, we use min-max scaling to normalize the number of clusters when hyperparameter tuning;
  • Standard deviation of number of clusters. As with DBCV, a good hyperparameter set should exhibit low standard deviation of cluster counts;
  • Mean of mean cluster persistence. Optimizing for a solution with a high number of clusters can lead to ephemeral clusters that dilute results. We use the mean of mean per-cluster persistence across identical runs to evaluate stability of the clusters identified.
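As an illustration, the five criteria above can be collapsed into a single ranking score for hyperparameter trials. The additive scalarization below is ours (the paper uses multi-objective tuning, so this is only a sketch of how trials could be ranked); all inputs are assumed pre-computed over repeated UMAP-HDBSCAN runs for one hyperparameter combination:

```python
def tuning_score(mean_dbcv, sd_dbcv, norm_n_clusters, sd_n_clusters,
                 mean_persistence):
    """Illustrative scalarization of the five evaluative criteria.

    mean_dbcv, norm_n_clusters (min-max scaled cluster count), and
    mean_persistence are rewarded; the two standard deviations are
    penalized, so the score is maximal when the means approach 1 and
    the standard deviations approach 0.
    """
    reward = mean_dbcv + norm_n_clusters + mean_persistence
    penalty = sd_dbcv + sd_n_clusters
    return reward - penalty
```

An ideal trial (means at 1, deviations at 0) scores 3; noisier or coarser solutions rank lower.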

Given the evaluation criteria, the objective function is maximized when the values of Mean DBCV, Mean of mean cluster persistence, and Normalized mean number of clusters are close to 1 while the standard deviations of DBCV and of the number of clusters are close to 0. Successfully clustered papers are considered “strong members” of their cluster, while papers that cannot be confidently assigned by the clustering algorithm are “weak members”. We assign each weak member to the cluster with which it has the highest semantic similarity. For completeness, we report downstream analyses with and without inclusion of weak members. We consider this distinction because we suggest that weak members represent research which is significantly different (and potentially truly innovative) relative to existing disciplines, and as such can help explain shifts in the trajectories of fields. HDBSCAN refers to these non-confident assignments as noise; however, we expect these are not noise in the traditional sense (e.g., an outlier or data worth discarding because it provides no analytical value) but may instead add value as research that differs from other work in its field. For example, one weak member, an article titled “Strengthening the Experimenter’s Toolbox: Statistical Estimation of Internal Validity” [66], devotes a major portion of the article to statistical methods, yet was published in the American Journal of Political Science, which predominantly publishes practical applications.
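The weak-member reassignment step can be sketched as nearest-centroid assignment by cosine similarity. A minimal sketch, with 2-d vectors standing in for SPECTER embeddings and hypothetical paper/cluster labels:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def assign_weak_members(weak, clusters):
    """Assign each weak-member embedding to the cluster whose centroid
    it is most semantically similar to (highest cosine similarity).

    weak: {paper_id: embedding}; clusters: {label: [embeddings of strong members]}
    """
    centroids = {label: centroid(vecs) for label, vecs in clusters.items()}
    return {
        paper: max(centroids, key=lambda lab: cosine(vec, centroids[lab]))
        for paper, vec in weak.items()
    }
```

For example, a weak member whose embedding points toward the “econ” centroid is assigned there rather than left as noise.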

For each cluster, we generate representative keyphrases using a procedure similar to the KeyBERT library [67], with modifications (e.g., more performant aggregation of embeddings from large numbers of documents belonging to the same cluster). Deriving keyphrases provides explanatory power for clusters and adds more nuanced understanding of the clusters than other commonly used approaches to grouping knowledge products, e.g., WoS categories. As an example, clusters identified in our dataset for the year 2011 and their corresponding keyphrases are shown in Fig 2. We identify a total of 371 clusters over the complete dataset, i.e., years 2011 through 2018. The distribution of the number of clusters for each year is shown in Fig 5.

Fig 2. UMAP projection of SPECTER embeddings of publications, projected to two dimensions, in the year 2011 colored by HDBSCAN-generated cluster labels with corresponding cluster-level keyphrases.

Each cluster plotted here contains at least 2.5% of total papers for the year and the size of each point is proportional to that publication’s language-based interdisciplinarity score. Small blue points represent weak members. Note that most clusters shown are well-separated and not homogeneous in shape, suggesting that UMAP is doing a good job of dimensionally reducing the feature space in such a way that it is reasonably straightforward to partition and that a variable-density-based clustering algorithm, such as HDBSCAN, is well-suited to identifying clusters in such a dataset.

https://doi.org/10.1371/journal.pone.0287226.g002

Citation graphs and communities

Per common practice, our citation-based analysis considers the citation network wherein nodes represent papers in our dataset and undirected edges represent citation relationships. We detect communities in this network using the Louvain community detection algorithm [68], which maximizes the modularity of the network, namely the difference between the observed and expected fractions of intra-community edges [69]. Specifically, for a given time window/year of interest t, we consider the subgraph G(t) containing only papers published in year t and earlier, as well as their references. This approach allows us to make predictions for past papers without fear that future papers citing them will cause information leakage into the dataset (e.g., a model trying to predict the evolution of an idea tied to a paper from 2017 should not have access to information about papers from 2018 citing it during model training). An example of the community structure discovered via the Louvain method is shown in Fig 3.
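The time-windowed community detection can be sketched as follows, assuming a simple edge-list data model (the function and variable names are ours) and using the Louvain implementation in networkx (≥ 2.8):

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def communities_at(edges, years, t, seed=0):
    """Build the undirected citation subgraph G(t) restricted to papers
    published in year t or earlier, then detect communities with Louvain.

    edges: iterable of (citing_id, cited_id) pairs
    years: {paper_id: publication_year}
    Returns a list of sets of paper ids (one set per community).
    """
    g = nx.Graph()
    # Keep only edges whose endpoints both exist by year t (no leakage
    # from future papers citing past ones).
    g.add_edges_from(
        (u, v) for u, v in edges if years[u] <= t and years[v] <= t
    )
    return louvain_communities(g, seed=seed)
```

The `seed` fixes Louvain’s randomness so repeated runs on the same window are reproducible.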

Fig 3. An exemplary snapshot of the dense network and communities found by the Louvain community detection algorithm for the year 2011.

Publications belonging to the communities comprising less than 2.5% of total papers for the year are suppressed to improve the visual clarity. Note that a clear community structure can be observed for this graph-only approach much like it was for the language-only clustering presented earlier.

https://doi.org/10.1371/journal.pone.0287226.g003

Quantifying interdisciplinarity

Language-based interdisciplinarity.

Our text-based interdisciplinarity (ID) metric scores each publication based on its soft clustering membership probabilities (i.e., the probability of a publication belonging to each possible cluster identified by standard or “hard” clustering), considering only strong member publications. It does so by assuming that one representation of interdisciplinarity is the diversity of language pulled from different fields. This metric is calculated using Eq 1, which considers the spread in a paper’s cluster assignment probabilities. Formally:

IDtext = (1 − Pwm)(1 − σp/σmax), with σmax = √(N − 1)/N (1)

where N is the total number of clusters in the dataset, Pcluster is the probability of the paper belonging to a cluster, Pwm is the probability of the paper being a weak member of any cluster, σp is the standard deviation of Pcluster over all clusters, and σmax is the largest value σp can attain (reached when a single cluster holds all of the probability). This formulation is more intuitive when extreme cases are considered. For example, consider a corpus with 9 clusters for the year of interest. Consider a paper that sits very clearly within a single well-defined scientific discipline, i.e., max(Pcluster) = 1 for a single cluster (consequently, Pwm = 0). The interdisciplinarity score for that paper would be IDtext = 0.0. Alternatively, imagine a paper with membership probabilities that are equivalent for all clusters, with the same probability that it may be a weak member, i.e., Pwm = Pcluster,i = 0.1 for N = 9. This would result in IDtext = 0.9, reflecting that the paper belongs to a wide array of disciplines/clusters equally, but with some chance that it may be a weak member (which can also be interpreted as a global uncertainty in the membership probabilities), thus keeping it from achieving a score of 1.0. This proposed interdisciplinarity metric presents a departure from existing metrics in the literature, as it allows language clusters to denote knowledge groups at different levels of granularity, ranging from topics to entire disciplines.
Despite the research community’s lack of consensus regarding the definition and operationalization of interdisciplinarity, it is generally acknowledged that the concept involves the synthesis and integration of knowledge across diverse knowledge groups [46]. Thus, we expect that the proposed language-based metric can serve as an effective means of measuring interdisciplinarity, independent of pre-defined disciplinary boundaries.
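The metric’s behavior can be checked against the two extreme cases worked above. The sketch below implements one formulation consistent with both (normalizing the spread by the maximum possible standard deviation, σmax = √(N − 1)/N, is our reading of the spread term; the published Eq 1 may differ in exact form):

```python
import math

def id_text(p_clusters, p_wm):
    """Language-based interdisciplinarity from soft-clustering probabilities.

    One formulation consistent with the worked extremes in the text:
    ID = (1 - P_wm) * (1 - sigma_p / sigma_max), where sigma_p is the
    (population) standard deviation of the cluster membership
    probabilities and sigma_max = sqrt(N - 1) / N is its value when a
    single cluster holds all of the probability.
    """
    n = len(p_clusters)
    mean = sum(p_clusters) / n
    sigma_p = math.sqrt(sum((p - mean) ** 2 for p in p_clusters) / n)
    sigma_max = math.sqrt(n - 1) / n
    return (1.0 - p_wm) * (1.0 - sigma_p / sigma_max)
```

A single-discipline paper (one probability of 1, P_wm = 0) scores 0.0, and a paper spread uniformly over 9 clusters with P_wm = 0.1 scores 0.9, matching the two examples.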

Citation-based interdisciplinarity.

We use betweenness centrality for each publication in the network as an interdisciplinarity metric, with higher centrality generally indicating higher interdisciplinarity, as has been done in previous literature [70]. As with community detection, we use time-windowed subgraphs for centrality measurement. Betweenness centrality is lightly modified for use as an ID metric, normalized on a [0, 1] scale. For paper i in publication year t:

IDcitation,i = (centralityi − min{centralityt}) / (max{centralityt} − min{centralityt}) (2)

where {centralityt} is the set of all betweenness centrality values for papers published in calendar year t.

Text-based dynamic event modeling

We identify and track critical knowledge evolution events borrowing from the literature on tracking communities in dynamic social networks [71]. Specifically, representative embeddings for each cluster are calculated using the element-wise mean of the embeddings of the papers in each cluster, and clusters are compared across consecutive years by calculating the pairwise cosine similarity of the embeddings of each [Ct, Ct + 1] pair of clusters in years t and t + 1 [71]. We then link a cluster with its best-matching cluster(s) in the consecutive time step if the cosine similarity is above 0.95 (to gain deeper insight into the impact of this threshold value, a post-hoc analysis was carried out; its findings are presented in S1 Appendix). We employ the following taxonomy [71]:

  • A birth event is identified at time t when a cluster at time t has no matching cluster(s) at time t − 1.
  • A death event is identified at time t when a cluster at time t has no matching cluster(s) at time t + 1. This framework uses comparison of clusters from consecutive years to determine if a cluster is dead and does not consider the possibility of the revival of a dead cluster in later years. Although the reappearance of dead clusters has not been identified in this dataset, this may be explored in future work.
  • Multiple clusters have merged at time t when one cluster at time t matches to two or more clusters at time t − 1.
  • Multiple clusters have split at time t when two or more clusters at time t match to a single cluster at time t − 1.
  • A continuation event is observed when one cluster at time t is matched to exactly one cluster at time t + 1.
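The taxonomy above can be sketched as a labeling function over match sets between consecutive years. The `back`/`forward` data model below is hypothetical: `back[c]` and `forward[c]` hold the clusters at t − 1 and t + 1 whose centroid cosine similarity with cluster c exceeds the matching threshold (0.95 in the text):

```python
def classify_events(back, forward):
    """Label knowledge-evolution events for clusters at time t.

    back:    {cluster_t: set of matched clusters at t-1}
    forward: {cluster_t: set of matched clusters at t+1}
    """
    # Count how many time-t clusters share each t-1 parent (split detection).
    parents = {}
    for c, prev in back.items():
        for p in prev:
            parents.setdefault(p, set()).add(c)

    events = {}
    for c in back:
        if not back[c]:
            events[c] = "birth"         # no matching cluster at t-1
        elif len(back[c]) >= 2:
            events[c] = "merge"         # one cluster matching >= 2 at t-1
        elif any(len(parents[p]) >= 2 for p in back[c]):
            events[c] = "split"         # >= 2 clusters share one parent
        elif not forward.get(c):
            events[c] = "death"         # no matching cluster at t+1
        else:
            events[c] = "continuation"  # exactly one match at t+1
    return events
```

Note that, as in the text, events can co-occur; this sketch gives merges precedence over splits for clusters involved in both.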

We group these events into two types for subsequent analyses: (1) dynamic (split or merge) and (2) stable (continuation or death). Treating splits and merges as a single class not only follows from our metastability mental model but, because these events often co-occur, also yields non-overlapping classes. We disregard birth events at present, since a birth event has no preceding data from which to build a model and is unrelated to the concept of combinatorial innovation described by metastability. Fig 4 gives a notional example of merge and continuation events. We note that events may occur in combination; e.g., a cluster may split into two, and those two clusters may simultaneously merge with two other clusters.

Fig 4. Notional continuation and merge events showing weak (significantly different from existing clusters) and strong members (high confidence in its membership) of each cluster.

https://doi.org/10.1371/journal.pone.0287226.g004

Event-tracking and prediction

We hypothesize that interdisciplinarity scores and cluster size are indicators of metastability and can therefore be used to predict cluster evolution, i.e., dynamic vs. stable events, as the endogenous target variable. In particular, for each language cluster Ct at time t, we use as exogenous model inputs: the cluster-wise mean language-based interdisciplinarity score (which includes weak member papers); the mean citation-based interdisciplinarity scores for weak and strong members, treated as separate features in order to discern whether weak members carry any difference in predictive power; and the numbers of weak and strong member papers in the cluster.

To choose the most powerful features and test their predictive power (and thus their value for further analyses), we use multivariate logistic regression and a Random Forest classifier with a binary target representing whether a dynamic event type (split or merge) is observed for a cluster at time t + 1, as shown in Eq 3:

E(Ct+1) = 1 if a split or merge event is observed for the cluster at time t + 1, and 0 otherwise (3)

We use the entire dataset with multivariate logistic regression for explanatory power. For the random forest, we also evaluated model performance with different train/test splits, as discussed in S1 Appendix. Based on these findings, we use cluster events in 2011–17 for training and 2018 for testing, resulting in roughly an 86%/14% train/test split by cluster count, with 275 events for training (split/merge: 136; continuation/death: 139) and 43 testing events (split/merge: 21; continuation/death: 22). Using the above input features and event types in year t + 1, we fit a random forest model using the scikit-learn python library [72]. To avoid overfitting, we tune the random forest’s hyperparameters (number of trees and maximum depth of each tree) using grid search and found that a model with 44 trees, each with a maximum depth of 4, achieves the best F1 scores on held-out data. Results of these experiments are described in S1 Appendix.
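A minimal sketch of the prediction setup, using synthetic stand-ins for the five cluster-level features (the real features are the ID metrics and member counts described above; the toy labeling rule is ours) and the reported hyperparameters:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the five cluster-level features: mean language
# ID, mean citation ID for weak and strong members, and weak / strong
# member counts. Labels: 1 = dynamic (split/merge), 0 = stable.
X = rng.random((275, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.75).astype(int)  # toy decision rule

# Hyperparameters matching those found by the grid search in the text:
# 44 trees, each with a maximum depth of 4.
model = RandomForestClassifier(n_estimators=44, max_depth=4, random_state=0)
model.fit(X, y)
train_acc = model.score(X, y)
```

In the real pipeline, X and y would be built from 2011–17 cluster events for training, with 2018 events held out for testing.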

Results

In the following, we show that the language and network frameworks capture different information by comparing the overlap between text-based clusters and citation-based communities. We then further investigate the nature of the information provided by both frameworks by discussing how these representations, when considered together, serve to predict the evolution of disciplines and sub-fields.

Comparing clusters and communities suggests valuable incomplete overlap

Fig 3 gives a snapshot of network communities in 2011; comparison with Fig 2 illustrates differences in grouping across the two approaches. In general, the Louvain algorithm detects communities in the citation network at a finer resolution than our text-based clustering. For reference, Fig 5 shows the number of clusters and communities in our dataset, in addition to a measure of overlap between the two that we describe below. The number of network communities generally decreases over time, reflecting a more integrated citation graph emerging amongst the papers in our sample.

Fig 5. Plot with number of clusters/communities identified by text-based (brown) and networks-based (blue) frameworks with inset plot showing percentage of language clusters associated with at least one network-derived community.

Note that overlap values are consistently below 100% but well above 0%, suggesting unique and complementary insights added by each. The trends in the number of language clusters and network communities by year could potentially be attributed to the integration of knowledge from disciplines that were not included in the dataset (see Discussion).

https://doi.org/10.1371/journal.pone.0287226.g005

As both our language- and citation-based frameworks are unsupervised, comparing them requires matching clusters and communities across the two frameworks. For this, we measure pairwise Jaccard similarity between clusters and communities, effectively the fraction of shared publications between every language cluster and every network community relative to their total number of member papers. If the similarity between a cluster and a community is above 0.1, we consider them similar. This threshold-based method (and the 0.1 threshold specifically) has been used in the literature for tracking clusters and communities over time [71, 73] and performs well across a variety of synthetic graphs. Returning to Fig 5, the inset shows the percentage of language clusters with similar (Jaccard similarity > 0.1) network communities. While there is overlap between the communities and clusters, the overlap is not complete, suggesting that each approach adds unique insight.
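The matching step can be sketched directly. The cluster and community memberships below are hypothetical sets of paper IDs, and the 0.1 threshold is the one used in the paper:

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: fraction of shared papers relative to all member papers."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def match(clusters: dict, communities: dict, threshold: float = 0.1):
    """Return (cluster_id, community_id) pairs whose Jaccard similarity exceeds the threshold."""
    return [
        (cid, nid)
        for cid, c in clusters.items()
        for nid, n in communities.items()
        if jaccard(c, n) > threshold
    ]

# Hypothetical memberships: language clusters vs. network communities
clusters = {"lang_A": {1, 2, 3, 4}, "lang_B": {5, 6}}
communities = {"net_X": {2, 3, 9}, "net_Y": {7, 8}}
print(match(clusters, communities))  # [('lang_A', 'net_X')]
```

A language cluster counts toward the inset of Fig 5 if it matches at least one network community under this criterion.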

Illustrating knowledge evolution events

To illustrate the types of knowledge events we identify and track in this work, consider an example from our dataset. Fig 6 shows a full chain of language cluster evolutionary events over the period 2011 through 2018. Every cluster in this chain has “Business and Finance” and “Economics” as the most common WoS categories among member papers. In contrast, the keyphrases generated via our language clustering approach reflect greater resolution, including phrases like “income hedging” and “intangible capital”. The chain starts with a 2011 cluster that appears related to the (then recent) U.S. housing market crisis and Great Recession, with a strong focus on work discussing corporate governance and government spending. This focus on organizational-level finance and economics mostly continues through 2017, with only a few deviations more focused on overall market trends, epitomized by the representative paper for one of the 2016 clusters, focused on European banking. Then something happens in 2018: topics appear to shift substantially from organizational/macroeconomic concepts to research focused on individual-level spending, finance, and decision-making, as can be seen both from the keyphrases representing those linguistic clusters and from the representative 2018 paper on accounting for consumer behaviors in investing. It is interesting to note that this timing corresponds with Richard Thaler’s 2017 Nobel Prize in Economics, awarded for contributions to behavioral economics.

thumbnail
Fig 6.

Evolution of a set of language clusters from 2011 to 2018 (left to right) and keyphrases for each, along with two representative papers for two of the clusters. Note the marked change in focus between 2016 and 2018 evidenced by representative titles and cluster keyphrases. The split event for the 2017 cluster was successfully predicted by the random forest classifier described later (brown box).

https://doi.org/10.1371/journal.pone.0287226.g006

Knowledge evolution is significantly associated with interdisciplinarity and weak members

We use multivariate logistic regression with the endogenous and exogenous variables described above to evaluate how knowledge evolution may be explained through our interdisciplinarity scores, cluster size, and network metrics. Per common practice, we insert a constant and a year variable to account for potential temporal effects. We first attempt to explain whether or not clusters split or merge, in order to evaluate the strength of associations between our hypothesized inputs and outputs.

Per Table 1, we see significant positive associations between a cluster splitting or merging and both the language interdisciplinarity score and the network interdisciplinarity score computed over strong members only (i.e., excluding weak members). Following common best practice, tests were conducted with all features and, some being found insignificant, repeated with only the significant features; see S1 Appendix for details of this purposeful selection. We also see a positive association with the number of weak members and a negative association with the year. Year was included, per common practice, to account for effects of time passing; note that this model had a higher pseudo R2 than a model without the year. Future work should investigate temporal associations through, e.g., time series analyses. Though all marginal effects are of the same order of magnitude, ranking by those effects, the language interdisciplinarity score is most important, followed by the number of weak members and the network score without weak members. Next, we further probe this statistical relationship by testing the predictive power of a model trained on only a subset of cluster data.

thumbnail
Table 1. Multivariate logistic regression results describing associations with split or merge (1) vs. continuation or death (0).

Note significant positive associations with language score, network score, number of weak members, and a negative association for the year. All other features were not significant, and left out via purposeful selection for a more parsimonious model; see S1 Appendix.

https://doi.org/10.1371/journal.pone.0287226.t001

Validating our statistical result with predictive power—equal importance of interdisciplinarity scores

We have shown significant associations between knowledge splitting and merging on the one hand, and interdisciplinarity and weak members on the other. Here we go further by performing predictive modeling with a random forest classifier. Including only the features shown to be statistically significant, we achieve a micro-averaged F1 = 0.814 on our held-out test set, with F1 = 0.818 on the class representing knowledge evolution (i.e., splitting or merging), a performance significantly better than random chance. Table 2 shows both interdisciplinarity scores to be equally important in achieving this predictive power. The number of weak members associated with a given cluster is next-most important, followed by the year variable. We validated against issues known to affect Gini feature importance values from a random forest, specifically those arising when features exhibit multicollinearity and the bias towards numeric and high-cardinality categorical features [74]. The first is not a problem here, as high-correlation features were removed as a result of the logistic regression analysis discussed earlier. The second is expected to be only a minimal concern, as the only non-numeric feature in this model is the publication year. Because year is a low-cardinality categorical feature, it may be subject to a bias in the feature importances, and its true ranking in the feature importance table could therefore be higher than indicated. As this would not materially change our analysis, correcting for this bias is beyond the scope of this work. Taken together, our results underscore the importance of including both the linguistic and network viewpoints of interdisciplinarity. To further validate this claim, we conducted feature ablation studies, which show that classifiers trained using only linguistic features or only network features have lower predictive power. Details of the ablation studies are provided in S1 Appendix.
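Reading off the micro-averaged F1 and Gini feature importances from a fitted random forest can be sketched as follows; the data are synthetic placeholders, with feature names mirroring those in Table 1 and split sizes mirroring the paper's 275/43 train/test counts:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(2)
names = ["lang_score", "net_score_strong", "n_weak_members", "year"]
X = rng.normal(size=(318, 4))  # placeholder data: 275 train + 43 test events
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.7, size=318) > 0).astype(int)
X_train, X_test, y_train, y_test = X[:275], X[275:], y[:275], y[275:]

clf = RandomForestClassifier(n_estimators=44, max_depth=4, random_state=0)
clf.fit(X_train, y_train)

micro_f1 = f1_score(y_test, clf.predict(X_test), average="micro")
# Gini importances (normalized to sum to 1), sorted descending
for name, imp in sorted(zip(names, clf.feature_importances_), key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```

Where the importance biases discussed above matter, permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative to Gini importance.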

thumbnail
Table 2. Random forest results on a held-out test set predicting the different types of cluster events a given cluster would experience in the next year, with the same features as in Table 1.

We achieve a micro-averaged F1 = 0.814 on our held-out test set, with a class-specific F1 = 0.818 for the class representing knowledge evolution (splits and merges). Per reported Gini feature importance of each independent variable, both interdisciplinarity scores are equally important, followed by number of weak members, then year. Note that the sort order of this table is identical to that of Table 1 to allow for more direct comparison of logistic regression coefficients to random forest feature importances.

https://doi.org/10.1371/journal.pone.0287226.t002

Discussion

Through both explanatory and predictive efforts, we show that language and network interdisciplinarity increase the metastability of disciplines and sub-fields. Interestingly, the network interdisciplinarity of strong member papers is significantly predictive of knowledge mixing events, although the number of strong members is not. By contrast, although weak members’ network interdisciplinarity is not significantly predictive, a greater number of weak members is predictive of knowledge recombination. One explanation may be that papers that do not cluster neatly are indicative of combinatorial innovation, expressed as knowledge mixing events in our framework. Consequently, if one is interested in spurring broad interdisciplinarity, one might encourage more weakly-clustered research, regardless of its own network-derived interdisciplinarity. Future work should further investigate these relationships, in particular over longer time scales. For example, one might explore whether weak members at time t1 can lead to a stable cluster at a later time t2, indicating that research efforts have reached a “critical mass”.

Our work motivates new hybrid models that align multiple views of the literature, e.g., linguistic and bibliometric, into unified modeling frameworks. Looking beyond traditional single-view approaches, such frameworks would be better suited to capture the richness of the scholarly record. This can be achieved through so-called graph machine learning models, which support an integrated representation of data reflecting both its content, e.g., the language of a scientific paper, and its context within a network. Further, the work we describe here is mostly based on unsupervised learning. There is no readily available ground truth that is universally acknowledged to reflect the changing nature of scientific thought, disciplines, and sub-disciplines at a time scale reflective of how ideas mature and evolve. Future work should build benchmark datasets with which the metascience community can evaluate and test these approaches more thoroughly than is currently possible. Building such benchmarks is bound to be a challenge, given that the organization and taxonomization of scientific knowledge can be considered along many dimensions and carries inherent subjectivity. This was evident in the expert feedback we elicited to evaluate our algorithm-generated clusters; further detail about the brief survey we conducted is provided in S1 Appendix.

Finally, there exist a number of interdisciplinarity metrics in the literature, and a lack of consensus about their utility [46]. Future work may compare these metrics, including their efficacy for downstream tasks such as understanding and predicting knowledge evolution as we do here. Likewise, the many clustering algorithms that could be engaged for this task should be compared in context. Here again, a critical challenge is the lack of benchmark datasets for evaluation.

Limitations

There exist limitations of this work inherent to both our dataset and modeling approach. The current dataset does not encompass fields of study that are closely related to, yet distinct from, the disciplines chosen. Consequently, publications that integrate concepts from these related but distinct fields may be labeled as weak members in this study, even if they would belong to a knowledge cluster once publications from the closely related field were incorporated. Another limitation is our use of one-year time steps for identifying semantic clusters; accordingly, we do not capture dynamics with evolutionary cycles shorter than one year. Additionally, we used the purposeful selection method, which relies on statistical significance to choose variables, limiting the scope of this study [75]. Future work aimed at providing a complete picture of the phenomena presented in this paper should evaluate the relative contributions of the complete set of variables considered.

We highlight that the approach we present here is intentionally broad and conceptual. Significant improvements in predictive performance, however, may not be possible with a one-size-fits-all modeling approach across the scientific corpus; cluster birth and death may be catalyzed by meaningfully different factors in one field vs. another. Accordingly, future work may pursue field-specific modeling in cooperation with domain experts.

Conclusion

This paper proposes a hybrid language- and network-based framework that uses semantic embeddings and citation information to model metastability of ideas in order to identify dynamic events associated with the rise, fall, combination, and dispersion of topics in the scholarly corpus. We show that this hybrid approach is distinct from approaches based on linguistic or citation information alone. The methods we propose rely on multiple views of interdisciplinarity as predictors of scientific knowledge transitions. Our work lays groundwork for novel approaches that bring together linguistic and citation modeling for understanding dynamics in scientific literature.

Supporting information

S1 Table. Initial multivariate logistic regression results.

Results describing associations between knowledge evolution, binarized as split or merge (1) vs. continuation or death (0), and all our exogenous variables. Note significant positive associations with the language score and the number of weak members, plus a negative association for the year. The mean network interdisciplinarity score (considering strong members only) is statistically insignificant, as is the number of strong members. The mean network interdisciplinarity score (considering weak members only) is insignificant, yet has a significant marginal effect. Per common practice, we purposefully re-ran our analysis discarding insignificant variables, to evaluate the significance of the network score among weak members and confirm our findings on the language score, number of weak members, and year.

https://doi.org/10.1371/journal.pone.0287226.s002

(PDF)

S2 Table. Summary of results from a brief survey of three domain experts.

https://doi.org/10.1371/journal.pone.0287226.s003

(PDF)

S1 Fig. Impact of test-train split size on F1 Score.

Plot showing the best F1 scores achieved on held-out test data when the random forest model is trained on varying amounts of training data, predicting the events occurring in the following year. The labels next to each data point give the model parameters (maximum depth, number of trees). When trained only on events occurring between 2011 and 2014, every model in the grid search overfit the training data.

https://doi.org/10.1371/journal.pone.0287226.s004

(TIF)

S2 Fig. Parameter grid search: F1 score variation.

Line plot of random forest F1 scores when events in 2011–17 were used for training and 2018 events for testing, for a sample of model parameters (maximum depth, number of trees) from the grid search. The plot shows that the maximum depth of the trees has a greater effect on model generalizability than the number of trees. As maximum depth increases, performance on held-out data diverges from training performance due to overfitting.

https://doi.org/10.1371/journal.pone.0287226.s005

(TIF)

S3 Fig. Impact of the similarity threshold value on the fraction of different event groups identified.

https://doi.org/10.1371/journal.pone.0287226.s006

(TIF)

S4 Fig. Impact of the similarity threshold value on prediction task performance.

https://doi.org/10.1371/journal.pone.0287226.s007

(TIF)

Acknowledgments

The authors would like to thank Dr. Ashley Arigoni for her work on cluster comparison visualizations, as well as Mr. Joe Gorney and Mr. Alex Wade of Semantic Scholar for their aid in troubleshooting data engineering issues, and Dr. Ilya Rahkovsky of the Center for Security and Emerging Technology at Georgetown University and Dr. Phoebe Wong of Quantitative Scientific Solutions for their insights on the final analyses and drafts of this paper. Additionally, we thank the anonymous reviewers for their valuable feedback which strengthened this work.

References

1. Morris C. The significance of the unity of science movement. Philosophy and Phenomenological Research. 1946;6(4):508–515.
2. Wang D, Song C, Barabási AL. Quantifying long-term scientific impact. Science. 2013;342(6154):127–132. pmid:24092745
3. Sinatra R, Wang D, Deville P, Song C, Barabási AL. Quantifying the evolution of individual scientific impact. Science. 2016;354(6312). pmid:27811240
4. Liu L, Wang Y, Sinatra R, Giles CL, Song C, Wang D. Hot streaks in artistic, cultural, and scientific careers. Nature. 2018;559(7714):396–399. pmid:29995850
5. Li J, Yin Y, Fortunato S, Wang D. Scientific elite revisited: patterns of productivity, collaboration, authorship and impact. Journal of the Royal Society Interface. 2020;17(165):20200135. pmid:32316884
6. Pluchino A, Burgio G, Rapisarda A, Biondo AE, Pulvirenti A, Ferro A, et al. Exploring the role of interdisciplinarity in physics: success, talent and luck. PLoS ONE. 2019;14(6):e0218793. pmid:31242227
7. Janosov M, Battiston F, Sinatra R. Success and luck in creative careers. EPJ Data Science. 2020;9(1):1–12.
8. Peterson D, Panofsky A. Metascience as a scientific social movement. SocArXiv. 2020.
9. Schooler JW. Metascience could rescue the ‘replication crisis’. Nature News. 2014;515(7525):9. pmid:25373639
10. Larivière V, Ni C, Gingras Y, Cronin B, Sugimoto CR. Bibliometrics: Global gender disparities in science. Nature News. 2013;504(7479):211. pmid:24350369
11. Hofstra B, Kulkarni VV, Galvez SMN, He B, Jurafsky D, McFarland DA. The diversity–innovation paradox in science. Proceedings of the National Academy of Sciences. 2020;117(17):9284–9291. pmid:32291335
12. Franco A, Malhotra N, Simonovits G. Publication bias in the social sciences: Unlocking the file drawer. Science. 2014;345(6203):1502–1505. pmid:25170047
13. Rzhetsky A, Foster JG, Foster IT, Evans JA. Choosing experiments to accelerate collective discovery. Proceedings of the National Academy of Sciences. 2015;112(47):14569–14574. pmid:26554009
14. Jia T, Wang D, Szymanski BK. Quantifying patterns of research-interest evolution. Nature Human Behaviour. 2017;1(4):1–7.
15. Spangler S, Wilkins AD, Bachman BJ, Nagarajan M, Dayaram T, Haas P, et al. Automated hypothesis generation based on mining scientific literature. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2014. p. 1877–1886.
16. Prabhakaran V, Hamilton WL, McFarland D, Jurafsky D. Predicting the rise and fall of scientific topics from trends in their rhetorical framing. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2016. p. 1170–1180.
17. Chen C, Wang Z, Li W, Sun X. Modeling scientific influence for research trending topic prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 32; 2018.
18. Kuhn TS. The structure of scientific revolutions. vol. 111. Chicago: University of Chicago Press; 1970.
19. Alexander JC. Paradigm revision and “parsonianism”. Canadian Journal of Sociology/Cahiers canadiens de sociologie. 1979; p. 343–358.
20. Coccia M. General properties of the evolution of research fields: a scientometric study of human microbiome, evolutionary robotics and astrobiology. Scientometrics. 2018;117(2):1265–1283.
21. Fortunato S, Bergstrom CT, Börner K, Evans JA, Helbing D, Milojević S, et al. Science of science. Science. 2018;359(6379).
22. Klemiński R, Kazienko P. Identifying Promising Research Topics in Computer Science. In: European Network Intelligence Conference. Springer; 2017. p. 231–241.
23. Uban AS, Caragea C, Dinu LP. Studying the Evolution of Scientific Topics and their Relationships. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021; 2021. p. 1908–1922.
24. Faust O. Documenting and predicting topic changes in Computers in Biology and Medicine: A bibliometric keyword analysis from 1990 to 2017. Informatics in Medicine Unlocked. 2018;11:15–27.
25. Shibata N, Kajikawa Y, Takeda Y, Matsushima K. Detecting emerging research fronts based on topological measures in citation networks of scientific publications. Technovation. 2008;28(11):758–775.
26. Salatino AA, Osborne F, Motta E. AUGUR: forecasting the emergence of new research topics. In: Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries; 2018. p. 303–312.
27. Sun Y, Latora V. The evolution of knowledge within and across fields in modern physics. Scientific Reports. 2020;10(1):1–9. pmid:32694516
28. Zamani M, Tejedor A, Vogl M, Kräutli F, Valleriani M, Kantz H. Evolution and transformation of early modern cosmological knowledge: A network study. Scientific Reports. 2020;10(1):1–15. pmid:33188234
29. Sarigöl E, Pfitzner R, Scholtes I, Garas A, Schweitzer F. Predicting scientific success based on coauthorship networks. EPJ Data Science. 2014;3:1–16.
30. Sun X, Ding K, Lin Y. Mapping the evolution of scientific fields based on cross-field authors. Journal of Informetrics. 2016;10(3):750–761.
31. García-Pérez MA. Accuracy and completeness of publication and citation records in the Web of Science, PsycINFO, and Google Scholar: A case study for the computation of h indices in Psychology. Journal of the American Society for Information Science and Technology. 2010;61(10):2070–2085.
32. Pavlovic V, Weissgerber T, Stanisavljevic D, Pekmezovic T, Milicevic O, Lazovic JM, et al. How accurate are citations of frequently cited papers in biomedical literature? Clinical Science. 2021;135(5):671–681. pmid:33599711
33. Slyder JB, Stein BR, Sams BS, Walker DM, Jacob Beale B, Feldhaus JJ, et al. Citation pattern and lifespan: a comparison of discipline, institution, and individual. Scientometrics. 2011;89(3):955–966.
34. Schoonbaert D, Roelants G. Citation analysis for measuring the value of scientific publications: quality assessment tool or comedy of errors? Tropical Medicine & International Health. 1996;1(6):739–752. pmid:8980585
35. Sasaki H, Fugetsu B, Sakata I. Emerging Scientific Field Detection Using Citation Networks and Topic Models—A Case Study of the Nanocarbon Field. Applied System Innovation. 2020;3(3):40.
36. Zhang Y, Chen H, Lu J, Zhang G. Detecting and predicting the topic change of Knowledge-based Systems: A topic-based bibliometric analysis from 1991 to 2016. Knowledge-Based Systems. 2017;133:255–268.
37. Cohan A, Feldman S, Beltagy I, Downey D, Weld DS. SPECTER: Document-level representation learning using citation-informed transformers. arXiv preprint arXiv:2004.07180. 2020.
38. Klein JT. Interdisciplinarity: History, theory, and practice. Wayne State University Press; 1990.
39. Jacobs JA, Frickel S. Interdisciplinarity: A critical assessment. Annual Review of Sociology. 2009;35:43–65.
40. Repko AF, Szostak R, Buchberger MP. Introduction to interdisciplinary studies. Sage Publications; 2019.
41. Pan RK, Sinha S, Kaski K, Saramäki J. The evolution of interdisciplinarity in physics research. Scientific Reports. 2012;2(1):1–8. pmid:22870380
42. Molas-Gallart J, Rafols I, Tang P. On the Relationship between Interdisciplinarity and Impact: Different modalities of interdisciplinarity lead to different types of impact (<SPECIAL REPORT> TOWARD INTERDISCIPLINARITY IN RESEARCH AND DEVELOPMENT). The Journal of Science Policy and Research Management. 2014;29(2-3):69–89.
43. Jacobs JA. In defense of disciplines. University of Chicago Press; 2014.
44. Wagner CS, Roessner JD, Bobb K, Klein JT, Boyack KW, Keyton J, et al. Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics. 2011;5(1):14–26.
45. Porter A, Rafols I. Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics. 2009;81(3):719–745.
46. Wang Q, Schneider JW. Consistency and validity of interdisciplinarity measures. Quantitative Science Studies. 2020;1(1):239–263.
47. Clarivate Analytics. Web of Science; 2021.
48. Porter A, Chubin D. An indicator of cross-disciplinary research. Scientometrics. 1985;8(3-4):161–176.
49. Morillo F, Bordons M, Gómez I. An approach to interdisciplinarity through bibliometric indicators. Scientometrics. 2001;51(1):203–222.
50. Wang J, Thijs B, Glänzel W. Interdisciplinarity and impact: Distinct effects of variety, balance, and disparity. PLoS ONE. 2015;10(5):e0127298. pmid:26001108
51. Leydesdorff L. Betweenness centrality as an indicator of the interdisciplinarity of scientific journals. Journal of the American Society for Information Science and Technology. 2007;58(9):1303–1319.
52. Leydesdorff L, Rafols I. Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations. Journal of Informetrics. 2011;5(1):87–100.
53. Gao Q, Liang Z, Wang P, Hou J, Chen X, Liu M. Potential index: Revealing the future impact of research topics based on current knowledge networks. Journal of Informetrics. 2021;15(3):101165.
54. Chen C, Chen Y, Horowitz M, Hou H, Liu Z, Pellegrino D. Towards an explanatory and computational theory of scientific discovery. Journal of Informetrics. 2009;3(3):191–209.
55. Alipourfard N, Arendt B, Benjamin DJ, Benkler N, Bishop M, Burstein M, et al. Systematizing Confidence in Open Research and Evidence (SCORE). SocArXiv. 2021.
56. Ammar W, Groeneveld D, Bhagavatula C, Beltagy I, Crawford M, Downey D, et al. Construction of the literature graph in Semantic Scholar. arXiv preprint arXiv:1805.02262. 2018.
57. Lo K, Wang LL, Neumann M, Kinney R, Weld DS. S2ORC: The Semantic Scholar Open Research Corpus. arXiv preprint arXiv:1911.02782. 2019.
58. Lammey R. CrossRef text and data mining services. Insights. 2015;28(2).
59. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
60. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. Transformers: State-of-the-Art Natural Language Processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics; 2020. p. 38–45. Available from: https://www.aclweb.org/anthology/2020.emnlp-demos.6.
61. McInnes L, Healy J, Astels S. hdbscan: Hierarchical density based clustering. Journal of Open Source Software. 2017;2(11):205.
62. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018.
63. Turner R, Eriksson D, McCourt M, Kiili J, Laaksonen E, Xu Z, et al. Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. In: NeurIPS 2020 Competition and Demonstration Track. PMLR; 2021. p. 3–26.
64. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A Next-generation Hyperparameter Optimization Framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2019.
65. Moulavi D, Jaskowiak PA, Campello RJ, Zimek A, Sander J. Density-based clustering validation. In: Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM; 2014. p. 839–847.
66. Keele L, McConnaughy C, White I. Strengthening the experimenter’s toolbox: Statistical estimation of internal validity. American Journal of Political Science. 2012;56(2):484–499.
67. Grootendorst M. KeyBERT: Minimal keyword extraction with BERT; 2020. Available from: https://doi.org/10.5281/zenodo.4461265.
68. Lu H, Halappanavar M, Kalyanaraman A. Parallel heuristics for scalable community detection. Parallel Computing. 2015;47:19–37.
69. Lancichinetti A, Fortunato S. Community detection algorithms: a comparative analysis. Physical Review E. 2009;80(5):056117. pmid:20365053
70. Rafols I, Meyer M. Diversity measures and network centralities as indicators of interdisciplinarity: case studies in bionanoscience. In: Proceedings of ISSI. vol. 2; 2007. p. 631–637.
71. Greene D, Doyle D, Cunningham P. Tracking the evolution of communities in dynamic social networks. In: 2010 International Conference on Advances in Social Networks Analysis and Mining. IEEE; 2010. p. 176–183.
72. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.
73. Asur S, Parthasarathy S, Ucar D. An event-based framework for characterizing the evolutionary behavior of interaction graphs. ACM Transactions on Knowledge Discovery from Data (TKDD). 2009;3(4):1–36.
74. Strobl C, Boulesteix AL, Zeileis A, Hothorn T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics. 2007;8(1):1–21. pmid:17254353
75. Wasserstein RL, Lazar NA. The ASA statement on p-values: context, process, and purpose; 2016.