Knowledge transfer, knowledge gaps, and knowledge silos in citation networks

Eoghan Cunningham; Derek Greene

doi:10.1371/journal.pone.0329302

Abstract

The advancement of science relies on the exchange of ideas across disciplines and the integration of diverse knowledge domains. However, tracking knowledge flows and interdisciplinary integration in rapidly evolving, multidisciplinary fields remains a significant challenge. This work introduces a novel network analysis framework to study the dynamics of knowledge transfer directly from citation data. By applying dynamic community detection to cumulative, time-evolving citation networks, we can identify research areas as groups of papers sharing knowledge sources and outputs. Our analysis characterises the life-cycles and knowledge transfer patterns of these dynamic communities over time. We demonstrate our approach through a case study of eXplainable Artificial Intelligence (XAI) research, an emerging interdisciplinary field at the intersection of machine learning, statistics, and psychology. Key findings include: (i) knowledge transfer between foundational topics and contemporary topics in XAI research is limited, and the extent of knowledge transfer varies across different contemporary research topics; (ii) certain application domains exist as isolated “knowledge silos”; (iii) significant “knowledge gaps” are identified between related XAI research areas, suggesting opportunities for cross-pollination and improved knowledge integration. By mapping interdisciplinary integration and bridging knowledge gaps, this work can inform strategies to synthesise ideas from disparate sources and drive innovation. More broadly, our proposed framework enables new insights into the evolution of knowledge ecosystems directly from citation data, with applications spanning literature review, research planning, and science policy.

Citation: Cunningham E, Greene D (2025) Knowledge transfer, knowledge gaps, and knowledge silos in citation networks. PLoS One 20(8): e0329302. https://doi.org/10.1371/journal.pone.0329302

Editor: Aamna AlShehhi, Khalifa University, UNITED ARAB EMIRATES

Received: June 6, 2024; Accepted: July 14, 2025; Published: August 1, 2025

Copyright: © 2025 Cunningham, Greene. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All files are available from GitHub at https://github.com/eoghancunn/knowledge_transfer_in_xai.

Funding: This research was conducted with the financial support of Science Foundation Ireland under Grant Number 12/RC/2289 P2 at the Insight SFI Research Centre for Data Analytics. DG received funding from Science Foundation Ireland (https://www.sfi.ie/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The advancement of science is driven by the exchange of ideas across disciplines and the integration of diverse knowledge domains [1]. Understanding the evolution of research fields and the transfer of knowledge between them is crucial for effective interdisciplinary collaboration and scientific progress. Interdisciplinary research, which integrates methods and expertise from different domains, is highly valued for its potential to drive innovation and impact [2]. However, tracking the development of rapidly evolving, multidisciplinary fields poses significant challenges. Top-down, static taxonomies often fail to capture the dynamic nature of these research areas, which are hierarchical, overlapping, and rapidly evolving. Delineations of research areas should instead be made in a bottom-up, data-driven manner that is tailored to the scale and scope of the intended analysis. However, existing bottom-up content-based methods, such as topic modelling, often ignore the knowledge transfer encoded by citation relationships. As a result, these methods may miss the broader implications of an article, especially in cases where connections are not immediately apparent due to the absence of shared terminology or explicit associations.

In this work, we develop novel methods to analyse knowledge transfer in rapidly evolving, highly interdisciplinary research fields. Our approach aims to uncover knowledge gaps and knowledge silos [3], where effective knowledge transfer is not taking place. Rather than relying on prescribed discipline taxonomies or traditional topic models, we leverage dynamic community detection techniques from social network analysis [4] to identify and track the evolution of research topics directly from the citation network. Citation networks offer a unique perspective on knowledge transfer, as they represent the flow of information and ideas through the literature. Detecting and studying network communities (groups of nodes that are densely connected to one another, with relatively few connections to nodes in other groups) are key tasks in network analysis, and many methods have been developed for them. By extending such methods to citation networks, and thus grouping papers into dynamic communities based on their shared knowledge sources and outputs, we can delimit research areas as they naturally emerge and evolve over time. Such research areas, as determined by community finding, differ from ‘research topics’, as determined by topic modelling or other natural language processing (NLP) techniques (see Related Work for a discussion of such methods). For example, two distinct research areas that exhibit limited knowledge transfer may pertain to the same research topic. Conversely, given sufficient knowledge transfer between two research topics, they may be identified as a single research area. Accordingly, research paper text remains a necessary component of citation network analysis [5, 6]. In our work, we leverage research paper text content for the validation and interpretation of the dynamic communities.
By analysing the interactions between these communities, we can gain insights into the nature of knowledge transfer in a body of research and how this process evolves over time. Specifically, we measure the extent to which contemporary topics build upon foundational research, identify isolated knowledge silos, and uncover significant knowledge gaps between related research areas.

A key methodological contribution of this work is a new perspective on dynamic community finding algorithms that facilitates their application to the unique context of mapping knowledge transfer in citation networks, which exhibit cumulative growth and content-rich nodes. We propose dynamic community finding methods specific to the study of knowledge transfer in citation networks. Our methods consider the unique temporal properties and rich paper content inherent to cumulative citation networks in order to identify and characterise the life-cycles, content coherence, centrality, and other knowledge transfer patterns of dynamic communities (or research areas) over time. We demonstrate the utility of our methods by investigating the following research questions relating to knowledge transfer in any rapidly evolving, interdisciplinary research field.

How do contemporary topics in the field rely on foundational research?
What are the research areas that are most isolated in terms of knowledge transfer within the literature?
Is there evidence of knowledge gaps between otherwise related research topics?

We choose eXplainable Artificial Intelligence (XAI) as a case study, as it represents a highly interdisciplinary research area that draws on concepts from a diverse set of foundational topics and that has important implications and applications across many fields of study [7]. In the context of this application, the three research questions above take the following forms.

(i) How do contemporary topics in XAI rely on foundational research in psychology, statistics, and computer science? Despite the recent rapid growth of explainable artificial intelligence research, the field has its roots in topics such as psychology and cognitive science (the psychology of explanation), computer science and statistics (model interpretability), and political science and social science (ethics, governance and accountability). It is pertinent to understand the extent to which current XAI research leverages and builds on these studies.

(ii) What are the most isolated research areas in XAI in terms of knowledge transfer? The field of XAI is highly multidisciplinary – methods and concepts from many research disciplines are represented. We aim to identify the most isolated research areas in the literature as potential “knowledge silos”, which exhibit minimal knowledge transfer with other research areas (foundational or contemporary).

(iii) What are the most significant knowledge gaps in the XAI literature? Given the accelerated rate of publication of XAI-related research, it may be challenging for authors to keep abreast of the research outputs that relate to their own. As such, short-sighted reading and citation patterns can develop, which could lead to knowledge gaps. We seek to identify knowledge gaps by modelling the probability of knowledge transfer between research areas according to their content similarity and citation neighbourhoods. Pairs of research areas that exhibit substantially less knowledge transfer than predicted are then considered to have knowledge gaps.

By mapping the knowledge flows, knowledge gaps, and potential silos in this evolving field, our work can inform research planning, collaboration strategies, and knowledge integration efforts in XAI and other complex multidisciplinary domains. Overall, this work provides a powerful network analysis framework to study the dynamics of knowledge transfer and integration in rapidly evolving interdisciplinary fields directly from the published literature. The methodological contributions enable new insights into research ecosystems, while the XAI case study demonstrates the utility and real-world relevance of the approach.

Related work

Mapping the structure and evolution of scientific fields has been an active area of research. Prescribed taxonomies and discipline classifications have been widely used to categorise papers into broader subjects or research areas [8–10]. For example, Microsoft Academic and Web of Science maintain large subject or ‘field-of-study’ classifications for articles that are readily available and provided at multiple levels of detail. However, such top-down, static taxonomies struggle to accurately capture the organic evolution of research topics [11–13], as disciplinary borders have been shown to be changing constantly [14]. Alternative, bottom-up methods have been developed to identify the constituent research topics in some larger corpus. In particular, keyword-based methods identify research topics as groups of commonly co-occurring keywords [15–18], while topic modelling techniques can be applied to research papers’ titles, abstracts, or full texts to group or connect papers that exhibit similar patterns of word usage [19, 20]. Such bottom-up methods have advantages over static, prescribed classifications, as they can be adapted to the specific application or dataset. Some works have developed these methods to study the evolution of topics and keywords over time [10, 17, 21, 22]. However, topic modelling methods come with challenges related to model selection and validation [20]. Moreover, they fail to recognise the transfer of knowledge inherent in a corpus of scientific research, in particular the knowledge transfer represented by citation relationships.

Network analysis approaches have emerged as powerful tools for mapping the landscape of scientific research directly from citation patterns [23–27]. Community detection algorithms aim to identify densely connected groups of nodes in networks, representing communities or clusters with strong internal connections and relatively fewer external connections [28]. These algorithms have found numerous applications in various domains, including social network analysis [18, 29, 30], biological networks [31], and network science [28]. By modelling scientific literature as networks of papers connected by citations, community detection methods can uncover the underlying research areas present as groups of related papers that share knowledge sources and/or knowledge outputs. As research fields evolve over time, the underlying network structures and community memberships change, necessitating the study of dynamic communities. Dynamic community detection algorithms have been developed to track the evolution of communities in social networks [4, 32]. Building on works in dynamic or ‘evolutionary’ clustering [33, 34], many dynamic community finding techniques partition the data into a series of temporal snapshots or time windows [4]. By matching communities identified in adjacent network snapshots, dynamic community life-cycles are identified as sequences of matched snapshot communities [26, 32, 35]. Existing literature has proposed various metrics specific to dynamic communities, such as ‘stability’ [36], ‘stationarity’ [32], and ‘density’ [36], and developed methods for identifying and characterising life-cycle ‘events’, such as community births, deaths, merges, and splits.

Several studies have applied dynamic community analysis techniques to citation networks. Many of the earliest examples include dynamic citation networks as case-studies, where the specific focus of the work is to demonstrate the performance of their proposed community finding methods [14, 37]. Subsequent works have had a more explicit focus on mapping or predicting changes in research networks. For example, Chakraborty et al. [8] measure changes in various citation network metrics for different fields-of-study, as prescribed by the Microsoft Academic Graph. Similar works track changes in community metrics for fields identified in a bottom-up manner using community detection [26, 38], while others focus on predicting future changes [25]. Further examples of dynamic community finding in research networks build multi-partite graphs of papers, authors, concepts and/or venues, in order to uncover the social dynamics that define the formation of research areas [24, 39]. To date, there has been limited research that leverages dynamic community finding in citation networks to study knowledge transfer between disciplines.

Effective knowledge transfer and integration across different disciplines are crucial for addressing complex scientific challenges and driving innovation [1, 2]. However, interdisciplinary research often faces significant hurdles, such as insular reading and citation practices [40]. Such barriers can lead to the formation of knowledge gaps and silos, hindering scientific progress and productivity [41, 42]. Identifying and bridging these knowledge gaps is essential for fostering interdisciplinary collaboration and facilitating the cross-pollination of ideas and methods. For example, [43] identify research gaps as keyword pairs that co-occur significantly less often than predicted. Papers that span these gaps (i.e., contain both keywords) were found to have a high impact. Several studies have recognised the presence of knowledge silos and their detrimental effects on scientific research, emphasising the need for approaches to map and understand knowledge transfer dynamics [40–42]. However, limited research exists to date that studies knowledge transfer from the perspective of dynamic citation network analysis. Instead, works in this area typically rely on prescribed field of study or discipline labels [43, 44].

Karunan et al. [45] studied cross-fertilisation between research areas through the study of ‘boundary papers’, i.e., papers that are identified in more than one field. The authors defined fields at the time of data collection via a set of custom keywords chosen to relate to each research area. Similarly, Zhou et al. [44] identified research disciplines according to the journal in which papers were published and characterised the knowledge flows between disciplines in terms of their ‘broadness’, ‘intensity’, and ‘homogeneity’.

In this work, we extend existing methods for finding dynamic communities to specifically map knowledge transfer and identify knowledge gaps in citation networks. Crucially, citation networks have unique characteristics, such as cumulative growth, where papers and citations are never removed from the network, and content-rich nodes, where papers include substantial textual information. These features necessitate novel approaches to community detection and tailored strategies for interpreting their outputs. The primary goal of many existing applications of dynamic community finding in citation networks is to benchmark the performance of proposed algorithms [37, 46], rather than to understand the real-world dynamics of interdisciplinary or multidisciplinary research interactions. Therefore, they often overlook the specific properties and nuances of cumulative citation networks and fail to explain how to interpret the resulting dynamic communities. Furthermore, while the textual metadata associated with papers has been highlighted as an important resource for citation network analysis [5], many instances of such analysis have been found to ignore the article content [6]. Leveraging this textual information can help validate and interpret the identified communities. Specifically, dynamic community metrics that consider the content of the papers, in addition to community membership and network structure, could serve to bridge the gap between network-based methods of tracking research evolution and other NLP-based approaches (e.g. [10, 17, 21]).

Data

In this section, we discuss the construction of the dataset which is considered later in our case study. We chose eXplainable Artificial Intelligence (XAI) research as a case study of a rapidly growing field of research that is highly multidisciplinary. We seed our data collection process with the xai-scholar dataset [47], which was collected on December 31st 2022. The xai-scholar dataset was curated using a process of keyword-based search, manual expansion, citation expansion and keyword-based filtering to produce a dataset of 5,119 XAI papers [47]. For the purposes of our analysis, and to obtain a more complete view of the citation network, it was necessary to expand the xai-scholar dataset to contain ‘non-XAI’ works that are related to XAI research. Many of the earliest works in the field of XAI predate the modern terminology and do not self-identify as XAI research. Further, any works that are heavily cited by these papers are deemed relevant to the field and necessary to include in the citation graph to gain an accurate understanding of knowledge transfer in the literature. As such, we extend the xai-scholar citation network using metadata from the Semantic Scholar Academic Graph API [48]. We perform a 1-hop citation expansion and retain any papers that have a citation relationship (either citing or cited by) with more than one paper in the core set of xai-scholar papers. In other words, the final dataset contains all articles from the original xai-scholar dataset, together with any additional papers with a citation relationship to more than one of those articles. Any papers missing citation or reference data are removed, as they appear as isolates in the citation graph. Similarly, any papers missing a title or abstract are excluded, as we would lack a representation of their content. The resulting citation network contains 20,604 papers published between 1889 and 2023 and 306,668 citations.
The data collection from the Semantic Scholar Academic Graph was completed in November 2023, and the metadata for the final set of papers considered in this work are available on GitHub [49].
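The expansion and filtering rules described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the Semantic Scholar metadata has already been downloaded into plain Python structures (a list of citation pairs and a dictionary of titles and abstracts), and the function name `expand_core` is hypothetical.

```python
from collections import defaultdict


def expand_core(core, citations, metadata):
    """Sketch of a 1-hop citation expansion around a core paper set.

    core: set of core (seed) paper ids.
    citations: list of (citing_id, cited_id) pairs (hypothetical input format).
    metadata: dict paper id -> dict with optional 'title' and 'abstract' keys.
    """
    # For every non-core paper, collect the distinct core papers it is linked to
    core_links = defaultdict(set)
    for citing, cited in citations:
        if citing in core and cited not in core:
            core_links[cited].add(citing)
        elif cited in core and citing not in core:
            core_links[citing].add(cited)
    # Keep non-core papers linked (citing or cited by) to more than one core paper
    expanded = set(core) | {p for p, links in core_links.items() if len(links) > 1}

    def has_text(paper):
        # Papers without a title or abstract have no content representation
        meta = metadata.get(paper, {})
        return bool(meta.get("title")) and bool(meta.get("abstract"))

    kept = {p for p in expanded if has_text(p)}
    # Keep citations among retained papers, then drop papers left as isolates
    edges = [(u, v) for u, v in citations if u in kept and v in kept]
    connected = {u for u, _ in edges} | {v for _, v in edges}
    return kept & connected, edges
```

In this sketch a paper linked to only one core paper is discarded, matching the "more than one" rule in the text.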

Methods

In this section, we present our methods for mapping and tracking the development of research areas and the patterns of knowledge transfer between them. Our framework combines dynamic community detection on citation networks with natural language processing of scholarly text. An overview of the framework is provided in Fig 1. Firstly, we represent a dynamic (or cumulative) citation network as a discrete-time dynamic network, with time steps representing snapshots of the network state. Applying community detection to these time step networks identifies step communities, i.e. groups of papers representing distinct research areas based on shared citation patterns. Leveraging paper text, alongside other network and clustering methods, allows us to validate, interpret, and characterise these communities. Matching step communities in adjacent time steps, and thereby tracking communities over time, reveals the life-cycles of research areas as dynamic communities. We map knowledge transfer between research areas through the use of a community interaction network. Combining these methods, we can track changes in community metrics and inter-community interactions to expose patterns of knowledge transfer. After providing the details of the framework, in the final part of this section, we discuss how the proposed methods allow us to address the key research questions introduced in the Introduction.


Fig 1. A flow chart outlining our methodology.

https://doi.org/10.1371/journal.pone.0329302.g001

Step communities

To investigate the life-cycles of research areas within a specific body of work, we require a citation network that evolves over time. In this work, we adopt a discrete-time dynamic network such that the citation network is described by a sequence of time step graphs $\mathcal{G} = \{G_{t_0}, G_{t_1}, \ldots, G_{t_n}\}$, where each time step graph $G_t = (V_t, E_t)$ is defined by a set of nodes $V_t$ and edges $E_t$. Each set of nodes $V_t$ contains the papers present in the network at time step $t$, and the set of edges $E_t$ represents the citations between them. Due to the cumulative nature of citation networks, discrete-time dynamic citation networks represent a unique type of dynamic network. Specifically, nodes (papers) and connections (citations) are never removed from the network. As such, the time step graphs are cumulative, i.e. $V_{t_0} \subseteq V_{t_1} \subseteq \cdots \subseteq V_{t_n}$ and $E_{t_0} \subseteq E_{t_1} \subseteq \cdots \subseteq E_{t_n}$. In this application, we divide the citation network by year, such that the step graph denoted by $G_{2010}$ contains all papers in the dataset published before 2011.
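A minimal sketch of the cumulative snapshot construction follows, assuming each paper has an integer publication year and citations are given as (citing, cited) pairs. The function name `cumulative_snapshots` is illustrative only.

```python
def cumulative_snapshots(pub_year, citations, years):
    """Build cumulative time step graphs G_t = (V_t, E_t), one per year.

    pub_year: dict paper id -> publication year (hypothetical input format).
    citations: list of (citing_id, cited_id) pairs.
    years: sorted list of snapshot years; G_t holds papers published in or before t.
    """
    snapshots = {}
    for t in years:
        # V_t: every paper published up to and including year t
        nodes = {p for p, y in pub_year.items() if y <= t}
        # E_t: every citation whose two endpoints are both in V_t
        edges = {(u, v) for u, v in citations if u in nodes and v in nodes}
        snapshots[t] = (nodes, edges)
    return snapshots
```

Because membership only depends on publication year, each snapshot's node and edge sets are supersets of the previous snapshot's, which is the cumulative property stated above.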

Finding step communities.

To identify the research areas present in the literature at each time step, we apply community detection to the corresponding time step graph G_t in the dynamic citation network. Specifically, we use the OSLOM algorithm [50] to extract overlapping communities that represent densely connected groups of papers sharing knowledge sources and outputs. OSLOM is well-suited for our analysis due to several desirable properties. First, it can detect communities following a broad range of size distributions, avoiding the bias of some algorithms towards few large communities. Second, it identifies hierarchically nested communities, capturing the multiscale organisation common in citation networks. Third, it allows for overlapping communities, reflecting how papers can belong to multiple research areas. In S1 Appendix, we compare the size distributions of citation network communities found according to different community detection techniques. We refer to the communities discovered in time step graph G_t as step communities.

We initialise OSLOM for each time step using the communities identified in the previous time step. This leverages the temporal continuity expected in the evolution of research areas. To ensure consistent community identification across time steps, we use a fixed set of hyperparameters for OSLOM. This prevents spurious dynamic events (e.g. splits or merges) that arise solely from changes in hyperparameter values between time steps, rather than from changes in the relevant regions of the network. To select the hyperparameters, we perform a small grid search over the OSLOM resolution and threshold parameters for each time step graph. For each time step, we record the parameter values that maximise the combined fitness of the 10 largest communities. The pair of resolution and threshold values that occurs most frequently across all time steps is then used for community detection across the entire dynamic network.

Hyperparameters are chosen to maximise the ‘fitness’ (i.e., the proportion of edges with at least one endpoint in the community that have both endpoints in the community [51]) of the largest communities across all time steps. This focuses the analysis on the dominant, established research areas, which tend to be the largest, while still allowing new communities representing emerging topics to form over time. The specific number of communities considered (10 in this case) can be tailored to the analysis goals. For example, given the nature of our dataset, many nodes may exist in early time steps that are not yet relevant to the rest of the literature. These papers typically represent unrealised applications of methods in the field and appear as small, fractured, or isolated components in the citation network. Thus, we de-emphasise these smaller, isolated communities when evaluating fitness, preferring to prioritise the detection of the larger, more established topics. For rapidly evolving fields, focusing on the largest communities allows us to capture the major research areas while allowing smaller emerging areas to form.
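The hyperparameter selection procedure can be sketched as below. This is an illustrative sketch under stated assumptions: the OSLOM runs themselves are not shown (OSLOM is invoked externally, and `grid_results` is a hypothetical precomputed dictionary of its outputs), and "combined fitness" is interpreted here as the sum of the per-community fitness values, which is an assumption.

```python
from collections import Counter


def fitness(community, edges):
    """Fraction of edges touching the community that have both endpoints inside it."""
    touching = [(u, v) for u, v in edges if u in community or v in community]
    if not touching:
        return 0.0
    internal = sum(1 for u, v in touching if u in community and v in community)
    return internal / len(touching)


def combined_fitness(communities, edges, top_k=10):
    """Sum of fitness over the top_k largest communities (summing is an assumption)."""
    largest = sorted(communities, key=len, reverse=True)[:top_k]
    return sum(fitness(c, edges) for c in largest)


def select_hyperparameters(grid_results, edges_by_step, top_k=10):
    """Return the (resolution, threshold) pair that is most often best across steps.

    grid_results: dict t -> dict (resolution, threshold) -> list of communities (sets),
        assumed to come from one OSLOM run per grid point (not shown).
    edges_by_step: dict t -> list of edges of the time step graph G_t.
    """
    best_per_step = []
    for t, results in grid_results.items():
        # Best parameter pair for this time step by combined fitness
        best = max(results, key=lambda params: combined_fitness(results[params], edges_by_step[t], top_k))
        best_per_step.append(best)
    # The most frequent winner (the mode) is then fixed for all time steps
    return Counter(best_per_step).most_common(1)[0][0]
```

Taking the mode rather than, say, the average fitness across steps mirrors the text: a single fixed pair is chosen so that detected events are not artefacts of parameter changes.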

Labelling step communities.

Following [25, 38], we use the title and abstract text of the papers in a community to annotate the community’s topic. Specifically, we combine the terms from all the papers in a community into a bag-of-words vector. We then annotate each community using the top-n terms according to the Term Frequency-Inverse Community Frequency (TF-ICF). The ICF terms used to adjust the term frequencies are calculated per year. In addition, we consider the category term descriptor (CTD) [52]. We treat each community as a category and calculate CTD based on:
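TF-ICF labelling for a single time step can be sketched as follows. The exact ICF formulation is not given in the text, so this sketch assumes the IDF-style form ICF(t) = log(|C| / CF(t)), where CF(t) is the number of communities containing term t; tokenisation is also assumed to have been done already, and `tf_icf_labels` is a hypothetical name. The CTD score is not implemented here.

```python
import math
from collections import Counter


def tf_icf_labels(community_texts, top_n=5):
    """Label each community with its top-n terms by TF-ICF (one time step).

    community_texts: dict community id -> list of tokenised documents (lists of terms),
        e.g. tokenised titles and abstracts of the papers in that community.
    Assumes ICF(t) = log(|C| / CF(t)), with CF(t) = number of communities containing t.
    """
    # Bag-of-words vector per community: pool the terms of all its papers
    bags = {c: Counter(term for doc in docs for term in doc) for c, docs in community_texts.items()}
    n_communities = len(bags)
    # Community frequency: number of communities in which each term appears
    cf = Counter(term for bag in bags.values() for term in bag)
    labels = {}
    for c, bag in bags.items():
        scores = {term: tf * math.log(n_communities / cf[term]) for term, tf in bag.items()}
        # Rank by descending score, breaking ties alphabetically for determinism
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[c] = ranked[:top_n]
    return labels
```

Calling the function separately for each year's set of step communities corresponds to computing the ICF terms per year, as described above.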

(1)

Here C denotes the set of communities, CF(t_k) is the community frequency of term t_k, and the document frequency of term t_k within community C_i is the remaining quantity in Eq 1.

Characterising step communities.

To characterise the step communities, we measure two key properties: topic coherence and citation density. Firstly, topic coherence relates to the concept of topic disparity [53] for a set of articles, and more broadly to the notion of coherence in topic modelling [54]. Initially, we learn a topic embedding for each article by passing the article title and abstract through a transformer-based language model trained on scientific articles (SciBERT), and taking the final hidden state of the [CLS] token. We then compute the topic centroid for the community by taking the mean of all of the topic embeddings. Finally, we calculate the topic coherence of the community as the average similarity between each article’s topic embedding and the community topic centroid. Our second measure, citation density, refers to the network density of the citation subgraph described by a given community. This is measured as the number of edges (or citations) in the subgraph divided by the number of possible connections.
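Both step community measures can be sketched in a few lines. This assumes the SciBERT [CLS] embeddings have already been computed (the model call is not shown), that "similarity" means cosine similarity, and that density follows the directed-graph convention of dividing by n(n - 1) ordered pairs; all three are assumptions, and the function names are hypothetical.

```python
import numpy as np


def topic_coherence(embeddings):
    """Mean cosine similarity between each paper's embedding and the community centroid.

    embeddings: array-like of shape (n_papers, dim), e.g. precomputed SciBERT [CLS] vectors.
    """
    X = np.asarray(embeddings, dtype=float)
    centroid = X.mean(axis=0)
    # Cosine similarity of every row with the centroid
    sims = X @ centroid / (np.linalg.norm(X, axis=1) * np.linalg.norm(centroid))
    return float(sims.mean())


def citation_density(community, edges):
    """Directed density: internal citations divided by n * (n - 1) possible ordered pairs.

    community: set of paper ids; edges: list of distinct (citing, cited) pairs.
    """
    n = len(community)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in edges if u in community and v in community)
    return internal / (n * (n - 1))
```

A community whose papers all point in the same embedding direction scores a coherence of 1; orthogonal embeddings pull the score down towards the cosine between each paper and the mean vector.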

Measuring knowledge transfer.

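In addition to characterising individual communities, at each time step we construct a community interaction network to model knowledge transfer between the identified research areas. In the interaction network at time step t, the set of nodes corresponds to the step communities discovered at time t and the set of edges corresponds to the citations between them. Formally, $I_t$ has the set of nodes $\mathcal{C}_t = \{C_{t,1}, \ldots, C_{t,k_t}\}$ and the set of weighted edges $J_t$, such that each pair of step communities $(C_{t,i}, C_{t,j})$ is joined by an edge of weight $p_{ij}$. Here $p_{ij}$ is the probability of an interaction/citation between papers in $C_{t,i}$ and $C_{t,j}$, given in Eq 2.

$$p_{ij} = \frac{\left|\{(u, v) \in E_t : u \in C_{t,i},\ v \in C_{t,j}\}\right|}{|C_{t,i}|\,|C_{t,j}|} \qquad (2)$$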
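The interaction probabilities can be sketched as a matrix computation. This is an illustrative sketch assuming that the weight between communities i and j counts citations from papers in community i to papers in community j, normalised by the product of the two community sizes, which is consistent with the 10%-cites-10% worked example given under network proximity later in this section. The function name `interaction_matrix` is hypothetical.

```python
import numpy as np


def interaction_matrix(communities, edges):
    """P[i, j] = citations from papers in community i to papers in community j,
    divided by |C_i| * |C_j| (citation direction citing -> cited is an assumption).

    communities: list of sets of paper ids (OSLOM communities may overlap).
    edges: list of (citing, cited) pairs from the time step graph.
    """
    k = len(communities)
    counts = np.zeros((k, k))
    for u, v in edges:
        # A citation contributes to every community pair containing its endpoints
        for i, ci in enumerate(communities):
            if u in ci:
                for j, cj in enumerate(communities):
                    if v in cj:
                        counts[i, j] += 1
    sizes = np.array([len(c) for c in communities], dtype=float)
    return counts / np.outer(sizes, sizes)
```

With two communities of 10 papers each, a single citation from one paper in the first to one paper in the second (10% citing 10%) gives an interaction probability of 0.01.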

While the strength of the connections in the community interaction network reveals the extent of the knowledge transfer between two research areas, we can also consider standard network measures to summarise the nature of knowledge transfer for any given research area. Specifically, we rely on two primary perspectives to summarise knowledge transfer for a research area.

First, network proximity measures help us to understand citation behaviours between two specified topics or communities. The connection strength between two communities in the interaction network is considered as the ‘first-order’ network proximity. According to the above definition of the interaction network, the connection strength or edge weight between two communities is the probability of a citation between a paper in each community. For example, if 10% of the articles in research area A cite 10% of the articles in research area B, then the pair of communities will have a first-order proximity or ‘interaction probability’ of 0.01 or 1%. In typical real-world citation networks, which contain large communities or research areas, community interaction scores above 1% would indicate a moderate-to-high level of knowledge transfer. However, as the probabilities are normalised with respect to the size of the communities, we recommend studying the interactions comparatively. See Figs 6 and 7 for examples. The ‘second-order’ network proximity of two communities is a measure of the similarity of the neighbourhoods of the communities. Thus, two communities will have high second-order proximity if they have similar citation behaviours with the other topics in the network. We measure second-order proximity as the cosine similarity between the communities’ interaction probabilities across the network, that is, between their respective columns of the weighted adjacency matrix describing the community interaction network.
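Second-order proximity can be sketched as column-wise cosine similarity of the interaction matrix. In this sketch, whether diagonal (within-community) entries are included in each column is an assumption, and communities with an all-zero column are assigned zero similarity to everything; `second_order_proximity` is a hypothetical name.

```python
import numpy as np


def second_order_proximity(P):
    """Cosine similarity between the columns of the interaction matrix P.

    S[i, j] compares the citation neighbourhoods of communities i and j.
    Diagonal entries of P are included as-is (an assumption).
    """
    P = np.asarray(P, dtype=float)
    norms = np.linalg.norm(P, axis=0)
    # Avoid division by zero for communities with no interactions at all
    norms[norms == 0] = 1.0
    Q = P / norms
    return Q.T @ Q
```

For example, with columns (0, 0.02, 0.01) and (0.02, 0, 0.01), the two communities share only their link to the third community, giving a second-order proximity of 0.0001 / 0.0005 = 0.2.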

Second, network centrality measures reveal the extent to which topics are involved in the transfer of knowledge in the network. The unweighted degree centrality of a community in the interaction network reports the number of communities with which that topic shares knowledge, while the betweenness centrality measures the importance of that node in facilitating knowledge transfer across the network.
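The two centrality measures can be sketched without external graph libraries. This sketch assumes the interaction network is treated as undirected and unweighted for centrality purposes (two communities are adjacent when either direction has a positive interaction probability), and computes unnormalised betweenness with Brandes' algorithm. A library such as networkx would normally be used instead; the function name `degree_and_betweenness` is hypothetical.

```python
from collections import deque


def degree_and_betweenness(P, eps=0.0):
    """Unweighted degree and (unnormalised) betweenness centrality of communities.

    P: square matrix (list of lists) of interaction probabilities. Communities i and j
       are adjacent when P[i][j] or P[j][i] exceeds eps (undirected view is an assumption).
    """
    k = len(P)
    adj = [[j for j in range(k) if j != i and (P[i][j] > eps or P[j][i] > eps)] for i in range(k)]
    degree = [len(nbrs) for nbrs in adj]
    bc = [0.0] * k
    for s in range(k):
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors
        stack, preds = [], [[] for _ in range(k)]
        sigma, dist = [0] * k, [-1] * k
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance from s
        delta = [0.0] * k
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each shortest path is counted once from each endpoint
    return degree, [b / 2 for b in bc]
```

On a chain of three communities (A linked to B, B linked to C), the middle community has degree 2 and betweenness 1, since it lies on the only path between the other two.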

Dynamic communities

The previous sections detailed how we identify step communities in individual time step graphs. A key goal is to track the evolution of these communities over time. This section outlines our approach for constructing dynamic community life-cycles by linking step communities across time steps. This allows us to analyse how research areas emerge, grow, merge, split or dissipate as the field evolves.

Finding and tracking dynamic communities.

We follow the method proposed in [35] to track the life-cycles of communities in a cumulative citation network. In this framework, the step communities in the time step graphs represent observations of dynamic communities at a given time point (year). If we denote the set of step communities identified by OSLOM at time t as $\mathcal{C}_t = \{C_{t,1}, C_{t,2}, \ldots, C_{t,k_t}\}$, a dynamic community can then be represented by a chronology of step communities, for example $D_i = \{C_{t_0, i_0}, C_{t_1, i_1}, \ldots, C_{t_n, i_n}\}$. At the first time point t₀, dynamic communities are formed using a one-to-one mapping with the step communities $\mathcal{C}_{t_0}$. Subsequent step communities are added to the dynamic communities using a heuristic, many-to-many mapping. The most recent step community in a dynamic community is called its ‘front’. At a given time step, comparing the step communities with all of the dynamic community fronts can lead to a number of possible events:

If a step community does not match with any of the dynamic community fronts, it is added as a new dynamic community with a single step. This is known as community birth.
If a step community matches with a single front, that step community is added to that dynamic community timeline and becomes the new front.
If two or more step communities match with the same front, then new identical dynamic communities are formed and one of the matching step communities is added to each to act as its front. This is known as community splitting.
If a step community matches with multiple fronts, then it is added to each of them. This is known as community merging.
If a dynamic community front does not match with any of the step communities then the front is not updated. The front may match with step communities at subsequent time steps, thus allowing for intermittent community structures to be found. If the front does not match with any of the step communities in any of the subsequent time steps then this is known as community death.

To match community fronts with step communities, we follow the strategy proposed by [35]: given a step community $C_{t,i}$ and a dynamic community front $F_j$, we compute a similarity score $\mathrm{sim}(C_{t,i}, F_j)$ between the two.

Using this similarity measure, we match step community $C_{t,i}$ to front $F_j$ if $\mathrm{sim}(C_{t,i}, F_j)$ exceeds a matching threshold $\theta$.
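One tracking step can be sketched as follows. The similarity measure is an assumption: the sketch uses the Jaccard coefficient between the paper sets of a step community and a front, a common choice for this kind of event-based matching, with a hypothetical threshold θ = 0.3. Timelines are plain lists of paper-id sets, so a front is simply the last element of its timeline; all names are illustrative.

```python
def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two paper-id sets (assumed similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def start_tracking(first_step):
    """At t0, one dynamic community per step community (one-to-one mapping)."""
    return [[c] for c in first_step]


def track_step(dynamic, step_comms, theta=0.3):
    """Match new step communities against dynamic community fronts.

    dynamic:    list of timelines; each timeline is a list of step communities.
    step_comms: list of step communities (sets of paper ids) at the new time step.
    Returns the updated list of timelines.
    """
    matched = set()
    updated = []
    for timeline in dynamic:
        front = timeline[-1]
        hits = [i for i, c in enumerate(step_comms) if jaccard(c, front) > theta]
        if not hits:
            # No match: the front is kept, so it may still match at a later step.
            updated.append(timeline)
        for i in hits:
            # One hit continues the timeline; several hits split it into copies.
            # A step community hit by several fronts is appended to each (merging).
            updated.append(timeline + [step_comms[i]])
            matched.add(i)
    # Unmatched step communities start new dynamic communities (birth).
    updated.extend([c] for i, c in enumerate(step_comms) if i not in matched)
    return updated
```

Each event listed above falls out of the loop: continuation, splitting, merging and birth are handled explicitly, while an unmatched front is carried forward unchanged, so it can still match at a later step or eventually be recorded as a death.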

Characterising dynamic communities.

To analyse the properties and evolution of dynamic communities tracked across time steps, we propose a set of six metrics. The first metric is the community lifespan, measured as the number of time steps in which it is present in the dynamic network. We then consider three metrics derived from time series data of the community’s constituent step communities at each time point. The community size time series tracks the number of papers belonging to the step communities over time. The degree centrality and betweenness centrality time series measure how central the step communities are within the community interaction network in facilitating knowledge flow. The final two metrics aim to quantify the stability and coherence of a dynamic community’s research focus as it evolves, similar to existing approaches taken in NLP-based studies of research topic evolution [10, 17]. The content coherence metric compares the textual similarity of papers between consecutive step communities. Specifically, for a dynamic community $D$ with consecutive step communities $C_{t-1}$ and $C_t$, we compute the average pairwise cosine similarity between the SciBERT topic embeddings of papers in $C_{t-1}$ and $C_t$ at each time step $t$. This yields a time series capturing how coherent the research topic remains. Similarly, the membership stability metric tracks changes in the specific paper membership of the community over time. It is calculated as the Jaccard similarity between the paper sets of consecutive step communities $C_{t-1}$ and $C_t$ in the dynamic community $D$.

While metrics like size and centrality characterise the community’s position and importance in the network, content coherence and membership stability allow us to analyse the thematic evolution of the underlying research area. In most cases, we summarise the time series values using averages over the community’s lifespan or specific time periods of interest.
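The lifespan and the two content-oriented metrics can be sketched as below, with each time series averaged over the whole lifespan as described above. The embeddings are a hypothetical dict of paper id to vector standing in for SciBERT outputs, and step communities are sets of paper ids. The sketch assumes a timeline with at least two step communities, since both metrics compare consecutive steps.

```python
import math
from itertools import product
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_coherence(prev, curr, emb):
    """Average pairwise cosine similarity between papers of C_{t-1} and papers of C_t."""
    return mean(cosine(emb[p], emb[q]) for p, q in product(prev, curr))


def membership_stability(prev, curr):
    """Jaccard similarity between the paper sets of C_{t-1} and C_t."""
    return len(prev & curr) / len(prev | curr)


def summarise(timeline, emb):
    """Lifespan plus lifespan-averaged coherence and stability for one dynamic community."""
    steps = list(zip(timeline, timeline[1:]))  # consecutive (C_{t-1}, C_t) pairs
    return {
        "lifespan": len(timeline),
        "mean_coherence": mean(content_coherence(a, b, emb) for a, b in steps),
        "mean_stability": mean(membership_stability(a, b) for a, b in steps),
    }
```

For example, if papers 1 and 2 share an embedding direction and paper 3 is orthogonal to them, a community that grows from {1, 2} to {1, 2, 3} has a membership stability of 2/3 and a content coherence of 4/6 = 2/3 (four of the six cross-step pairs are identical in direction).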

Research questions

Given the framework described above for discovering, interpreting and characterising research areas in a corpus, in this section we outline how our proposed methods can be used to answer the three key research questions from the Introduction relating to knowledge transfer in XAI research.

To what extent do contemporary topics in the literature rely on foundational research in psychology, statistics and computer science? We answer this question in a number of steps:
(a) Identify the foundational topics in the literature as long-lived communities with coherent subject matter that are consistently central in the interaction networks.
(b) Identify contemporary topics as the communities present in the later periods of the dataset that are populated by the most recent papers.
(c) Separate and compare those recent communities that cite the foundational topics from those that do not.
What are the research areas that are most isolated in terms of knowledge transfer within the literature? We refer to these isolated research areas as ‘knowledge silos’, and we identify them as the nodes with the lowest total interaction probability in the community interaction network.
Is there evidence of knowledge gaps between otherwise related research areas, and what are the most significant knowledge gaps in the literature? Our approach to identifying knowledge gaps is outlined below:
(a) Use a regression model to predict interaction probabilities in the community interaction network based on the research areas’ content similarity and citation neighbourhood proximity. Specifically, we use a regression model with a gamma distribution (implemented in Python 3.9.7 using Scikit-learn [55]) to predict the interaction probabilities between all pairs of communities in the final time step graph. The independent variables are: 1) the average pairwise cosine similarity between the SciBERT embeddings of papers in each community (content similarity), and 2) the cosine similarity between the communities’ interaction probabilities in the interaction network (second-order network proximity, capturing the similarity of citation neighbourhoods). SciBERT embeddings are computed from the papers’ title and text using a pre-trained transformer language model (SciBERT [56]) provided by HuggingFace.
(b) Analyse the residuals: pairs of communities with large positive residuals (predicted minus observed interaction probability) are identified as having knowledge gaps, since they demonstrate far less knowledge transfer than expected given their content relatedness and structural proximity in the network.
(c) Highlight research areas that exhibit multiple large positive residuals and examine these knowledge gaps.
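Steps (a) and (b) can be sketched without external dependencies. The paper fits a gamma regression with scikit-learn; as a self-contained stand-in, the sketch below fits an unpenalised Gamma GLM with a log link by iteratively reweighted least squares. For this family and link the IRLS weights are all 1, so each iteration is an ordinary least-squares fit of the working response z = η + (y − μ)/μ. Note that scikit-learn's `GammaRegressor` applies an L2 penalty by default (`alpha=1.0`), so its coefficients would differ somewhat. Because the gamma distribution requires strictly positive targets, the sketch adds a small hypothetical offset `eps`; how zero interaction probabilities were handled in the original analysis is an assumption here. Residuals are predicted minus observed, so large positive values flag candidate knowledge gaps.

```python
import math


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def _ols(X, z):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T z."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xtz = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(p)]
    return _solve(xtx, xtz)


def fit_gamma_log_link(features, y, max_iter=100, tol=1e-10):
    """Unpenalised Gamma GLM with log link, fitted by IRLS. Requires every y > 0."""
    X = [[1.0, *f] for f in features]            # intercept column + features
    beta = _ols(X, [math.log(v) for v in y])     # start from OLS on log(y)
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Working response; for Gamma with log link the IRLS weights are all 1.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        new_beta = _ols(X, z)
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break

    def predict(f):
        return math.exp(beta[0] + sum(b * x for b, x in zip(beta[1:], f)))

    return beta, predict


def knowledge_gaps(pairs, content_sim, proximity, interaction, eps=1e-6):
    """Rank community pairs by residual = predicted - observed interaction probability."""
    features = list(zip(content_sim, proximity))
    y = [v + eps for v in interaction]           # hypothetical offset: gamma needs y > 0
    _, predict = fit_gamma_log_link(features, y)
    residuals = [predict(f) - v for f, v in zip(features, interaction)]
    return sorted(zip(pairs, residuals), key=lambda t: t[1], reverse=True)
```

The first entries of the ranked list are the candidate gaps for step (c); summing each community's positive residuals across its pairs gives the kind of total residual score used to rank communities by how consistently they exhibit gaps.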

Results

The presentation of our results is structured as follows. Firstly, we present an initial analysis to visualise the life-cycles of illustrative examples of dynamic research communities. Following this, we address each of our research questions in turn. We describe the identification of foundational areas in the XAI literature and assess the extent to which contemporary XAI research topics build upon and integrate knowledge from these foundational areas. We identify potential knowledge silos – isolated research areas exhibiting minimal knowledge transfer with the rest of the field. Lastly, we model citation interactions between research areas to detect significant knowledge gaps where insufficient transfer occurs among otherwise related topics.

Preliminary results—community life-cycles

In this analysis, since we investigate knowledge transfer in XAI-related research, we consider communities discovered at the lowest level of the OSLOM hierarchy. To demonstrate our approach and to provide context for our definition and discovery of research areas, we provide two examples of community life-cycles in the flow diagrams in Figs 2 and 3. In total, we identify life-cycles for 435 dynamic communities or research areas. Of these areas, 163 dissolve before the final time step in 2023. Research area sizes range from 3 to 320 papers, with a median size of 50. We choose the two examples in Figs 2 and 3 because they represent long-lived communities with different characteristics. Moreover, both of these example research areas are relevant to the discussion in the following sections, as they are identified as foundational topics in the XAI literature. The flow diagram in Fig 2 shows the life-cycle of the research area relating to regression models and, in particular, how this area ‘dissolves’ as its constituent papers are cited by work in many other topics. Dynamic community ‘death’, as it is described in the social network analysis literature, has some specific caveats in the context of citation network analysis. Specifically, as nodes and connections are not removed from the network, community ‘death’ only occurs in the form presented in Fig 2, where a community dissolves as the strength of connections between the community’s members is surpassed by connections to external works. Crucially, we note that the community dissolution shown here is specific to the perspective of our dataset (i.e., of XAI-related research), as the works in the community pick up citations from the many other research areas where XAI methods can be applied or extended.
In an analysis with a different focus – given another subset of the surrounding research – it might be seen that the regression community remains intact, or instead that the regression topic is realised by multiple, more specialised research areas. This flexibility is one of the benefits of bottom-up, data-driven classifications of research articles. Such classifications should be specific to the analysis, as there is no unique map of science [11].


Fig 2. Life-cycle of regression models research.

Flow diagram showing the life-cycle of a dynamic community pertaining to statistics research on regression models. Any papers present in the first or last realisation of the dynamic community are plotted. The nodes in the graph represent step communities and they are grouped by the time step in which they appear. The edges between nodes show the movement of papers between step communities. The dynamic community dissolves after 2010.

https://doi.org/10.1371/journal.pone.0329302.g002


Fig 3. Life-cycle of neural networks research.

Flow diagram showing the life-cycle of a dynamic community of neural networks research that splits before 2005. The research area that focuses on rule extraction remains present in 2020. Any papers present in the first or last realisation of the dynamic community are plotted. The nodes in the graph represent step communities and they are grouped by the time step in which they appear. The edges between nodes show the movement of papers between step communities.

https://doi.org/10.1371/journal.pone.0329302.g003

For comparison, we include the life-cycle of another, more stable, research area in Fig 3. In this example, we can see that the community of neural network research splits some time before 2005 into works on ‘rule extraction’ and ‘recurrent’/‘connectionist architectures’. The ‘rule extraction’ community continues to grow steadily after this point, while the remaining connectionist computing community dissolves into various applications and sub-fields related to Deep Learning and its applications in XAI.

As a final preliminary result, in Fig 4 we compare the life-cycles of research areas related to two important methods in the domain of XAI: ‘causal explanations’ and ‘counterfactual explanations’. The flow diagram shows that these two research areas have evolved in very different ways. Specifically, the causal explanations community appears at the confluence of multiple research areas that are spread across different fields of study (Computer Science, Statistics, and Psychology) and that consist of research published prior to 2010. By comparison, the counterfactual explanations community emerges rapidly and includes very little research predating 2020. As research outputs are grouped into research areas based on shared knowledge sources, these life-cycles indicate that causal explanations research relies more on established literature from Psychology, Statistics and Computer Science than counterfactual explanations research does. In the following section, we further assess the extent to which contemporary research areas in XAI rely on foundational knowledge by directly measuring the knowledge transfer between research areas.


Fig 4. Life-cycles of counterfactual and causal explanation methods.

Flow diagram comparing the life-cycle of two dynamic communities relating to two general approaches to explanation: ‘causal explanation’ and ‘counterfactual explanation’. Any papers present in the final realisation of either dynamic community are plotted. The nodes in the graph represent step communities and they are grouped by the time step in which they appear. The edges between nodes show the movement of papers between step communities.

https://doi.org/10.1371/journal.pone.0329302.g004

Knowledge transfer from foundational research areas

We now assess the extent to which contemporary research areas in the field of XAI rely on the theoretical and methodological foundations of the literature. This analysis consists of three steps. In the first two, we identify foundational and contemporary research areas based on the dynamic community characteristics proposed in the Methods and Materials section; in the third, we use community interaction probabilities to measure the knowledge transfer between these areas.

Firstly, we identify foundational areas in the literature as long-lived dynamic communities that are important to knowledge transfer in the field in the earlier portions of the dataset. Fig 5 shows a simplified view of the life-cycles of the dynamic communities with the highest (left) and lowest (right) betweenness centrality scores, as measured in the community interaction network in the period 2000–2010. We rely on centrality in the community interaction network to reveal the relative importance of each research area for knowledge transfer in the network over time. All dynamic communities shown in Fig 5 are relatively long-lived and stable, so longevity alone does not distinguish them. However, the low-centrality topics in Economics, Physics, and Medicine remain isolated (degree = 0, betweenness = 0) in the early portion of the dataset (2000–2010), as they represent later applications of XAI research and thus play no part in knowledge transfer in the early period. In contrast, research areas from the Computer Science, Statistics, and Psychology disciplines, such as ‘feature selection’, ‘regression’ and ‘association rule mining’, have consistently high betweenness centrality (≥ 0.1), suggesting that they engage in knowledge transfer consistently throughout the development of XAI research. Thus, we conclude that these research areas represent the methodological and theoretical foundations of the XAI literature.
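The selection rule in this paragraph can be sketched as a simple filter. This is an illustrative reading only: it assumes yearly betweenness values for each dynamic community's step communities, requires the community to be present in every year of the window, and uses a hypothetical cut-off of 0.1 for ‘consistently high’ betweenness. The input format and function name are hypothetical.

```python
def foundational_areas(betweenness_by_year, start=2000, end=2010, threshold=0.1):
    """Return dynamic communities that are present in every year of [start, end]
    and whose betweenness never drops below `threshold` (hypothetical cut-off).

    betweenness_by_year: dict dynamic community id -> {year: betweenness}.
    """
    years = range(start, end + 1)
    return sorted(
        dc
        for dc, series in betweenness_by_year.items()
        if all(year in series and series[year] >= threshold for year in years)
    )
```

A stricter or looser reading of ‘long-lived’ (for example, allowing intermittent years) would only change the `all(...)` condition.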


Fig 5. Foundation research areas life-cycles.

Dynamic community life-cycles of the communities with the highest betweenness centrality (left) and lowest betweenness centrality (right) in the period 2000–2010. Each row represents the life-cycle of a dynamic community and each cell in the row is populated if that dynamic community appears in the network as a step community in the corresponding time step. Each cell is coloured to show the most common ASJC category among the papers in the step community.

https://doi.org/10.1371/journal.pone.0329302.g005

Secondly, we assess the extent to which contemporary research areas in XAI leverage knowledge and methodologies from foundational research in the field. We identify the contemporary research areas in the literature as dynamic communities where the average age of the papers is six years or fewer (i.e., papers published, on average, in 2017 or later). We focus our analysis on large communities (greater than 50 papers), with a content stability score above the mean measured across the dataset. Thus, we consider large communities of recent research papers with coherent content as the clearest representations of contemporary research in the literature.
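The contemporary-area filter can be written as a short Python sketch. It assumes that the ‘content stability’ score is the content coherence metric defined earlier, that ages are measured relative to the final time step (2023), and that the mean coherence is taken over all communities passed in; the `Area` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Area:
    name: str
    pub_years: list            # publication years of the community's papers
    content_coherence: float   # lifespan-averaged coherence (assumed stability score)


def contemporary_areas(areas, ref_year=2023, max_age=6, min_size=50):
    """Recent (mean age <= max_age), large (> min_size papers) and
    above-average-coherence communities."""
    coherence_mean = mean(a.content_coherence for a in areas)
    return [
        a.name
        for a in areas
        if ref_year - mean(a.pub_years) <= max_age
        and len(a.pub_years) > min_size
        and a.content_coherence > coherence_mean
    ]
```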

Finally, we assess how these research areas rely on foundational research using interactions in the community interaction network measured at the most recent time step (2023). Fig 6 reports the interaction (citation) probabilities (see Eq 2) between the largest contemporary research areas in the field and the foundations identified previously. For comparison, we include equivalent scores measured between the contemporary research topics and more recent central topics, corresponding to the dynamic communities with the highest betweenness centrality during 2020–2023. We find that knowledge transfer from the methodological and theoretical foundations of XAI research to contemporary research areas varies considerably, with subjects like ‘disentangled representation learning’ and ‘hate-speech detection’ making many more references to the foundational topics (interaction probabilities ≥ 1%) than topics like ‘semantic segmentation’ or ‘saliency maps’ (interaction probabilities ≈ 0). Furthermore, we note that methodological foundations like ‘data mining’ and ‘association rule mining’ receive more citations from contemporary topics than the theoretical or psychology-based foundations.


Fig 6. Knowledge transfer between foundations of XAI and contemporary topics.

The percentage of total possible interactions (citations) between contemporary research areas in XAI literature and the foundation areas. For comparison, some recent central topics in XAI are included on the right. For readability, the interaction probabilities are scaled to percentages. For example, if 20% of papers in research area A cite 10% of papers in research area B, the resulting interaction score would be 2%. Community interaction probabilities are calculated according to Eq 2.

https://doi.org/10.1371/journal.pone.0329302.g006

In Fig 7, we highlight four subsets of the contemporary research areas with similar content, to compare how they interact with the established literature. In particular, we group research areas into four research topics relating to (i) fairness, (ii) natural language processing, (iii) computer vision, and (iv) adversarial machine learning. This grouping reveals some patterns in knowledge transfer by topic. Overall, we find that research areas with similar content tend to engage in similar knowledge transfer patterns. For example, research areas within the ‘computer vision’ group exhibit similar knowledge transfer behaviour, relying more on knowledge from the more recently central topics (interaction probabilities between 0.5% and 1%) than on the historically central or ‘foundational’ topics (probabilities ≤ 0.1%). Similarly, research areas within the ‘adversarial machine learning’ topic all rely more on knowledge from the foundational literature than on the more recently central topics. Conversely, research areas in the ‘fairness’ group demonstrate different knowledge transfer patterns despite their related content. In particular, the two research areas labelled ‘hate speech’ and ‘gender bias’ make fewer citations to the methodological foundations of XAI research than the third research area (labelled ‘discrimination/fairness’).


Fig 7. The percentage of total possible interactions (citations) between contemporary research areas in XAI literature and the foundation areas.

For comparison, some recent central topics in XAI are included on the right. For readability, the interaction probabilities are scaled to percentages. For example, if 20% of papers in research area A cite 10% of papers in research area B, the resulting interaction score would be 2%.

https://doi.org/10.1371/journal.pone.0329302.g007

Knowledge silos

We now illustrate the identification of knowledge silos, which represent the research areas that are most isolated from the perspective of knowledge transfer. Specifically, we identify knowledge silos as the communities with the weakest interactions with other communities in the community interaction network. Table 1 reports summary statistics for the five research areas with the lowest sum across all interaction probabilities in the final (most recent) snapshot of the community interaction network (snapshot 2023). During this process, we exclude small research areas made up of 10 or fewer papers.
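A minimal sketch of the silo ranking: it sums each community's outgoing and incoming interaction probabilities in the final snapshot (an assumption about what ‘sum across all interaction probabilities’ covers), drops areas with 10 or fewer papers, and returns the k lowest. The input dicts and the function name are hypothetical.

```python
def knowledge_silos(W, sizes, k=5, min_size=10):
    """Return the k communities with the lowest total interaction probability.

    W:     dict (a, b) -> interaction probability in the final snapshot.
    sizes: dict community id -> number of papers; areas with <= min_size papers are skipped.
    """
    totals = {c: 0.0 for c, n in sizes.items() if n > min_size}
    for (a, b), prob in W.items():
        # Count both outgoing (a -> b) and incoming (b <- a) interactions.
        if a in totals:
            totals[a] += prob
        if b in totals:
            totals[b] += prob
    # Ties are broken by community id so the ranking is deterministic.
    return sorted(totals, key=lambda c: (totals[c], c))[:k]
```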


Table 1. Knowledge silos.

https://doi.org/10.1371/journal.pone.0329302.t001

XAI applications in Environmental Sciences/Atmospheric Chemistry and Physics and COVID-19 diagnosis from chest CT images are identified as two of the most isolated research areas in the corpus with respect to knowledge transfer. Each of these research areas represents an intuitive application of XAI and machine learning research to important real-world challenges. However, the isolated position of these areas in the community interaction network is problematic. In fact, recent studies have highlighted limitations to the utility of each of these applications in their respective domains [57, 58]. Specifically, Silva and Keller show that dependencies and strong correlations between features lead to model explanations that are inconsistent with process-level understanding [58]. Similarly, in their review of computer vision solutions for COVID-19 detection, Roberts et al. find that none of the solutions are clinically viable, due to important methodological flaws in the machine learning applications [57]. These cases illustrate some of the practical issues associated with isolated research areas and limited knowledge transfer in interdisciplinary applications. They highlight that unrealised knowledge integration, either from the domain of the application (as in [58]) or from the domain of the methods (as in [57]), can reduce the utility of the applications. In the following section, we demonstrate our methods for identifying knowledge gaps, which refer to pairs of communities that exhibit low levels of knowledge transfer despite pertaining to similar topics or sharing similar knowledge sources or knowledge outputs.

Knowledge gaps

To identify potential knowledge gaps between research areas in the XAI literature, we modelled the expected knowledge transfer between communities based on their content similarity and proximity in the community interaction network. Pairs of communities exhibiting substantially lower interaction (citation) rates than predicted were flagged as having significant knowledge gaps. The model captures the relationship between the research areas’ content similarity and the similarity of their knowledge sources and knowledge outputs (the two independent variables), and their observed knowledge transfer behaviour (the dependent variable). By analysing the residuals, we identify those pairs of communities that demonstrate less knowledge transfer than we would expect based on their content and related work. This approach allows us to detect gaps in knowledge flow that may arise due to disciplinary boundaries, insular reading and citation patterns, or lack of awareness of complementary work.

Table 2 highlights the four communities that most consistently demonstrate knowledge gaps (i.e., have the greatest total residual score across all possible research areas) and the research areas with which they have the most significant gaps. The knowledge gaps identified can be categorised into two groups: (i) between methodological research areas and potential applications (e.g. between counterfactual explanations and multiple research areas in computer vision, or between contrastive explanations and research areas in natural language processing); and (ii) between two applied research areas studying the same or similar topics (e.g. computer vision for medical images). In the case of (ii), we identify a pair of very similar research areas, both related to computer vision for medical images, that are evolving in parallel with minimal knowledge transfer between them. This highlights one of the key benefits of our proposed methods for delineating research areas from the perspective of knowledge transfer. While NLP-based methods for recognising research topics may (correctly) group these research areas into a single topic, it is important to recognise that they are distinct from one another when studying knowledge transfer. Thus, we observe the benefits of studying citation relations and article content in tandem. By considering both sources of information, we can identify gaps in the literature and help to expose researchers to relevant works that they may otherwise overlook.


Table 2. Knowledge gaps.

https://doi.org/10.1371/journal.pone.0329302.t002

Discussion

This work introduces a novel network analysis framework to study the dynamics of knowledge transfer and integration in rapidly evolving, interdisciplinary research fields. By applying dynamic community detection techniques to citation networks, we can identify and track the emergence, evolution, and interactions of research areas or topics directly from the published literature. The key methodological contributions include:

Providing a new perspective on dynamic community finding algorithms to facilitate their application to the unique context of citation networks, which exhibit cumulative growth and content-rich nodes (papers).
Developing methods to characterise the properties of identified dynamic communities over time, such as content coherence and knowledge transfer centrality. These methods begin to bridge existing gaps between citation network-based approaches to mapping research areas and traditional NLP-based methods.
Analysing the interactions between dynamic communities using our proposed community interaction network to reveal patterns of knowledge transfer, isolate potential knowledge silos, and detect significant knowledge gaps.

We demonstrated the utility of our approach through a case study on eXplainable Artificial Intelligence (XAI) – an emerging, highly interdisciplinary field synthesising concepts from machine learning, psychology, philosophy and other domains. The key findings include:

Foundational areas, such as statistics, cognitive science, and interpretable machine learning, acted as important knowledge sources during the formation of the field of XAI. However, knowledge transfer between these areas and the contemporary topics in XAI research is limited and the extent of knowledge transfer varies across different contemporary research topics.
Certain research areas like applications in COVID-19 diagnosis and environmental science exhibit characteristics of knowledge silos, as they remain isolated from the knowledge transfer exhibited between the rest of the XAI-related research areas. Limitations to the utility of these applications have been highlighted by recent studies [57, 58].
Notable knowledge gaps were identified throughout the literature, falling under two broad themes: (i) between methodological research areas and potential applications (e.g. between counterfactual explanations and multiple research areas in computer vision, or between contrastive explanations and research areas in natural language processing); (ii) between two applied research areas studying the same or similar topics (e.g. computer vision for medical images).

By mapping the flows, interdisciplinary integration, and boundaries of this evolving field, our analysis can inform strategies to promote cross-pollination, bridge disciplinary divides, and synthesise disparate ideas to drive innovation in XAI research and applications. More broadly, this work provides a data-driven framework to study the evolution of knowledge ecosystems and the dynamics of interdisciplinary integration directly from the published literature. The methodological contributions have applications spanning “science of science” studies, literature review and analysis, interdisciplinary research planning, and science policy and funding decisions. As scientific fields become increasingly specialised yet coupled, tools to understand and facilitate knowledge transfer across disciplines will become ever more critical. This work establishes a novel network analysis-based approach towards that important goal.

These methods can be leveraged to highlight knowledge gaps and silos as fertile ground for integrative and innovative interdisciplinary research. However, when exposed to distant ideas from other fields of science, researchers may benefit from additional support to aid in their comprehension and synthesis, given the prevalence of specialised language [59–61]. For example, ‘analogy mining’ recognises structures common across disparate disciplines or knowledge bases, which can help researchers to contextualise and contrast ideas from other research areas [62]. Similarly, Literature-Based Discovery (LBD) surfaces implicit knowledge from explicit knowledge in separate research outputs [63]. Further development of tools like these is vital for facilitating interdisciplinary knowledge transfer. Moreover, such methods could be directly integrated into retrieval and recommendation systems, alongside the methods presented in our work, to reveal latent connections between distant topics.

Supporting information

S1 Appendix. Community size distributions according to different detection methods.

https://doi.org/10.1371/journal.pone.0329302.s001

(PDF)

References

1. Shi F, Evans J. Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines. Nat Commun. 2023;14(1):1641. pmid:36964138
2. Larivière V, Haustein S, Börner K. Long-distance interdisciplinarity leads to higher scientific impact. PLoS One. 2015;10(3):e0122565. pmid:25822658
3. So long to the silos. Nat Biotechnol. 2016;34(4):357. pmid:27054973
4. Dakiche N, Tayeb FBS, Slimani Y, Benatchba K. Tracking community evolution in social networks: a survey. Inf Process Manag. 2019;56(3):1084–102.
5. Bruggeman J, Traag V, Uitermark J. Detecting communities through network data. Am Sociol Rev. 2012;77(6):1050–63.
6. Vilhena DA, Foster JG, Rosvall M, West JD, Evans J, Bergstrom CT. Finding cultural holes: How structure and culture diverge in networks of scholarly communication. Sociol Sci. 2014;1:221.
7. Samek W, Muller KR. Towards explainable artificial intelligence. Explainable AI: interpreting, explaining and visualizing deep learning. 2019. p. 5–22.
8. Chakraborty T, Sikdar S, Tammana V, Ganguly N, Mukherjee A. Computer science fields as ground-truth communities: their impact, rise and fall. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2013. p. 426–33.
9. Cunningham E, Smyth B, Greene D. Author multidisciplinarity and disciplinary roles in field of study networks. Appl Netw Sci. 2022;7(1):78. pmid:36408457
10. Harikandeh TSR, Aliakbary S, Taheri S. An embedding approach for analyzing the evolution of research topics with a case study on computer science subdomains. Scientometrics. 2023;1–16.
11. Leydesdorff L, Rafols I. A global map of science based on the ISI subject categories. J Am Soc Inf Sci Technol. 2008;60(2).
12. Leydesdorff L. Can scientific journals be classified in terms of aggregated journal-journal citation relations using the journal citation reports?. J Am Soc Inf Sci Technol. 2006;57(5):601–13.
13. Milojević S. Practical method to reclassify Web of Science articles into unique subject categories and broad disciplines. Quant Sci Stud. 2020;1(1):183–206.
14. Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci U S A. 2008;105(4):1118–23. pmid:18216267
15. Chae S, Segev A, Lee U. Cannibalism in medical topic networks. Knowledge-Based Systems. 2016;108:168–78.
16. Mane KK, Börner K. Mapping topics and topic bursts in PNAS. Proc Natl Acad Sci U S A. 2004;101(Suppl 1):5287–90. pmid:14978278
17. Song M, Heo GE, Kim SY. Analyzing topic evolution in bioinformatics: investigation of dynamics of the field with conference data in DBLP. Scientometrics. 2014;101:397–428.
18. HabibAgahi MR, Kermani MAMA, Maghsoudi M. On the co-authorship network analysis in the process mining research community: a social network analysis perspective. Expert Syst Appl. 2022;206:117853.
19. Deligiannis P, Vergoulis T, Chatzopoulos S, Tryfonopoulos C. Visualising scientific topic evolution. In: Companion Proceedings of the Web Conference 2021. 2021. p. 468–72. https://doi.org/10.1145/3442442.3451371
20. Vázquez MA, Pereira-Delgado J, Cid-Sueiro J, Arenas-García J. Validation of scientific topic models using graph analysis and corpus metadata. Scientometrics. 2022:1–18.
21. Wang X, He J, Huang H, Wang H. MatrixSim: a new method for detecting the evolution paths of research topics. J Informetrics. 2022;16(4):101343.
22. Rahimi M, Maghsoudi M, Shokouhyar S. The convergence of IoT and sustainability in global supply chains: Patterns, trends, and future directions. Comput Indust Eng. 2024;197:110631.
23. Fortunato S, Bergstrom CT, Börner K, Evans JA, Helbing D, Milojević S. Science of science. Science. 2018;359(6379):eaao0185.
24. Chakraborty T, Chakraborty A. OverCite: finding overlapping communities in citation network. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2013. p. 1124–31.
25. Jung S, Segev A. Analyzing future communities in growing citation networks. In: Proceedings of the 2013 International Workshop on Mining Unstructured Big Data Using Natural Language Processing. 2013. p. 15–22.
26. Quattrociocchi W, Amblard F, Galeota E. Selection in scientific networks. Soc Netw Anal Mining. 2012;2:229–37.
27. Asatani K, Mori J, Ochi M, Sakata I. Detecting trends in academic research from a citation network using network representation learning. PLoS One. 2018;13(5):e0197260. pmid:29782521
28. Fortunato S. Community detection in graphs. Phys Rep. 2010;486(3–5):75–174.
29. Bae SH, Halperin D, West JD, Rosvall M, Howe B. Scalable and efficient flow-based community detection for large-scale graph analysis. ACM Trans Knowl Discov Data. 2017;11(3):1–30.
30. Fraisier O, Cabanac G, Pitarch Y, Besancon R, Boughanem M. Uncovering like-minded political communities on Twitter. In: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval. 2017. p. 261–4.
31. Sah P, Singh LO, Clauset A, Bansal S. Exploring community structure in biological networks with random graphs. BMC Bioinformatics. 2014;15:220. pmid:24965130
32. Palla G, Barabási AL, Vicsek T. Quantifying social group evolution. Nature. 2007;446(7136):664–7.
33. Chakrabarti D, Kumar R, Tomkins A. Evolutionary clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. p. 554–60.
34. Spiliopoulou M, Ntoutsi I, Theodoridis Y, Schult R. Monic: modeling and monitoring cluster transitions. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. p. 706–11.
35. Greene D, Doyle D, Cunningham P. Tracking the evolution of communities in dynamic social networks. In: Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining. 2010. p. 176–83.
36. Michel J, Parrend P. Metrics for community dynamics applied to unsupervised attacks detection. Rencontres des jeunes chercheurs en intelligence artificielle (RJCIA). 2023. p. 87.
37. Hopcroft J, Khan O, Kulis B, Selman B. Tracking evolving communities in large linked networks. Proc Natl Acad Sci U S A. 2004;101(Suppl 1):5249–53. pmid:14757820
38. Tan X. Tracking the evolution of communities and research topics in a dynamic citation network. 2022.
39. Shi F, Foster JG, Evans JA. Weaving the fabric of science: dynamic network models of science’s unfolding structure. Soc Netw. 2015;43:73–85.
40. Leischow SJ, Best A, Trochim WM, Clark PI, Gallagher RS, Marcus SE, et al. Systems thinking to improve the public’s health. Am J Prev Med. 2008;35(2 Suppl):S196-203. pmid:18619400
41. Portenoy J, Radensky M, West JD, Horvitz E, Weld DS, Hope T. Bursting scientific filter bubbles: boosting innovation via novel author discovery. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 2022. p. 1–13.
42. Rodriguez-Esteban R. The speed of information propagation in the scientific network distorts biomedical research. PeerJ. 2022;10:e12764. pmid:35070506
43. Peng Y, Bonifield G, Smalheiser NR. Gaps within the biomedical literature: initial characterization and assessment of strategies for discovery. Front Res Metr Anal. 2017;2:3. pmid:29271976
44. Zhou H, Guns R, Engels TC. Towards indicating interdisciplinarity: characterizing interdisciplinary knowledge flow. J Assoc Inf Sci Technol. 2023;74(11):1325–40.
45. Karunan K, Lathabai HH, Prabhakaran T. Discovering interdisciplinary interactions between two research fields using citation networks. Scientometrics. 2017;113(1):335–67.
46. Cordeiro M, Sarmento RP, Gama J. Dynamic community detection in evolving networks using locality modularity optimization. Soc Netw Anal Mining. 2016;6:1–20.
47. Jacovi A. Trends in explainable AI (XAI) literature. arXiv preprint 2023.
48. Kinney RM, Anastasiades C, Authur R, Beltagy I, Bragg J, Buraczynski A. The semantic scholar open data platform. arXiv preprint 2023. https://arxiv.org/abs/2301.10140
49. Cunningham E, Greene D. Knowledge transfer in XAI research. 2023. https://github.com/your-repository-link
50. Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S. Finding statistically significant communities in networks. PLoS One. 2011;6(4):e18961. pmid:21559480
51. Lee C, Reid F, McDaid A, Hurley N. Detecting highly overlapping community structure by greedy clique expansion. arXiv preprint 2010. https://arxiv.org/abs/1002.1827
52. BC K, Narayanan K. An empirical study of feature selection for text categorization based on term weightage. In: IEEE/WIC/ACM International Conference on Web Intelligence (WI’04). 2004. p. 599–602.
53. Kim M, Yoon J, Jung WS, Kim H. Quantifying the topic disparity of scientific articles. In: Companion Proceedings of the Web Conference 2022. 2022. p. 769–73.
54. Röder M, Both A, Hinneburg A. Exploring the space of topic coherence measures. In: Proceedings of the 8th ACM International Conference on Web Search and Data Mining. 2015. p. 399–408.
55. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
56. Beltagy I, Lo K, Cohan A. SciBERT: a pretrained language model for scientific text. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. https://www.aclweb.org/anthology/D19-1371
57. Roberts M, Driggs D, Thorpe M, Gilbey J, Yeung M, Ursprung S. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell. 2021;3(3):199–217.
58. Silva SJ, Keller CA. Limitations of XAI methods for process-level understanding in the atmospheric sciences. Artif Intell Earth Syst. 2024;3(1):e230045.
59. Bracken LJ, Oughton EA. What do you mean? The importance of language in developing interdisciplinary research. Trans Inst Br Geograph. 2006;31(3):371–82.
60. Wear DN. Challenges to interdisciplinary discourse. Ecosystems. 1999:299–301.
61. MacLeod M. What makes interdisciplinarity difficult? Some consequences of domain specificity in interdisciplinary practice. Synthese. 2018;195(2):697–720.
62. Hope T, Chan J, Kittur A, Shahaf D. Accelerating innovation through analogy mining. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017. p. 235–43.
63. Henry S, McInnes BT. Literature based discovery: models, methods, and trends. J Biomed Inform. 2017;74:20–32. pmid:28838802

Supporting information

S1 Appendix. Community size distributions according to different detection methods.
