Analysis of intracellular and intercellular crosstalk from omics data

Abstract

Disease phenotypes can be described as the consequence of interactions among molecular processes that are altered beyond resilience. Here, we address the challenge of assessing the possible alteration of intra- and inter-cellular molecular interactions among processes or cells. We present an approach, designated as “Ulisse”, which complements the existing methods in the domains of enrichment analysis, pathway crosstalk analysis and cell-cell communication analysis. It applies to gene lists that contain quantitative information about gene-related alterations, typically derived in the context of omics or multi-omics studies. Ulisse highlights the presence of alterations in those components that control the interactions between processes or cells. Considering the complexity of statistical assessment of network-based analyses, crosstalk quantification is supported by two distinct null models, which systematically sample alternative configurations of gene-related changes and gene-gene interactions. Further, the approach provides an additional way of identifying the genes associated with the phenotype. As a proof-of-concept, we applied Ulisse to study the alteration of pathway crosstalks and cell-cell communications in triple negative breast cancer samples, based on single-cell RNA sequencing. In conclusion, our work supports the usefulness of crosstalk analysis as an additional instrument in the “toolkit” of biomedical research for translating complex biological data into actionable insights.

Introduction

The understanding of how gene-related molecular alterations translate into pathological phenotypes is a major challenge in life sciences. The experience gained from reductionist approaches like genome-wide association studies (where millions of single nucleotide variations are independently tested for association with a phenotype) underscores the key role of molecular interactions in deciphering the roots of complex phenotypes [1]. The term “network medicine” was introduced to designate the application of network science to the study of diseases, which are viewed as the consequence of molecular alterations in a complex system of interacting molecular processes [2].

At the cellular level, we can classify molecular interactions into two broad categories, intra-cellular and inter-cellular, according to whether they take place within a cell or among cells. Even if our knowledge of intra- and inter-cellular molecular interactions is incomplete, molecular networks are a crucial tool in biomedical research to translate complex molecular data from omics and multi-omics studies into actionable results [3–5].

Here, we describe an approach, designated as “Ulisse”, to gain mechanistic insight from gene lists that contain quantitative information about gene-level changes (e.g., mutated genes, differentially expressed genes). The method screens a collection of gene sets (e.g., protein complexes, molecular pathways, cell phenotype markers) to evaluate whether the interconnectivity between any gene set pair is more affected by gene-level changes than would be expected by chance. Because the statistical assessment of network-based analyses is complex, significance is evaluated against two complementary null models, which probe the outcome by sampling alternative configurations of gene-level changes and molecular interactions [6]. The main applications of Ulisse are in pathway analysis and cell-cell communication, where it can be used, respectively, to identify pathway crosstalks and cell-cell interactions that are affected by gene-level alterations detected through omics experiments (in bulk or at single-cell resolution). The analysis of pathway crosstalk and cell-cell communication can be combined to obtain integrated pathways that associate cell-cell communications and intracellular states. Lastly, the list of significantly altered crosstalks is analysed to identify the key genes that affect the interconnectivity among gene sets.

The applications of our approach encompass the domains of enrichment analysis, pathway crosstalk analysis and cell-cell communication analysis. Several approaches exist to perform such tasks, with relevant differences that stem from the specific focus for which methods are designed, input data types, outputs, statistical assessment of the results, and implementation.

Ulisse provides a novel type of analysis in comparison to existing gene set enrichment analysis approaches, which, net of some nomenclature variations, can be classified into four categories [7–9]: (i) over representation analysis or overlap methods (based on the hypergeometric test); (ii) functional class scoring or per-gene score analysis (e.g., GSEA [10] and GSVA [11]); (iii) pathway-topology based methods or pathway topology analysis (e.g., SPIA [12]); (iv) network crosstalk tools or network enrichment methods (e.g., ANUBIX [8] and NEAT [13]). Ulisse could fit into the last category, but it has a different focus. Indeed, existing network crosstalk tools evaluate the interconnectivity between a gene list (e.g., differentially expressed genes) and every gene set of a collection (e.g., KEGG pathways [14]), in the context of an interactome composed of gene-gene interactions (e.g., from STRING [15]). Instead, Ulisse quantifies to what extent the interconnectivity of every pair of gene sets (of the collection) is altered based on the information given in the input gene list.

The availability of computational tools to study pathway crosstalk is limited, despite the importance of crosstalk in regulatory mechanisms [16,17], in obtaining effective drug combinations in cancer [18] and in investigating complex disease phenotypes [19]. Two examples are the Latent Pathway Identification Analysis (LPIA) [20] and Pathway analysis using Network information (PathNet) [21], both developed for gene expression data. LPIA defines a network of pathways based on shared Gene Ontology [22] terms and a list of differentially expressed genes. Then, it identifies the pathways with the most significant centrality in the network, using gene expression data permutations to create a null model. As such, the pathway crosstalk in LPIA does not consider the molecular interactions between the two pathways, which, instead, is what Ulisse focuses on. PathNet “contextual analysis” quantifies pathway crosstalk similarly to Ulisse. It requires gene p-values as input and provides a matrix of p-values that estimate the significance of pathway-pathway interconnectivity, where input gene p-values are shuffled to create a null model. Unlike PathNet, Ulisse focuses on the interconnectivity between the genes that are not shared by the two pathways, tests the significance of crosstalk against permutations of both gene-level changes and gene-gene interactions, provides a more extensive output (e.g., crosstalk score, p-values, number of altered interactions), and assigns gene scores based on their relevance as mediators of pathway crosstalks.

As for the domain of cell-cell communication, recent advancements in sequencing technologies have prompted the development of a considerable number of tools and resources for cell–cell communication inference, characterized by wide variations in requirements, scoring approaches, type of communication inferred, assumptions, and limitations [23]. In this complex scenario, the distinctive features of Ulisse are the focus on reconstructing a communication network where cell states (or types) are nodes, the statistical assessment based on two different null models, the integration of pathway crosstalk analysis with cell-cell communication and the identification of key genes involved in the communication network.

In the following, we describe Ulisse and, as a proof-of-concept, we apply our approach to study the alteration of pathway crosstalks and cell-cell communications in publicly available single‐cell RNA (scRNA) expression data from a recent study that proposed a high‐resolution map of cell diversity in normal and cancerous human breast [24,25].

Results

Crosstalk quantification, statistical assessment and key players

Here we describe how we define an altered “crosstalk” (or active crosstalk) between two gene sets, how we assess its statistical significance and, lastly, how we use the results of crosstalk analysis to score genes based on their contribution (Fig 1, see S1–S3 Text in S1 File for further details). Note that we adopt a “gene-centric” view of molecular interactions, as in gene-centric human interactomes [4,5], where the term “gene-gene interaction” refers to various types of molecular interactions (protein-protein, protein-RNA, protein-DNA) that involve the considered gene pair.

Fig 1. Overview of crosstalk analysis.

a) Input data. b) Visualization of the molecular interactions among gene sets, at inter-cellular and intra-cellular levels, which lead to altered crosstalks. c) The crosstalk value is supported by two null models. d) Crosstalk values can be distinguished based on their saturation; the crosstalk diversity and interaction diversity are two gene-level scores that enable the identification of key crosstalk mediators; these scores can be distinguished based on their saturation. e) Crosstalk analysis identifies networks of intra-cellular processes, cell-cell communication and intra-cellular processes associated with cell-cell communication.

https://doi.org/10.1371/journal.pone.0334981.g001

Two types of input are needed to calculate the altered crosstalk between any two gene sets X and Y (Fig 1a):

  • a list of gene-gene interactions, which can be derived from publicly available resources (like STRING [15] and Omnipath [3,26]);
  • one or more sets of gene-level weights (in the unit interval), which provide a summary of the gene-level alterations of interest (e.g., differential expression or mutations).

Formally, we quantify the crosstalk score as the sum of weighted products between the genes of X that interact with those of Y:

$$C(X, Y) = u_X^{\top} A \, u_Y = \sum_{i=1}^{N_G} \sum_{j=1}^{N_G} a_{ij} \, u_X(i) \, u_Y(j)$$

where A = (aij) is the adjacency matrix that specifies the interactions among the NG genes, while uX and uY are vectors of gene weights with positive values only for the genes of X and Y, respectively.
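The score described above can be sketched in a few lines of Python. This is a minimal illustration, not the Ulisse implementation: the function name `crosstalk_score`, the edge-list input and the dictionary-based weight vectors are assumptions made for the example, and self-interactions are ignored by assumption.

```python
def crosstalk_score(edges, u_x, u_y):
    """Sum of u_X(i) * u_Y(j) over interacting gene pairs (i, j).

    edges: iterable of undirected gene-gene interactions (pairs of gene names)
    u_x, u_y: dicts mapping gene names to weights in [0, 1]; genes that are
              absent from a dict are treated as having weight 0
    """
    # Deduplicate undirected edges so (a, b) and (b, a) are one interaction.
    # Self-interactions are skipped (an assumption of this sketch).
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    score = 0.0
    for edge in unique_edges:
        a, b = tuple(edge)
        # A symmetric adjacency matrix contributes both orientations.
        score += u_x.get(a, 0.0) * u_y.get(b, 0.0)
        score += u_x.get(b, 0.0) * u_y.get(a, 0.0)
    return score


# Toy example with hypothetical genes g1..g4.
edges = [("g1", "g3"), ("g2", "g3"), ("g2", "g4"), ("g1", "g2")]
u_x = {"g1": 0.5, "g2": 1.0}   # weights of the genes of X
u_y = {"g3": 0.5, "g4": 0.0}   # weights of the genes of Y
print(crosstalk_score(edges, u_x, u_y))  # 0.5*0.5 + 1.0*0.5 = 0.75
```

The g1–g2 interaction lies within X and the g2–g4 interaction involves an unaltered gene (weight 0), so neither contributes to the score.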

The definition of the quantities involved in the calculation of C(X, Y) follows three scenarios, based on whether the crosstalk is between gene sets associated with intra-cellular states, inter-cellular states or both (Fig 1b).

To quantify the alteration of crosstalk between two intra-cellular pathways (or another type of gene system), we exclude the genes shared between them, that is X ∩ Y = ∅, otherwise we would consider intra-pathway interactions. Further, gene weights uX and uY are defined from the same input source (e.g., same set of alterations), because the focus is on an intra-cellular characteristic, and the two gene sets represent internal states of the same cell. The molecular interactions are collected from “general purpose” interaction databases, like STRING [15].

Conversely, to quantify the active inter-cellular crosstalk between two gene sets that are associated with two cell types, X ∩ Y ≠ ∅ is expected, because the two cell types can, for example, express a series of genes in common. Moreover, uX and uY come from different sources, because the alterations are relative to distinct cell types. The molecular interactions are collected from databases that focus on ligand-receptor interactions, like Omnipath [3,26], a collection composed of multiple sources (e.g., Ramilowski [27], CellPhoneDB [28]).

Lastly (third scenario), to quantify the intra-cellular alterations associated with inter-cellular alterations, we consider, for each cell type under analysis, the genes (set X) involved in any of the inter-cellular crosstalks of the cell type, and any altered pathway (set Y) of the cell type. Apart from this particular definition of X and Y, we have, as in the first scenario, that X ∩ Y = ∅, and uX and uY are defined from the same input source, because they are relative to the same cell type.
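The practical difference among the scenarios lies in how the two weight vectors are built. The following sketch illustrates that step under simplifying assumptions; the function `build_weight_vectors` and its `drop_shared` flag are hypothetical names introduced only for this example.

```python
def build_weight_vectors(set_x, set_y, w_x, w_y, drop_shared):
    """Return (u_X, u_Y) as dicts restricted to the genes of X and Y.

    set_x, set_y: gene sets (Python sets of gene names)
    w_x, w_y: gene-level weights; pass the same dict twice when both gene
              sets describe the same cell (first and third scenario)
    drop_shared: True to exclude the genes shared by X and Y
                 (intra-cellular case), False to keep them (inter-cellular case)
    """
    shared = set_x & set_y if drop_shared else set()
    u_x = {g: w_x.get(g, 0.0) for g in set_x - shared}
    u_y = {g: w_y.get(g, 0.0) for g in set_y - shared}
    return u_x, u_y


# Intra-cellular case: one weight source, shared gene "b" is removed.
weights = {"a": 0.2, "b": 0.9, "c": 0.4}
print(build_weight_vectors({"a", "b"}, {"b", "c"}, weights, weights, True))

# Inter-cellular case: two weight sources (one per cell type), "b" is kept.
w_cell1 = {"a": 0.2, "b": 0.9}
w_cell2 = {"b": 0.1, "c": 0.4}
print(build_weight_vectors({"a", "b"}, {"b", "c"}, w_cell1, w_cell2, False))
```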

A meaningful quantity that complements C(X, Y) is the crosstalk saturation

$$r_c(X, Y) = \frac{\delta L_{XY}}{L_{XY}}$$

which captures, in the process under study, the number of altered interactions δLXY between X and Y, in relation to all the possible interactions LXY between X and Y. Indeed, similar values of C(X, Y) can be due to a higher or lower impairment of the links between the two gene sets.
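As an illustration of the saturation, the sketch below counts the links between two gene sets and the subset of them that is altered. It assumes that an interaction counts as altered when both interacting genes have a positive weight, which is one plausible reading of the text rather than a definition taken from the method; the function name and data layout are likewise hypothetical.

```python
def crosstalk_saturation(edges, set_x, set_y, u_x, u_y):
    """Return (altered_links, possible_links, saturation) for sets X and Y."""
    # Undirected, deduplicated interactions; self-interactions are skipped.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    altered = 0
    possible = 0
    for edge in unique_edges:
        a, b = tuple(edge)
        # Orientations of this edge that connect a gene of X to a gene of Y.
        orientations = [(gx, gy) for gx, gy in ((a, b), (b, a))
                        if gx in set_x and gy in set_y]
        if not orientations:
            continue
        possible += 1
        # Assumption: a link is altered when both endpoints carry weight > 0.
        if any(u_x.get(gx, 0.0) > 0 and u_y.get(gy, 0.0) > 0
               for gx, gy in orientations):
            altered += 1
    saturation = altered / possible if possible else 0.0
    return altered, possible, saturation


edges = [("g1", "g3"), ("g2", "g3"), ("g2", "g4"), ("g1", "g2")]
u_x = {"g1": 0.5, "g2": 1.0}
u_y = {"g3": 0.5, "g4": 0.0}
# Three X-Y links exist; two of them connect genes that are both altered.
print(crosstalk_saturation(edges, {"g1", "g2"}, {"g3", "g4"}, u_x, u_y))
```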

To statistically benchmark the magnitude of an observed crosstalk value c, we have to consider that it might depend on various features such as gene set size, distribution of gene weights and gene degree. We focused on two null models, namely MA and Mu. Both preserve gene set size, degree sequence, and the association between gene weight and gene degree (within the same bin over the degree sequence); MA randomizes gene-gene interactions, while Mu randomizes gene weights (Fig 1c). Null model MA is designed to test the dependence of c on the network proximity of X and Y, while Mu is meant to test the dependence of c on the weights of X and Y genes. This leads to four possible outcomes, determined by whether c is significant in a single null, in both or in neither of them (S1 Fig in S1 File, S2 Table in S2 File). As expected from the fact that the two nulls disrupt different features, the analyses performed in our proof-of-concept (see next sections) revealed negligible correlations between the values obtained with the two nulls (S2 Fig in S1 File). It is therefore meaningful to combine the two probabilities ρA and ρu of observing a value equal to or greater than c in MA and Mu, respectively, into the probability of observing a product as small as the one observed

$$p = \rho_A \rho_u \left(1 - \ln(\rho_A \rho_u)\right)$$

which is equal to the probability obtained by means of the so-called Fisher’s combined probability test [29,30].
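The weight-shuffling null and the combination of the two probabilities can be sketched as follows. The example rests on several assumptions that are not taken from the source: degree bins are supplied by the caller, empirical probabilities use the common add-one estimator, and only the intra-cellular case (disjoint X and Y, one weight source) is handled. The interaction-rewiring null MA is not shown. For two independent probabilities with product k, the probability of observing a product at most k is k(1 − ln k), which coincides with Fisher's method for two tests.

```python
import math
import random


def score(edges, set_x, set_y, w):
    """Crosstalk score for disjoint X and Y sharing one weight source w."""
    total = 0.0
    for a, b in edges:  # edges: unique undirected pairs
        if (a in set_x and b in set_y) or (b in set_x and a in set_y):
            total += w.get(a, 0.0) * w.get(b, 0.0)
    return total


def shuffle_within_bins(w, degree_bin, rng):
    """Permute weights among genes that fall in the same degree bin."""
    by_bin = {}
    for gene in w:
        by_bin.setdefault(degree_bin[gene], []).append(gene)
    permuted = {}
    for genes in by_bin.values():
        values = [w[g] for g in genes]
        rng.shuffle(values)
        permuted.update(zip(genes, values))
    return permuted


def null_scores_mu(edges, set_x, set_y, w, degree_bin, n_perm, seed=0):
    """Scores obtained after n_perm bin-preserving weight permutations."""
    rng = random.Random(seed)
    return [score(edges, set_x, set_y, shuffle_within_bins(w, degree_bin, rng))
            for _ in range(n_perm)]


def empirical_p(observed, null_values):
    """Add-one estimate of P(null >= observed); an assumption of this sketch."""
    hits = sum(v >= observed for v in null_values)
    return (hits + 1) / (len(null_values) + 1)


def combined_p(rho_a, rho_u):
    """P(product of two independent uniform p-values <= rho_a * rho_u)."""
    k = rho_a * rho_u
    return k * (1.0 - math.log(k)) if k > 0 else 0.0


print(combined_p(0.01, 0.01))  # about 0.00102
```

Two probabilities that each sit at α = 0.01 combine to roughly 0.001, which matches the p = 0.001 diagonal reported in the caption of Fig 2b.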

Lastly, we define a summary score for ranking crosstalks, combining effect size C(X, Y) and its estimated probability p:

The list of significant crosstalks provides an opportunity for gene scoring (Fig 1d). We consider two gene-level quantities: crosstalk diversity and interaction diversity. The first counts how many gene sets that are part of altered crosstalks contain interactors of a gene gi; therefore, the saturation rX(i) of the crosstalk diversity of gi reaches 1 when all the gene sets that contain interactors of gi are part of altered crosstalks. The second counts the interactors of gi that belong to gene sets that are part of crosstalks; analogously to rX(i), the saturation rA(i) of the interactor diversity of gi reaches 1 when all the interactors of gi belong to gene sets that are part of altered crosstalks.
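The two gene-level quantities can be illustrated with a short sketch. It follows a literal reading of the paragraph above and should be taken as an approximation: the method may restrict the counts further (for instance, to the crosstalks in which the gene itself takes part). All names (`diversity_scores`, `neighbors`, `gene_sets`, `altered_crosstalks`) are hypothetical.

```python
def diversity_scores(gene, neighbors, gene_sets, altered_crosstalks):
    """Return (d_X, r_X, d_A, r_A) for one gene.

    neighbors: dict gene -> set of interacting genes
    gene_sets: dict gene-set name -> set of member genes
    altered_crosstalks: iterable of (set_name_1, set_name_2) pairs
    """
    interactors = neighbors.get(gene, set())
    # Gene sets that take part in at least one altered crosstalk.
    active_sets = {name for pair in altered_crosstalks for name in pair}
    # Gene sets that contain at least one interactor of the gene.
    reachable = {name for name, members in gene_sets.items()
                 if members & interactors}
    d_x = len(reachable & active_sets)
    r_x = d_x / len(reachable) if reachable else 0.0
    # Interactors that belong to at least one active gene set.
    active_genes = set().union(*(gene_sets[name] for name in active_sets))
    d_a = len(interactors & active_genes)
    r_a = d_a / len(interactors) if interactors else 0.0
    return d_x, r_x, d_a, r_a


gene_sets = {"P1": {"a", "b"}, "P2": {"c", "d"}, "P3": {"e"}}
neighbors = {"a": {"c", "e", "f"}}
# Gene "a" reaches P2 and P3, but only P2 takes part in an altered crosstalk.
print(diversity_scores("a", neighbors, gene_sets, [("P1", "P2")]))
```

In this toy case the crosstalk diversity of gene "a" is 1 with saturation 0.5, and one of its three interactors belongs to an active gene set.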

Alteration of crosstalks in triple negative breast cancer

As a proof-of-concept, we analysed the crosstalks in triple negative breast cancer (TNBC), using single‐cell RNA expression data from a recent study that proposed a high‐resolution map of cell diversity in normal and cancerous human breast [24,25]. Our objective is to show what kind of information can be extracted from the analysis of crosstalks using data generated by means of one of the state-of-the-art technologies in transcriptomic analysis. In particular, we focused on cancer cells and analysed the interactions among intra-cellular processes whose alteration could be implicated in the dysregulation of the reciprocal control among molecular mechanisms that could contribute to tumour progression. Then, we considered the communication between cancer cells and Cancer Associated Fibroblasts (CAFs). Indeed, CAFs represent a distinctive hub of cell-cell communication within the tumour niche: they promote tumoral growth and malignancy by releasing factors that target cancer cells, repress the immune response through their interactions with immune cells, and induce angiogenesis by interacting with endothelial cells [31–33]. Lastly, we shed light on cancer cell processes that could be associated with the communication between CAFs and cancer cells.

Alteration of intra-cellular crosstalks in cancer cells.

We screened the role of 304 (S2 and S3 Tables in S2 File) cancer epithelial cell markers (FDR < 0.05, log2(FC) > 0.5, Cancer epithelial vs all) in the crosstalk among intra-cellular processes (MSigDB Hallmarks database [34]). We found that most crosstalks are altered in up to 2 interactions and that the maximum number of altered interactions is 19, among a total of 14 genes belonging to allograft rejection and MYC targets (v1) (Table 1, Fig 2a, S4 Table in S2 File). We observed a marked variability of gene weights, degree of statistical significance, and saturation, independently of the number of affected links, which makes such pieces of information useful to differentiate crosstalks (Fig 2a–2c). In particular, we observed a total of 59 crosstalks whose score can hardly be obtained (α = 0.01) when shuffling interactions or gene weights, and 14 crosstalks that are supported by both nulls (Fig 2b). Among these, we obtained several crosstalks that involve the p53 pathway, cholesterol homeostasis and androgen response, which emerge as hubs in the network of altered crosstalks (Fig 2d). In 24 crosstalks, the saturation indicates the alteration of more than half of the links (Fig 2c), as between the p53 pathway and KRAS signalling (“KRAS_SIGNALING_DN”), which involves the alteration of 3 out of 4 interactions between a total of 5 genes, and the score is supported by both nulls (Fig 2c, S4 Table in S2 File).

Table 1. Top 5 pathway crosstalks mediated by cancer cell DEGs.

https://doi.org/10.1371/journal.pone.0334981.t001

Fig 2. Intra-cellular crosstalks controlled by expression changes specific of TNBC cells.

a) Number of altered links (δL) and average gene weight ||(uX, uY)|| of the crosstalk forming genes. b) The probabilities ρA and ρu estimated by the two null models for each crosstalk value; the vertical and horizontal lines denote α = 0.01, while the diagonal line denotes p = 0.001. c) Crosstalk score s and its saturation rc. d) Network of processes that establish crosstalks supported (α = 0.01) by at least one null model. e) Over representation analysis p-values p(X) and p(Y) for each of the processes (X, Y) that establish a crosstalk; the vertical and horizontal lines denote α = 0.05.

https://doi.org/10.1371/journal.pone.0334981.g002

To compare the outcomes of crosstalk and pathway enrichment analyses, we assessed to what extent the processes exhibiting significant crosstalks are also marked by significant enrichment (hypergeometric test) in DEGs (Fig 2e, S5 Table in S2 File). As expected, the two types of analyses provide a complementary view, where several processes involved in significant crosstalks do not display enrichment and vice versa. Only in a few cases (7 pairs) are both processes enriched (p-value < 0.05) in DEGs, while the majority of altered crosstalks take place between pairs of processes that are not enriched in DEGs.
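For reference, the enrichment side of this comparison relies on the upper tail of the hypergeometric distribution. A minimal sketch using only the Python standard library is given below; the function name, its arguments and the toy numbers are hypothetical, and the choice of gene universe is left to the analyst.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn without replacement
    from a universe of n_universe genes, n_set of which belong to the gene set.
    """
    upper = min(n_set, n_list)
    total = comb(n_universe, n_list)
    tail = sum(comb(n_set, i) * comb(n_universe - n_set, n_list - i)
               for i in range(n_overlap, upper + 1))
    return tail / total


# Toy numbers: universe of 10 genes, gene set of 5, list of 5, all 5 shared.
print(hypergeom_enrichment_p(10, 5, 5, 5))  # 1/252, about 0.004
```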

Significantly altered crosstalks are mediated by a total of 57 genes (S6 Table in S2 File). Crosstalk diversity and interaction diversity suggest a gene prioritization that is independent of their initial alteration score. In other words, genes that were ranked low by differential expression analysis can emerge as key players as mediators of crosstalks. This is the case of the two cyclin-dependent kinase inhibitors CDKN2A and CDKN2B, which stand out for their crosstalk diversity, as they mediate 11 and 10 altered crosstalks, respectively (Fig 3a). Among the genes with the highest interactor diversity, we obtained a series of genes that code for ribosomal-associated proteins (Fig 3b). We observed a wide range of saturations and an overall correlation between the saturation of crosstalk diversity and that of interaction diversity (S6 Table in S2 File). Among the genes with the highest values of both saturations we found the two tumour proteins D52 (TPD52) and D53 (TPD52L1) (Fig 3), which are involved in cancer cell proliferation and a more aggressive phenotype [35,36].

Fig 3. Crosstalk diversity and interaction diversity of the DEGs that are involved in intra-cellular crosstalks in TNBC cells.

a) Crosstalk diversity dX and its saturation rX. b) Interactor diversity dA and its saturation rA.

https://doi.org/10.1371/journal.pone.0334981.g003

Inter-cellular crosstalks (cell-cell communication).

We analysed the inter-cellular interactions (Omnipath [3]) among the 36 pairs of gene sets defined by the DEGs (FDR < 0.05, log2(FC) > 0.5) of 9 cell types (1-vs-all), in 8 TNBC tumors [24] (S2 and S3 Tables in S2 File). Compared to the intracellular crosstalks among processes, here we dealt with larger gene sets and more links among them. As expected, this scenario led to a higher number of altered interactions, with a median of 38 and a maximum of 162 between CAFs and endothelial cells (Fig 4a, S7 Table in S2 File). Further, the crosstalks are statistically supported (α = 0.01) mostly by their gene weights (21 pairs) rather than by their interactions; the latter support the 3 crosstalks that are significant in both nulls, namely between B cells and tumour-associated macrophages (TAMs), between dendritic cells (DCs) and TAMs, and between B cells and DCs. Saturation reaches up to one quarter of the possible interactions between CAFs and endothelial cells (Fig 4b).

Fig 4. Inter-cellular crosstalks controlled by expression changes in the 9 cell types of TNBC samples.

a) Number of altered links (δL) and average gene weight (||(uX, uY)||) of the crosstalk forming genes. b) Crosstalk score s and its saturation rc. c) Above: DEGs that mediate the communication between CAFs and cancer cells; below: the position of cells in the space of the first two tSNE dimensions, coloured by cell type; communications supported (α = 0.01) by at least one null model are indicated by a link between the two centroids.

https://doi.org/10.1371/journal.pone.0334981.g004

The emerging cell-cell communication network (Fig 4c) highlights the relevant role of microenvironment cells, which establish several significant interactions. The communication between cancer cells and CAFs is supported (α = 0.01) by randomization of gene weights, and involves 22 interactions between a total of 33 DEGs (Fig 4c, S8 Table in S2 File). Among the key players of this communication, we found MDK and MFGE8 (expressed in cancer cells), which mediate 4 and 3 interactions, respectively, with genes expressed in CAFs, including integrins ITGB1 and ITGB5 (S8 Table in S2 File).

The cell-cell communication network (α = 0.01) involves 379 genes (Fig 5, S9 Table in S2 File). Among the genes that stand out for their ubiquity we observed CXCR4, with a crosstalk diversity of 8 (out of the 9 cell types present), and ICAM1, TGFB1, ITGB2, PTPN6 and some Major Histocompatibility Complex genes (HLA-C, HLA-DRA, HLA-DRB1), which show a crosstalk diversity of 7. Among the 29 DEGs in cancer epithelial cells (out of 379), MFGE8, LAMP1, RPSA and AZGP1 are specific (dX = 1, rX = 1) to the communication with CAFs (Fig 5). Conversely, we did not observe DEGs in CAFs that are specific to the signalling with cancer cells. However, there is one gene, PLAT, which is uniquely involved in the communication with cancer cells (dX = 1).

Fig 5. Crosstalk diversity and interaction diversity of the DEGs that are involved in inter-cellular crosstalks between CAFs and cancer cells.

a-b) Crosstalk diversity dX (a) and Interactor diversity dA (b) in relation to the number of cell types (n) in which the gene is differentially expressed.

https://doi.org/10.1371/journal.pone.0334981.g005

Integrated crosstalks: Cancer cell pathways that can be associated with the communication between cancer cells and CAFs.

To identify cancer cell pathways that can be associated with the communication between cancer cells and CAFs, we analysed the crosstalks between the gene set of the 14 cancer cell DEGs that mediate the communication with CAFs (CAFs-Cancer cell communication), and cancer cell pathways (MSigDB Hallmarks) that contain cancer cell DEGs (Fig 6, S10 Table in S2 File). We found 15 interactions supported (α = 0.01) by at least one null model and two supported by both nulls. The first involves interactions among RPSA (CAFs-Cancer cell communication) and other ribosomal proteins (RPL18, RPL6, RPLP0, RPS10, RPS2, RPS3, RPS5, RPS6) that are regulated by MYC. The second takes place between APP and PTPRF (CAFs-Cancer cell communication), and CLU and CTNNB1 (cholesterol homeostasis).

Fig 6. Intra-cellular processes of cancer cells associated with their signalling with CAFs.

The processes are ranked from top to bottom by decreasing value of s; squares indicate processes that were not found in the analysis of intra-cellular crosstalks; link colour indicates statistical evidence (as in Figs 2 and 4), with the exception that, here, the dashed line replaces the white colour in indicating that both null models are above α = 0.01.

https://doi.org/10.1371/journal.pone.0334981.g006

Almost all the interactions found (13 out of 15) involve processes that establish significant (α = 0.01) intra-cellular crosstalks in cancer cells (S4 Table in S2 File). Conversely, the pathways of complement and coagulation did not emerge in the screening of intra-cellular crosstalks in cancer cells. Their relation with CAFs is mediated by the interaction between APP (CAFs-Cancer cell communication) and CLU (complement and coagulation).

Discussion

We presented a network-based approach (Ulisse) to assess the alterations of crosstalks between gene sets, based on gene-centric molecular interactions and one or more lists of gene scores that result from omics data analysis. According to how gene sets, gene lists and gene-gene interactions are defined, our method can be applied to inter-cellular as well as intra-cellular crosstalks, and to the analysis of intra-cellular crosstalks that can be associated with inter-cellular crosstalks. The score of a crosstalk is proportional to the gene-gene interactions between the two gene sets that are affected by the gene-level changes. This provides an intuitive means to quantify crosstalks, which can be understood in terms of the interactions and molecular alterations involved. As a proof-of-concept, we analysed the crosstalks affected by the gene expression alterations detected at single-cell resolution in triple negative breast cancer samples, because this type of data allowed us to demonstrate the two main applications, that is, pathway crosstalk analysis and cell-cell communication analysis. However, the method is not restricted to gene expression data. The pathway crosstalk analysis can be applied to gain insights from various types of gene lists (e.g., containing information on DNA variation), derived from new experiments as well as from public resources, such as, in the domain of cancer, the Genomic Data Commons [37], COSMIC [38] or the cBio Cancer Genomics Portal [39]. Similarly, the cell-cell communication analysis in Ulisse can also handle other data types, like mutational or epigenetic profiles at single cell resolution.

The score is supported by two complementary null models that conserve gene set size, degree sequence, and the association between gene weight and gene degree. These nulls provide a means to assess whether the statistical significance of the score comes from gene weights, interactions or both. In the proof-of-concept, we showed that all three scenarios emerge when using real data, underlining the importance of assessing the crosstalk analysis outcome from different angles [6].

We reported altered crosstalks at various degrees of saturation, especially in the analysis of intra-cellular crosstalks. This quantity enabled the identification of pairs of processes where most of the interactions involved gene expression changes or, conversely, pairs of processes where only a specific part of their interaction is affected. For example, our analysis identified that the KRAS pathway is involved in crosstalk dysregulation associated with TNBC, supporting evidence that indicates this pathway as crucial in phenotypical and metabolic features of cancer cells [40].

We showed that the analysis of intra-cellular crosstalks complements the typical enrichment analysis. Indeed, we reported a series of gene sets that, despite not showing significant enrichment in DEGs, were part of significantly altered crosstalks. This is the case of one of the top ranked crosstalks (supported by both nulls), which suggests the impairment of regulatory mechanisms between androgen response and apoptosis. Notably, the relation between androgen receptor and apoptosis has been implicated in breast cancer metastasis [41,42]. Another example is cholesterol homeostasis, which emerged as a hub of the intra-cellular network and is reported to promote cancer cell proliferation in TNBCs [43].

The reconstruction of inter-cellular communications based on cell type-associated gene sets provides a means to overcome the heterogeneity at gene expression level and sheds light on the general picture of the active (or altered) communications among the cell types. The analysis of TNBC cell types confirmed the well-known core network of communications between cancer cells and microenvironment. Cancer cells show significant communication with CAFs, supporting the pro-tumoral role of CAFs by activating the signalling associated with proliferation and tumour progression [31–33,44].

The joint analysis of intra-cellular and inter-cellular crosstalks paves the way towards the reconstruction of maps that integrate the communications between different cell types with the pathway crosstalks activated within each one. In the proof-of-concept we analysed the processes that are activated in cancer cells and can be associated with their communication with CAFs. Notably, a mediator of such crosstalk is the extracellular chaperone CLU, which was reported as a key player in cancer [45] and an interesting actionable target in TNBC [46,47].

With the aim of providing an additional way of identifying the genes associated with a phenotype, we introduced the crosstalk diversity and interaction diversity. These quantities, which are fundamentally different from purely topological features like degree and betweenness centrality, shed light on the genes that act as mediators of the signalling between processes or cell types. We showed, as a proof-of-concept, that several genes with extreme crosstalk diversity and interaction diversity are indeed known to be associated with the process under study. Among them, EEF2 was demonstrated to be upregulated in several cancers and associated with worse prognosis, thus suggesting its potential as a novel therapeutic target [48,49]. The high interaction diversity of ribosomal-associated proteins supports the importance of the dysregulation of the translation process in tumorigenesis mechanisms and the clinical potential of targeting this process in tumour cells [50]. Interestingly, some of the genes prioritized by crosstalk diversity and interactor diversity have marginal expression changes and, therefore, stand out due to their pattern of interactions with other altered genes. This is the case of CDKN2A and CDKN2B, which play a role in the regulation of cell cycle and proliferation and whose association with breast cancer has been widely studied [51,52]. The analysis of genes that mediate the inter-cellular communications revealed a series of genes shared by multiple communications. These genes are involved in tumour promoting functions supporting tumour growth, chronic inflammation and angiogenesis, by secretion of growth factors and other soluble molecules, vesicles, and mechanical interactions among cells and extracellular matrix [33,53–55]. Concerning the genes that mediate the signalling between CAFs and cancer cells, MDK and MFGE8 (expressed in cancer cells) are known to be associated with the acquisition of various tumour hallmarks [56,57]. Studies suggest the involvement of AZGP1 in the differentiation of progenitor cells into CAFs to support tumorigenesis [58], while RPSA and LAMP1 are implicated in poor prognosis in breast cancer [59,60]. PLAT was reported to regulate the ability of breast cancer CAFs to invade stroma [61], and as an angiogenic factor of CAFs associated with negative prognosis in colon cancer [62].

Crosstalk diversity and interaction diversity can be relevant for the choice of actionable targets. Genes that affect several crosstalks interacting with multiple cellular functions are interesting targets for therapy but, at the same time, could be associated with a wide spectrum of negative side effects. The saturations of crosstalk diversity and interaction diversity provide a means to collect more selective targets for therapy, as they prioritize genes that mediate crosstalks with fewer but more disease-specific cellular functions.

The results presented in this study have to be seen in light of some limitations. The gene-gene interactions available in the literature are not context-specific and, as such, they are a model of the interactions that potentially take place in the biological system under analysis. Moreover, the collections of molecular interactions are known to be affected by various biases [4,5]. We used state-of-the-art collections and filtered the interactions to ensure an appropriate trade-off between coverage of genes and presence of biases, following the recommendations of previous studies [4,5]. As a proof-of-concept, we studied the intra-cellular crosstalks using gene set definitions from MSigDB hallmarks [34]. There are multiple ways to define intra-cellular processes, e.g., using databases like KEGG [14] and Reactome [63]. Therefore, other analyses of intra-cellular crosstalks in cancer cells of TNBC are possible and could highlight additional mechanisms. Moreover, the number of tested genes was limited by the sensitivity and depth of the scRNA sequencing technology underlying the data we used. In turn, the results that emerged in the proof-of-principle should be interpreted considering this limited observability of the underlying molecular processes. If the information in the input data is sparse, one could consider amplifying the input gene weights based on the network proximity of genes, using a diffusion-based transformation [64,65].
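As an illustration of the diffusion-based transformation mentioned above, the following minimal Python sketch spreads gene weights over a network by iterating x(t+1) = α W x(t) + (1 − α) x(0), where W is the symmetrically normalized adjacency matrix. This is one common formulation of network propagation and is not necessarily the one adopted in [64,65]; the function name `propagate`, the value of `alpha` and the toy network are hypothetical.

```python
import math

def propagate(weights, edges, alpha=0.7, n_iter=100, tol=1e-9):
    """Smooth gene weights over an undirected network.

    ASSUMPTION: iterative propagation with symmetric degree normalization;
    genes without interactions are not part of the output.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    genes = sorted(neighbours)
    degree = {g: len(neighbours[g]) for g in genes}
    x0 = {g: weights.get(g, 0.0) for g in genes}
    x = dict(x0)
    for _ in range(n_iter):
        new = {}
        for g in genes:
            # Contribution received from each neighbour, scaled by both degrees.
            spread = sum(x[n] / math.sqrt(degree[g] * degree[n])
                         for n in neighbours[g])
            new[g] = alpha * spread + (1 - alpha) * x0[g]
        delta = max(abs(new[g] - x[g]) for g in genes)
        x = new
        if delta < tol:
            break
    return x

# Toy path network A - B - C where only A carries an input weight.
smoothed = propagate({"A": 1.0}, [("A", "B"), ("B", "C")])
```

In this toy example, B and C receive a non-zero weight that decreases with their distance from A, which is the kind of amplification that could help when the input weights are sparse.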

Overall, Ulisse provides novel opportunities in the domains of gene set enrichment analysis, pathway crosstalk analysis and cell-cell communication. Compared to existing methods (see the Introduction), Ulisse has a different focus, considers a more extensive statistical assessment, supports various input types and provides a different outcome. Such differences are expected to lead to results that differ, to a varying extent, from those obtained using other tools on the same input data. Therefore, considering these differences and the issues in performing benchmarks in the absence of ground truths [23], we left a quantitative comparison for future work. Indeed, a trustworthy comparison among tools in the absence of a ground truth should consider multiple approaches, like literature agreement, experimental validation, indirect validation, synthetic dataset validation and robustness assessment [23]. Instead, in this manuscript, we focused on describing Ulisse and its relation to existing approaches, and on showing the relevance of the results achieved in the proof-of-concept, considering the literature on breast cancer.

In conclusion, the approach presented in this work and the results gained in the proof-of-principle, even in the light of their limitations, support the usefulness of crosstalk analysis as an additional instrument in the “toolkit” of biomedical research for translating complex biological data into actionable insights.

Methods

Definition of gene weights from single cell RNA-sequencing data

The Seurat data object “SeuratObject_TNBC.rds”, containing single-cell RNA expression data of 8 triple negative breast tumors [24,25], was downloaded from figshare [66]. The associations between the 9 cell clusters and cell types (not available in the Seurat object) were obtained on the basis of the cell association provided by the authors in the figures of the paper, together with the “SeuratObject_TNBCSub.rds” object (S3 Fig in S1 File). Differentially expressed genes were obtained by means of the MAST algorithm [67], testing each cell type against all the other cells (Seurat [68] function “FindAllMarkers()”, default parameters). Differential expression statistics were used to define a gene weight vector $u_j$ (of size equal to the total number of genes in the considered analysis) for each cell type $j$, combining the log fold change $\log_2(x_{ij})$ of each gene $i$ with its adjusted p-value $p_{ij}$ (Benjamini-Hochberg method [69]) into a score $y_{ij}$. To reduce noise, scores associated with marginal significance were set to zero: $y_{ij}$ was non-zero only when $p_{ij} < 0.05$ and $\log_2(x_{ij}) \geq 0.5$, while $y_{ij} = 0$ otherwise. Each vector was normalized to have a maximum value of 1: $u_{ij} = y_{ij} / \max_i y_{ij}$.
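The weighting scheme can be illustrated with a minimal Python sketch. The thresholds and the normalization to a maximum of 1 follow the text, whereas the rule used to combine fold change and adjusted p-value, −log10(p) · log2(fold change), is an assumption made only for illustration; the function name `gene_weights` and the toy values are hypothetical.

```python
import math

def gene_weights(log2fc, padj, p_max=0.05, lfc_min=0.5):
    """Return gene weights in [0, 1] for one cell type.

    log2fc, padj: dicts mapping gene -> log2 fold change / adjusted p-value.
    ASSUMPTION: the raw score is -log10(padj) * log2fc; the text only states
    that the two statistics are combined.
    """
    raw = {}
    for gene, lfc in log2fc.items():
        p = padj[gene]
        if p < p_max and lfc >= lfc_min:
            # Guard against p-values reported as exactly zero.
            raw[gene] = -math.log10(max(p, 1e-300)) * lfc
        else:
            raw[gene] = 0.0  # marginal significance: weight set to zero
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return raw  # no gene passed the thresholds
    return {gene: score / top for gene, score in raw.items()}

# Toy differential expression statistics for four genes.
weights = gene_weights(
    {"CLU": 2.0, "EEF2": 1.0, "CDKN2A": 0.2, "MDK": 1.5},
    {"CLU": 1e-10, "EEF2": 1e-4, "CDKN2A": 1e-8, "MDK": 0.2},
)
```

Here the two genes that fail either threshold receive a weight of zero, and the remaining weights are expressed relative to the top-scoring gene.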

Molecular interactions and gene sets

Molecular interactions used for pathway crosstalk analysis were downloaded from STRING [15] (v12, https://string-db.org/cgi/download). The combined score was updated excluding “text mining” using a modified version of the script “combine_subscores.v2.py” (https://stringdb-downloads.org/download). Ensembl identifiers were mapped to Entrez Gene identifiers using the mapping available in STRING (https://string-db.org/cgi/download) and Entrez Gene (ftp://ftp.ncbi.nih.gov/gene/DATA, September 19, 2023). The highest score was considered for each gene pair. Only high-confidence (combined score ≥ 700) interactions and the top 3 (per gene) interactions with medium confidence (STRING score ≥ 400) were considered, obtaining a total of 174,962 interactions involving 17,288 genes. Molecular interactions available in Omnipath [26] were obtained through the R package OmnipathR [70] (September 2024), for a total of 4,312 interactions involving 1,782 genes. The MSigDB Hallmarks gene sets [34] were collected through the R package “msigdbr” v7.4.1 [71].
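The selection of interactions by confidence can be sketched in Python as follows. This is only one possible reading of the criteria: it assumes that the "top 3 (per gene)" rule is applied to interactions with a score between 400 and 699, and that an interaction is retained when it ranks among the top 3 of at least one of its two genes. The function name `filter_interactions` and the toy scores are hypothetical.

```python
from collections import defaultdict

def filter_interactions(edges, high=700, medium=400, top_k=3):
    """edges: iterable of (gene_a, gene_b, score); returns {pair: score}."""
    # Keep the highest score for each unordered gene pair.
    best = {}
    for a, b, score in edges:
        if a == b:
            continue
        pair = tuple(sorted((a, b)))
        if score > best.get(pair, -1):
            best[pair] = score

    # All high-confidence interactions are retained.
    kept = {pair: s for pair, s in best.items() if s >= high}

    # ASSUMPTION: per gene, add its top_k interactions with
    # medium <= score < high.
    per_gene = defaultdict(list)
    for pair, s in best.items():
        if medium <= s < high:
            for gene in pair:
                per_gene[gene].append((s, pair))
    for candidates in per_gene.values():
        candidates.sort(reverse=True)
        for s, pair in candidates[:top_k]:
            kept[pair] = s
    return kept

kept = filter_interactions([
    ("A", "B", 900), ("B", "A", 650),   # duplicated pair: 900 is used
    ("A", "C", 600), ("A", "D", 550),
    ("A", "E", 500), ("A", "F", 450),
    ("F", "G", 300),                    # below medium confidence
])
```

In this toy input, the pair A-F is not among the top 3 medium-confidence interactions of A, but it is retained because it is the best medium-confidence interaction of F; a stricter reading of the rule would give a smaller network.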

In each analysis, the initial gene set list was created to ensure that: each gene had at least one interaction; only gene sets with at least 3 elements and a non-null gene weight were considered; to reduce the number of possible gene set pairs, only those pairs such that C(X, Y) > 0 were considered.
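The three criteria above can be sketched in Python as follows. The sketch assumes that the crosstalk C(X, Y) is the sum, over the interactions linking a gene of X to a gene of Y, of the product of the two gene weights, and that "a non-null gene weight" means at least one gene with a weight greater than zero; both are assumptions made for illustration, and all function names and toy data are hypothetical.

```python
from itertools import combinations

def prepare_gene_sets(gene_sets, weights, edges, min_size=3):
    """Drop genes without interactions, then drop small or unweighted sets."""
    linked = {gene for pair in edges for gene in pair}
    kept = {}
    for name, genes in gene_sets.items():
        genes = {g for g in genes if g in linked}
        has_weight = any(weights.get(g, 0.0) > 0 for g in genes)
        if len(genes) >= min_size and has_weight:
            kept[name] = genes
    return kept

def crosstalk(x, y, weights, edges):
    """ASSUMED score: sum of u_i * u_j over interactions between X and Y."""
    total = 0.0
    for a, b in edges:
        if (a in x and b in y) or (a in y and b in x):
            total += weights.get(a, 0.0) * weights.get(b, 0.0)
    return total

def candidate_pairs(gene_sets, weights, edges):
    """Gene set pairs with a positive crosstalk score."""
    return [(x, y) for x, y in combinations(sorted(gene_sets), 2)
            if crosstalk(gene_sets[x], gene_sets[y], weights, edges) > 0]

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("G", "H")]
weights = {"A": 1.0, "C": 0.5, "D": 0.4}
gene_sets = {
    "S1": {"A", "B", "C"},
    "S2": {"D", "E", "F"},
    "S3": {"G", "H"},            # too small
    "S4": {"E", "F", "G", "Z"},  # Z has no interactions; no weighted genes
}
filtered = prepare_gene_sets(gene_sets, weights, edges)
pairs = candidate_pairs(filtered, weights, edges)
```

Only S1 and S2 survive the filtering, and they form a candidate pair because the interaction C-D links two genes with non-zero weights.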

Randomizations and computational aspects

A total of 1000 randomizations of gene labels was used to create the null models. Gene degree was preserved by splitting the degree sequence into equally sized bins: 9 for intra-cellular crosstalks, 4 to study inter-cellular communications, and 7 to study cancer cell intra-cellular crosstalks associated with their communication with CAFs. The number of bins was defined as the highest value (at most 15) that guaranteed non-empty bins. The average computational cost of the analysis of intra-cellular crosstalks (203 gene set pairs) was approximately 4 minutes on 8 cores with 64 GB of RAM per core.
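A degree-aware randomization of this kind can be sketched in Python as follows. The sketch assumes that genes are sorted by degree and split into bins containing (almost) the same number of genes, that weights are shuffled only among genes of the same bin, and that an empirical p-value is computed with the usual +1 correction; these details, the function names and the toy data are assumptions and may differ from the Ulisse implementation.

```python
import random

def degree_bins(degree, n_bins):
    """Split genes, sorted by degree, into n_bins groups of similar size."""
    ordered = sorted(degree, key=lambda g: (degree[g], g))
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        stop = start + size + (1 if b < extra else 0)
        bins.append(ordered[start:stop])
        start = stop
    return bins

def permute_weights(weights, bins, rng):
    """Shuffle gene weights among genes that belong to the same degree bin."""
    permuted = {}
    for genes in bins:
        values = [weights.get(g, 0.0) for g in genes]
        rng.shuffle(values)
        permuted.update(zip(genes, values))
    return permuted

def empirical_p(observed, null_scores):
    """ASSUMED estimator: (hits + 1) / (n + 1), hits = null >= observed."""
    hits = sum(score >= observed for score in null_scores)
    return (hits + 1) / (len(null_scores) + 1)

degree = {"A": 1, "B": 1, "C": 2, "D": 5, "E": 8, "F": 20}
weights = {"A": 1.0, "C": 0.5, "F": 0.2}
bins = degree_bins(degree, n_bins=3)
rng = random.Random(0)  # fixed seed for reproducibility
null_weights = [permute_weights(weights, bins, rng) for _ in range(1000)]
```

Each element of `null_weights` is an alternative assignment of weights to genes in which a weight can only move to a gene of similar degree; recomputing the statistic of interest on each of them yields the null distribution passed to `empirical_p`.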

Code availability

The computational method used in this study (Ulisse v2.0) is available on Zenodo with the identifier 10.5281/zenodo.15166722. Source code and documentation are freely available on GitHub at the URLs https://github.com/emosca-cnr/Ulisse and https://emosca-cnr.github.io/Ulisse.

Supporting information

S1 File. S1-S3 Texts, S1-S3 Figures, S1 Table, and captions of S2-S10 Tables.

https://doi.org/10.1371/journal.pone.0334981.s001

(PDF)

References

  1. Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169(7):1177–86. pmid:28622505
  2. Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2010;12(1):56–68.
  3. Türei D, Valdeolivas A, Gul L, Palacio-Escat N, Klein M, Ivanova O, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Mol Syst Biol. 2021;17(3):e9923. pmid:33749993
  4. Mosca E, Bersanelli M, Matteuzzi T, Di Nanni N, Castellani G, Milanesi L, et al. Characterization and comparison of gene-centered human interactomes. Brief Bioinform. 2021;22(6):bbab153. pmid:34010955
  5. Wright SN, Colton S, Schaffer LV, Pillich RT, Churas C, Pratt D, et al. State of the interactomes: an evaluation of molecular networks for generating biological insights. Mol Syst Biol. 2024;21(1):1–29.
  6. Váša F, Mišić B. Null models in network neuroscience. Nat Rev Neurosci. 2022;23(8):493–504.
  7. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012;8(2):e1002375. pmid:22383865
  8. Castresana-Aguirre M, Sonnhammer ELL. Pathway-specific model estimation for improved pathway annotation by network crosstalk. Sci Rep. 2020;10(1):13585. pmid:32788619
  9. Buzzao D, Castresana-Aguirre M, Guala D, Sonnhammer ELL. Benchmarking enrichment analysis methods with the disease pathway network. Brief Bioinform. 2024;25(2):bbae069. pmid:38436561
  10. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. pmid:16199517
  11. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013;14:7. pmid:23323831
  12. Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, Kim J-S, et al. A novel signaling pathway impact analysis. Bioinformatics. 2009;25(1):75–82. pmid:18990722
  13. Signorelli M, Vinciotti V, Wit EC. NEAT: an efficient network enrichment analysis test. BMC Bioinformatics. 2016;17(1):352.
  14. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–62. pmid:26476454
  15. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638–46. pmid:36370105
  16. Mosca E, Alfieri R, Maj C, Bevilacqua A, Canti G, Milanesi L. Computational modeling of the metabolic States regulated by the kinase akt. Front Physiol. 2012;3:418. pmid:23181020
  17. Mosca E, Barcella M, Alfieri R, Bevilacqua A, Canti G, Milanesi L. Systems biology of the metabolic network regulated by the Akt pathway. Biotechnol Adv. 2012;30(1):131–41. pmid:21856401
  18. Jaeger S, Igea A, Arroyo R, Alcalde V, Canovas B, Orozco M, et al. Quantification of Pathway Cross-talk Reveals Novel Synergistic Drug Combinations for Breast Cancer. Cancer Res. 2017;77(2):459–69. pmid:27879272
  19. Hu Y, Yang Y, Fang Z, Hu Y-S, Zhang L, Wang J. Detecting pathway relationship in the context of human protein-protein interaction network and its application to Parkinson’s disease. Methods. 2017;131:93–103. pmid:28790017
  20. Pham L, Christadore L, Schaus S, Kolaczyk ED. Network-based prediction for sources of transcriptional dysregulation using latent pathway identification analysis. Proc Natl Acad Sci U S A. 2011;108(32):13347–52. pmid:21788508
  21. Dutta B, Wallqvist A, Reifman J. PathNet: a tool for pathway analysis using topological information. Source Code Biol Med. 2012;7(1):10. pmid:23006764
  22. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9. pmid:10802651
  23. Cesaro G, Nagai JS, Gnoato N, Chiodi A, Tussardi G, Klöker V, et al. Advances and challenges in cell-cell communication inference: a comprehensive review of tools, resources, and future directions. Brief Bioinform. 2025;26(3):bbaf280. pmid:40536815
  24. Pal B, Chen Y, Vaillant F, Capaldo BD, Joyce R, Song X, et al. A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. EMBO J. 2021;40(11):e107333. pmid:33950524
  25. Chen Y, Pal B, Lindeman GJ, Visvader JE, Smyth GK. R code and downstream analysis objects for the scRNA-seq atlas of normal and tumorigenic human breast tissue. Sci Data. 2022;9(1):96. pmid:35322042
  26. Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat Methods. 2016;13(12):966–7. pmid:27898060
  27. Ramilowski JA, Goldberg T, Harshbarger J, Kloppmann E, Lizio M, Satagopam VP, et al. A draft network of ligand-receptor-mediated multicellular signalling in human. Nat Commun. 2015;6:7866. pmid:26198319
  28. Vento-Tormo R, Efremova M, Botting RA, Turco MY, Vento-Tormo M, Meyer KB, et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature. 2018;563(7731):347–53. pmid:30429548
  29. Wallis WA. Compounding Probabilities from Independent Significance Tests. Econometrica. 1942;10(3/4):229.
  30. Fisher RA. Statistical Methods for Research Workers. 4th edition. Oliver, Boyd, editors. Edinburgh and London; 1932.
  31. Chen X, Song E. Turning foes to friends: targeting cancer-associated fibroblasts. Nat Rev Drug Discov. 2019;18(2):99–115. pmid:30470818
  32. Gascard P, Tlsty TD. Carcinoma-associated fibroblasts: orchestrating the composition of malignancy. Genes Dev. 2016;30(9):1002–19. pmid:27151975
  33. Hu D, Li Z, Zheng B, Lin X, Pan Y, Gong P, et al. Cancer-associated fibroblasts in breast cancer: Challenges and opportunities. Cancer Commun (Lond). 2022;42(5):401–34. pmid:35481621
  34. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25. pmid:26771021
  35. Ren J, Chen Y, Kong W, Li Y, Lu F. Tumor protein D52 promotes breast cancer proliferation and migration via the long non-coding RNA NEAT1/microRNA-218-5p axis. Ann Transl Med. 2021;9(12):1008. pmid:34277808
  36. Shehata M, Bièche I, Boutros R, Weidenhofer J, Fanayan S, Spalding L, et al. Nonredundant functions for tumor protein D52-like proteins support specific targeting of TPD52. Clin Cancer Res. 2008;14(16):5050–60. pmid:18698023
  37. Heath AP, Ferretti V, Agrawal S, An M, Angelakos JC, Arya R, et al. The NCI Genomic Data Commons. Nat Genet. 2021;53(3):257–62.
  38. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019;47(D1):D941–7. pmid:30371878
  39. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–4. pmid:22588877
  40. Ma Q, Zhang W, Wu K, Shi L. The roles of KRAS in cancer metabolism, tumor microenvironment and clinical therapy. Mol Cancer. 2025;24(1):14. pmid:39806421
  41. Christenson JL, Trepel JB, Ali HY, Lee S, Eisner JR, Baskin-Bey ES, et al. Harnessing a Different Dependency: How to Identify and Target Androgen Receptor-Positive Versus Quadruple-Negative Breast Cancer. Horm Cancer. 2018;9(2):82–94. pmid:29340907
  42. Gerratana L, Basile D, Buono G, De Placido S, Giuliano M, Minichillo S, et al. Androgen receptor in triple negative breast cancer: A potential target for the targetless subtype. Cancer Treat Rev. 2018;68:102–10. pmid:29940524
  43. Li M, Kang S, Deng X, Li H, Zhao Y, Tang W, et al. Erianin inhibits the progression of triple-negative breast cancer by suppressing SRC-mediated cholesterol metabolism. Cancer Cell Int. 2024;24(1):166. pmid:38734640
  44. Zhang W, Wang J, Liu C, Li Y, Sun C, Wu J, et al. Crosstalk and plasticity driving between cancer-associated fibroblasts and tumor microenvironment: significance of breast cancer metastasis. J Transl Med. 2023;21(1):827. pmid:37978384
  45. Koltai T. Clusterin: a key player in cancer chemoresistance and its inhibition. Onco Targets Ther. 2014;7:447–56. pmid:24672247
  46. Zhang D, Sun B, Zhao X, Cui Y, Xu S, Dong X, et al. Secreted CLU is associated with the initiation of triple-negative breast cancer. Cancer Biol Ther. 2012;13(5):321–9.
  47. Pastena P, Perera H, Martinino A, Kartsonis W, Giovinazzo F. Unraveling Biomarker Signatures in Triple-Negative Breast Cancer: A Systematic Review for Targeted Approaches. Int J Mol Sci. 2024;25(5):2559. pmid:38473804
  48. Oji Y, Tatsumi N, Fukuda M, Nakatsuka S-I, Aoyagi S, Hirata E, et al. The translation elongation factor eEF2 is a novel tumor-associated antigen overexpressed in various types of cancers. Int J Oncol. 2014;44(5):1461–9.
  49. Jia X, Huang C, Liu F, Dong Z, Liu K. Elongation factor 2 in cancer: a promising therapeutic target in protein translation. Cell Mol Biol Lett. 2024;29(1):156. pmid:39707196
  50. Song P, Yang F, Jin H, Wang X. The regulation of protein translation and its implications for cancer. Signal Transduct Target Ther. 2021;6(1):68. pmid:33597534
  51. Wilcox N, Dumont M, González-Neira A, Carvalho S, Joly Beauparlant C, Crotti M, et al. Exome sequencing identifies breast cancer susceptibility genes and defines the contribution of coding variants to breast cancer risk. Nat Genet. 2023;55(9):1435–9. pmid:37592023
  52. Hjazi A, Ghaffar E, Asghar W, Alauldeen Khalaf H, Ikram Ullah M, Mireya Romero-Parra R, et al. CDKN2B-AS1 as a novel therapeutic target in cancer: Mechanism and clinical perspective. Biochem Pharmacol. 2023;213:115627. pmid:37257723
  53. Elwakeel E, Weigert A. Breast Cancer CAFs: Spectrum of Phenotypes and Promising Targeting Avenues. Int J Mol Sci. 2021;22(21):11636. pmid:34769066
  54. Mao X, Xu J, Wang W, Liang C, Hua J, Liu J, et al. Crosstalk between cancer-associated fibroblasts and immune cells in the tumor microenvironment: new findings and future perspectives. Mol Cancer. 2021;20(1):131. pmid:34635121
  55. Guo Z, Zhang H, Fu Y, Kuang J, Zhao B, Zhang L, et al. Cancer-associated fibroblasts induce growth and radioresistance of breast cancer cells through paracrine IL-6. Cell Death Discov. 2023;9(1):6. pmid:36635302
  56. Ko DS, Kim SH, Park JY, Lee G, Kim HJ, Kim G, et al. Milk Fat Globule-EGF Factor 8 Contributes to Progression of Hepatocellular Carcinoma. Cancers. 2020;12(2):403.
  57. Filippou PS, Karagiannis GS, Constantinidou A. Midkine (MDK) growth factor: a key player in cancer progression and a promising therapeutic target. Oncogene. 2020;39(10):2040–54. pmid:31801970
  58. Verma S, Giagnocavo SD, Curtin MC, Arumugam M, Osburn-Staker SM, Wang G, et al. Zinc-alpha-2-glycoprotein Secreted by Triple-Negative Breast Cancer Promotes Peritumoral Fibrosis. Cancer Res Commun. 2024;4(7):1655–66. pmid:38888911
  59. Wang Q, Yao J, Jin Q, Wang X, Zhu H, Huang F, et al. LAMP1 expression is associated with poor prognosis in breast cancer. Oncol Lett. 2017;14(4):4729–35. pmid:29085473
  60. Binothman N, Aljadani M, Alghanem B, Refai MY, Rashid M, Al Tuwaijri A, et al. Identification of novel interacts partners of ADAR1 enzyme mediating the oncogenic process in aggressive breast cancer. Sci Rep. 2023;13(1):8341. pmid:37221310
  61. Du W, Novin A, Liu Y, Afzal J, Liu S, Suhail Y, et al. Stable and oscillatory hypoxia differentially regulate invasibility of breast cancer associated fibroblasts. Mechanobiol Med. 2024;2(3):100070. pmid:40365532
  62. Okuno K, Ikemura K, Okamoto R, Oki K, Watanabe A, Kuroda Y, et al. CAF-associated genes putatively representing distinct prognosis by in silico landscape of stromal components of colon cancer. PLoS One. 2024;19(4):e0299827. pmid:38557819
  63. Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020;48(D1):D498–503. pmid:31691815
  64. Di Nanni N, Bersanelli M, Milanesi L, Mosca E. Network Diffusion Promotes the Integrative Analysis of Multiple Omics. Front Genet. 2020;11:106. pmid:32180795
  65. Cowen L, Ideker T, Raphael BJ, Sharan R. Network propagation: a universal amplifier of genetic associations. Nat Rev Genet. 2017;18(9):551–62. pmid:28607512
  66. Chen Y, Smyth G. Data, R code and output Seurat Objects for single cell RNA-seq analysis of human breast tissues. figshare. 2022.
  67. Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015;16:278. pmid:26653891
  68. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587.e29. pmid:34062119
  69. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J Roy Statist Soc B Statist Methodol. 1995;57(1):289–300.
  70. Valdeolivas A, Turei D, Gabor A. OmnipathR: client for the OmniPath web service. 2019.
  71. Dolgalev I. msigdbr: MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format. 2022. Available from: https://igordot.github.io/msigdbr/