CoFormerSurv: Collaborative transformer for multi-omics survival analysis

Gang Wen; Limin Li

doi:10.1371/journal.pcbi.1013875

Abstract

In the field of biomedicine, advances in high-throughput sequencing have generated vast amounts of high-dimensional multi-omics data. Survival analysis methods with multi-omics data can comprehensively uncover the heterogeneity and complexity of diseases from multiple perspectives, thereby improving prognostic predictions for patients, which is critical for developing personalized treatment strategies in precision medicine. Recently, Transformer architecture has emerged as a dominant paradigm in multiple domains. However, due to the inherent challenges in modeling right-censored data, it remains unclear how to effectively utilize Transformer architecture in multi-omics survival analysis to fully extract complementary information across different omics for improving survival prediction performance. In this work, we propose an innovative collaborative Transformer framework for multi-omics survival analysis, namely CoFormerSurv, with two consecutive Transformer architectures including an inter-omics Transformer and an inter-sample graph Transformer. The inter-omics Transformer learns multiple meaningful feature interactions by multi-head self-attention mechanism to capture and quantify complementary information across different omics, while the inter-sample graph Transformer integrates structural information from the fused multi-omics graph into the Transformer architecture, enabling more effective exploration of neighborhood relationships among samples. The two kinds of Transformer architectures can work collaboratively to generate more comprehensive multi-omics features for improving the Cox-PH model performance in survival analysis. Experimental results on multiple real-world datasets show that our proposed method outperforms both single-Transformer architectures and existing survival prediction models by simultaneously exploring complementary information from inter-omics and cross-sample perspectives.

Author summary

We propose CoFormerSurv, a collaborative Transformer framework that improves survival prediction with multi-omics data. CoFormerSurv consists of two complementary components: an inter-omics Transformer that models cross-omics interactions, and an inter-sample graph Transformer that learns the neighborhood relationships among multi-omics samples. By integrating these two perspectives, our dual Transformer architecture enables more comprehensive feature learning and achieves better performance than existing approaches.

Citation: Wen G, Li L (2026) CoFormerSurv: Collaborative transformer for multi-omics survival analysis. PLoS Comput Biol 22(1): e1013875. https://doi.org/10.1371/journal.pcbi.1013875

Editor: Serdar Bozdag, University of North Texas, UNITED STATES OF AMERICA

Received: September 10, 2025; Accepted: December 25, 2025; Published: January 7, 2026

Copyright: © 2026 Wen, Li. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files. All data analyzed in this study are publicly accessible. The raw data are obtained from the TCGA and PCAWG datasets, available at https://www.cancer.gov/ccg/research/genome-sequencing/tcga and https://xena.ucsc.edu/, respectively. The preprocessed datasets generated for this study have been deposited in the Figshare repository and can be accessed via https://doi.org/10.6084/m9.figshare.30744665. The source codes are available at https://github.com/LiminLi-xjtu/CoFormerSurv.

Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 12222115 and 92470106 to L.L.), website: http://www.nsfc.gov.cn/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Survival analysis uses multiple explanatory variables to predict the time until an event occurs and has always been a hot topic in the field of biomedical research. The major challenges [1] in survival analysis are the highly skewed nature of time-to-event data, in which some events occur significantly beyond the median, and censoring, which arises from lost follow-up or the termination of the study before the event of interest is observed. Clinically, survival analysis for cancer patients remains predominantly reliant on low-dimensional features (e.g., demographic factors such as age and sex, and tumor characteristics such as histologic grade and T/N/M stage) [2] to examine the effects of multiple predictors on survival outcomes. With the rapid advancement of high-throughput technology, the growing volumes of multi-omics data, such as gene expression and microRNA expression profiles, are increasingly being leveraged to predict clinical outcomes of patients. This provides new technical support and research perspectives for conducting more precise personalized prognostic prediction studies.

In traditional survival analysis, the nonparametric Kaplan-Meier (K-M) estimator, the semi-parametric Cox proportional hazards (Cox-PH) model, and the parametric accelerated failure time (AFT) model constitute three fundamental analytical approaches [3]. The K-M estimator directly estimates the survival function from survival data without making assumptions about the distribution of survival times or the form of the hazard function, but it cannot analyze or quantify the effects of explanatory variables on survival time. The Cox-PH model assumes that the hazard ratios between patients are constant over time and requires no specific assumptions about the survival time distribution, which offers exceptional flexibility and broad applicability for analyzing complex survival data. The AFT model, in contrast to the Cox-PH model, uses regression to model covariate effects on log-survival time, which inherently requires distributional assumptions (e.g., Weibull or log-normal) [4]. In addition to these three methods, traditional machine learning approaches such as Random Forests [5] and Support Vector Regression [6] have been introduced for time-to-event data analysis. Recently, deep learning [7,8] has gained increasing prominence in survival analysis [9–12] due to its exceptional performance. For example, DeepAFT [10] combines deep neural networks with the AFT model. In contrast to DeepAFT, DSM [12] learns multiple latent survival distributions through hierarchical graphical models. By combining these latent survival distributions through weighting, DSM can better accommodate the heterogeneity of survival data.

In the field of biomedicine, advances in high-throughput sequencing have generated vast amounts of high-dimensional omics data. Omics-based survival analysis methods can reveal the impact of molecular characteristics [13,14] on cancer prognosis, elucidating tumor mechanisms at the gene and pathway levels to provide evidence for personalized precision therapy. These methods can be broadly categorized into single-omics [15–17] and multi-omics [18,19] survival analysis methods. Single-omics survival analysis focuses on investigating the association between specific biomarkers (such as genomic data, transcriptomic data, or proteomic data) and patient prognosis. For example, Cox-nnet method [16] analyzes the relationship between the features of hidden layer nodes in neural networks and patient survival risk, uncovering key genes and biological pathways that significantly impact cancer prognosis, thereby revealing rich biological information. Considering the challenge of overfitting in deep learning models trained on limited high-dimensional gene expression data, reference [20] proposed VAECox, a two-stage transfer learning model that first pre-trains a VAE on multi-cancer RNA-seq data without survival labels and then transfers the learned weights to initialize a cancer-specific survival prediction model. Recently, Graph Convolutional Networks (GCNs) have achieved remarkable success in various fields, including survival prediction, due to their ability to integrate both node attributes and graph structure information. For example, AGGSurv method [21] first generates diverse sparse graph structures by randomly sampling feature subsets from high-dimensional RNA-seq data and then learns a ridge regression-Cox model to integrate predictions from these multiple GCNs, ultimately improving survival prediction performance.

Multi-omics survival analysis methods integrate information from different sources to help analyze the heterogeneity and complexity of diseases from multiple dimensions, enabling more accurate predictions of patient outcomes, which is significant for advancing precision medicine. These methods can be divided into two categories: feature fusion-based methods [22,23] and graph fusion-based methods [24]. Feature fusion-based methods capture complementary information across multiple omics by learning inter-omics feature interactions, which improves predictive performance for patient survival outcomes. For example, HFBSurv [23] uses omics-specific and cross-omics attentional factorized bilinear modules to mine omics-specific information (intra-omics feature dependencies) and cross-omics information (inter-omics feature interactions), thereby enhancing the accuracy of survival predictions. Compared to feature fusion strategies, graph fusion-based methods tend to share and propagate complementary neighborhood information across different omics. For example, GCGCN [24] can more accurately reveal neighborhood relationships among samples for survival analysis by integrating multiple sample similarity matrices from different omics [25]. Building upon this, reference [26] proposed FGCNSurv, which simultaneously fuses features and graphs within GCNs, enabling more comprehensive exploration of complementary relationships among multi-omics data for survival prediction.

Recently, Transformer architecture [27,28] has emerged as a dominant paradigm in multiple domains. While Transformer architecture has demonstrated promising results in survival analysis [29–31], it remains unclear how to effectively utilize Transformer architecture in multi-omics survival analysis to fully extract complementary information across different omics for improving survival prediction performance. In this work, we propose a collaborative Transformer framework for multi-omics survival analysis, namely CoFormerSurv, including two complementary Transformer architectures: an inter-omics Transformer and an inter-sample graph Transformer. The inter-omics Transformer learns multiple meaningful cross-omics features by multi-head self-attention mechanism. The inter-sample graph Transformer encodes the spatial information of the fused graph from multiple omics into the Transformer architecture to more effectively model neighborhood relations among multi-omics samples. By integrating the inter-omics Transformer and inter-sample graph Transformer, the collaborative Transformer can generate more informative and discriminative multi-omics features for Cox-PH-based survival analysis. Evaluations on multiple real-world datasets demonstrate that our proposed collaborative Transformer outperforms both single-Transformer architectures and existing survival prediction methods by jointly leveraging inter-omics and inter-sample perspectives.

Results

Overview of CoFormerSurv

Multi-omics survival data are typically represented as {(x_i, O_i, δ_i)}_{i=1}^{n}, where x_i denotes the multi-omics features of the i-th patient from v distinct omics (e.g., gene expression and microRNA expression), O_i is the observed time-to-event, and δ_i is a binary indicator of censoring. The observed time O_i depends on whether the event of death occurs before censoring. δ_i = 1 indicates that death occurred prior to censoring, so the observed time O_i equals the true survival time T_i. Conversely, δ_i = 0 implies that O_i is equal to the censoring time C_i, meaning the i-th patient’s follow-up ended without an observed death.
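As a minimal illustration of the right-censoring rule described above, the following Python sketch derives the observed time and the censoring indicator from a true survival time and a censoring time. The names `SurvivalRecord` and `observe` are hypothetical and are not taken from the CoFormerSurv code; treating a tie between the two times as an observed death is likewise an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass
class SurvivalRecord:
    observed_time: float  # O_i: the time actually recorded for the patient
    event: int            # 1 if death was observed, 0 if the patient was censored


def observe(true_time: float, censor_time: float) -> SurvivalRecord:
    """Apply right-censoring: only the earlier of T_i and C_i is observed."""
    if true_time <= censor_time:  # assumption: ties count as an observed death
        return SurvivalRecord(true_time, 1)
    return SurvivalRecord(censor_time, 0)
```

For instance, `observe(5.0, 8.0)` yields an uncensored record with observed time 5.0, whereas `observe(9.0, 4.0)` yields a censored record with observed time 4.0.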

Multi-omics survival analysis methods, compared to single-omics approaches, enable more comprehensive characterization of the intricate nature and diversity of diseases from multiple perspectives, demonstrating superior potential for survival prediction. Though current multi-omics survival methods can achieve excellent performance by fusing information from different sources, it remains unclear how to effectively utilize Transformer architecture, which has become a dominant choice in many domains, to fully exploit cross-omics complementary information for multi-omics survival analysis. To tackle this issue, we propose a collaborative Transformer framework for multi-omics survival analysis, namely CoFormerSurv. CoFormerSurv method primarily includes three key components: an inter-omics Transformer, an inter-sample graph Transformer, and a Cox proportional hazards model, with the two kinds of Transformer architectures forming our collaborative Transformer framework. The inter-omics Transformer employs multi-head self-attention mechanism to identify high-order interaction features across multi-omics data. The inter-sample graph Transformer encodes structural information of the fused graph from multiple omics into the Transformer architecture to more effectively explore neighborhood relations among multi-omics samples. By aggregating interaction features extracted from the inter-omics Transformer with neighborhood relations learned from the inter-sample graph Transformer, the collaborative Transformer can generate more informative and discriminative multi-omics features for survival analysis with the Cox proportional hazards model. Fig 1 depicts the overall structure of our CoFormerSurv model. We provide a detailed description of CoFormerSurv method in the materials and methods section.

Fig 1. Architectural overview of the proposed CoFormerSurv model.

The inter-omics Transformer employs multi-head self-attention mechanism to identify high-order interactive features across multi-omics data. The inter-sample graph Transformer encodes structural information of the fused graph from multiple omics into the Transformer architecture to aggregate multi-omics features extracted from the inter-omics Transformer for learning more expressive sample embeddings. The collaborative transformer integrates the inter-omics Transformer and the inter-sample graph Transformer to generate more informative and discriminative multi-omics features for survival analysis with the Cox proportional hazards model.

https://doi.org/10.1371/journal.pcbi.1013875.g001

Datasets and preprocessing

The Cancer Genome Atlas (TCGA) [32] is a comprehensive public resource that offers multi-omics data from over 11,000 patients across 33 cancer types. In this study, we evaluate the performance of the CoFormerSurv method on RNA-Seq-derived gene expression and miRNA-Seq-derived microRNA expression data from eight common cancer types. These eight TCGA cancer types, namely breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), urothelial bladder carcinoma (BLCA), head and neck squamous cell carcinoma (HNSC), liver hepatocellular carcinoma (LIHC), uterine corpus endometrial carcinoma (UCEC), ovarian serous cystadenocarcinoma (OV) and lung squamous cell carcinoma (LUSC), provide relatively sufficient sample sizes, which facilitates more reliable model training and validation. The preprocessing pipeline for gene expression and microRNA expression data from these cancer datasets involves the following steps. First, gene/microRNA features with more than 10% missing values are removed, and the remaining missing values are imputed using the median strategy. Next, gene/microRNA expression levels undergo log transformation, followed by the removal of low-variance noise features. Finally, each gene/microRNA feature is standardized to a mean of 0 and a standard deviation of 1. After this processing, each final cancer dataset comprises 6000 gene features and 600 miRNA features.
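The preprocessing steps listed above can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' pipeline: the function name `preprocess_omics`, the log2(x + 1) form of the log transformation, and the implementation of low-variance filtering as "keep the `n_keep` highest-variance features" are all assumptions, and the input is assumed to be a non-negative samples-by-features matrix with NaN marking missing values.

```python
import numpy as np


def preprocess_omics(X, max_missing=0.10, n_keep=6000):
    """Filter, impute, log-transform, variance-select and standardize features."""
    X = np.asarray(X, dtype=float)

    # Step 1: drop features whose fraction of missing values exceeds the threshold.
    keep = np.isnan(X).mean(axis=0) <= max_missing
    X = X[:, keep]  # boolean indexing returns a copy, so the input is untouched

    # Step 2: impute the remaining missing values with the per-feature median.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]

    # Step 3: log transformation (the base and pseudocount are assumptions).
    X = np.log2(X + 1.0)

    # Step 4: keep the n_keep highest-variance features, in original column order.
    variances = X.var(axis=0)
    top = np.sort(np.argsort(variances)[::-1][:n_keep])
    X = X[:, top]

    # Step 5: standardize each feature to mean 0 and standard deviation 1.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X - X.mean(axis=0)) / std
```

Under these assumptions, calling `preprocess_omics(gene_matrix, n_keep=6000)` and `preprocess_omics(mirna_matrix, n_keep=600)` would produce matrices with the feature counts reported above; `gene_matrix` and `mirna_matrix` are placeholder names.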

We further demonstrate the effectiveness of CoFormerSurv method for multi-omics survival analysis tasks on the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset. The PCAWG dataset, which comprises whole-genome sequencing data from 2,658 cancer patients collected through the International Cancer Genome Consortium (ICGC) and TCGA projects and encompasses 38 distinct tumor types, is available at https://xena.ucsc.edu/. For consistency, we applied the same preprocessing pipeline to both gene expression and microRNA expression data from the PCAWG dataset.

Evaluation metrics

In this study, we systematically evaluated the survival prediction performance of CoFormerSurv method using two key metrics: the concordance index (C-index) and the area under the receiver operating characteristic curve (AUC). As a core evaluation metric in survival analysis, the C-index effectively measures the degree of agreement between model predictions and actual clinical observations. This metric operates on the principle that for any pair of patients, if the individual with longer actual survival time also receives a correspondingly longer predicted survival time, the prediction is considered consistent. It should be specifically noted that paired samples will be excluded from calculation under either of the following two circumstances: when the data of the patient with shorter survival time is censored, or when both patients’ data are censored observations. The C-index is mathematically defined by the following formula:

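$$\text{C-index} = \frac{1}{N_{num}} \sum_{i:\,\delta_i = 1} \; \sum_{j:\,O_j > O_i} I\big(\hat{T}_j > \hat{T}_i\big) \qquad (1)$$

Here δ_i = 1 marks a patient whose death was observed (uncensored), O_i is the observed time, and \hat{T}_i denotes the predicted survival time of the i-th patient.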

Another evaluation metric, AUC, effectively assesses the accuracy of risk ranking at different event time points, with its mathematical expression being:

(2)

where Y represents the set of observed event time points in the study data, N_num is the total number of all valid comparable sample pairs, and I(·) is a binary indicator function. The C-index and AUC values are both bounded between 0 and 1, where 0.5 indicates predictive performance equivalent to random chance and 1 represents perfect discrimination. The higher the values of these two metrics, the stronger the predictive ability of the model.
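The pairwise rule behind the C-index can be made concrete with a short pure-Python sketch. It follows the verbal definition given above: a pair is comparable only when the patient with the shorter observed time is uncensored, and it is concordant when that patient also has the shorter predicted survival time. The function name `concordance_index` is hypothetical, and counting tied predictions as non-concordant is a simplifying assumption.

```python
def concordance_index(times, events, predicted_times):
    """Fraction of comparable patient pairs whose predicted order matches the observed order."""
    concordant = 0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # the patient with the shorter time must be uncensored
        for j in range(n):
            if times[j] > times[i]:  # patient j outlived patient i
                comparable += 1
                if predicted_times[j] > predicted_times[i]:
                    concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

When the model outputs a risk score rather than a survival time, as a Cox-PH model does, one option is to pass the negated risk scores as `predicted_times`, since higher risk corresponds to shorter expected survival.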

Experimental setting

To comprehensively assess the performance of the proposed CoFormerSurv method, we use a repeated holdout cross-validation strategy. First, the dataset is divided into two subsets: 80% for training and 20% for testing. A predictive model is then constructed on the training data, and its performance is assessed on the testing set by calculating the C-index and AUC values. To ensure robust validation, this evaluation procedure is repeated 50 times using independent random partitions, and final performance is reported as the mean and standard deviation over the repetitions. Training was carried out using the Adam optimization algorithm, an adaptive stochastic gradient descent method, with a fixed learning rate of 2e-4. During the training of the CoFormerSurv model, we used cross-validation to determine the neighborhood size K used to construct the sample graphs.
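The repeated holdout protocol described above can be sketched as follows. The helper `repeated_holdout` and its `fit_and_score` callback are hypothetical names: the callback stands for any routine that trains a model on the training indices and returns a test-set metric such as the C-index. Whether the reported standard deviation is the sample or the population version is not stated in the text; the sample version is assumed here.

```python
import random
import statistics


def repeated_holdout(n_samples, fit_and_score, n_repeats=50, test_frac=0.2, seed=0):
    """Evaluate a model over independent random train/test partitions."""
    rng = random.Random(seed)
    n_test = int(round(test_frac * n_samples))
    scores = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(fit_and_score(train_idx, test_idx))
    # Report the mean and (sample) standard deviation over the repetitions.
    return statistics.mean(scores), statistics.stdev(scores)
```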

Performance comparison with existing methods

To evaluate the performance of the CoFormerSurv method in multi-omics survival analysis, we conducted comprehensive comparisons with existing state-of-the-art methods, including single-omics approaches (RSF [5], DeepSurv [9], DeepHit [11], and AGGSurv [21]) and multi-omics approaches (HFBSurv [23], SurvCNN [33], GCGCN [24], and FGCNSurv [26]). To further validate the effectiveness of our collaborative Transformer framework in extracting complementary information from multi-omics data, we developed the GANMOSurv and SATMOSurv methods based on Graph Attention Network [34] (GAN) and Structure-Aware Graph Transformer [35] (SAT), respectively, for patient survival prediction. Specifically, GANMOSurv and SATMOSurv employ a graph attention network and a structure-aware graph Transformer, respectively, to extract features from single-omics data, and integrate these features from different omics through a neural network to learn semantically rich multi-omics representations. Tables 1 and 2 display the C-index and AUC values of the various methods on the TCGA and PCAWG datasets. The upper section of each table displays the results obtained from single-omics methods on gene/microRNA expression data, while the lower section presents the corresponding results of multi-omics methods. The last columns of Tables 1 and 2 summarize the average C-index and AUC values, respectively, for each method across all cancer datasets.

Table 1. Comparison of the C-index performance of CoFormerSurv method with existing methods.

The upper section displays the C-index values obtained from single-omics methods on gene/microRNA expression data, while the lower section presents the corresponding results of multi-omics methods. The rightmost column presents an overview of the performance, showing the average C-index value for each method across all cancer datasets.

https://doi.org/10.1371/journal.pcbi.1013875.t001

Table 2. The AUC values of CoFormerSurv method and existing methods.

The upper section displays the AUC values obtained from single-omics methods on gene/microRNA expression data, while the lower section presents the corresponding results of multi-omics methods. The average AUC achieved by each method across all cancer datasets is summarized in the final column.

https://doi.org/10.1371/journal.pcbi.1013875.t002

From the results, we observe that deep learning-based survival analysis methods perform markedly better than traditional machine learning approaches. As an illustration, the average C-index value of DeepSurv on gene/microRNA expression data across all cancer datasets is 0.643/0.638, an improvement of 8.8%/8.1% over RSF. Multi-omics survival analysis methods (except SurvCNN) also achieve better C-index and AUC values than single-omics approaches, which demonstrates the feasibility and effectiveness of multi-omics data integration for survival prediction. In SurvCNN, gene and microRNA expression data are converted into an image format, allowing feature extraction via CNNs to predict the survival distribution of cancer patients. We consider that this transformation implicitly assumes that the relationships among genes/microRNAs can be characterized by spatial proximity in the images, which may not adequately capture the complex feature relationships present in the original data. Furthermore, GANMOSurv and SATMOSurv, which are based on graph attention mechanisms, provide performance comparable or superior to HFBSurv by integrating multi-omics data with a simple network. FGCNSurv learns more comprehensive multi-omics feature representations with its dually fused graph convolutional network, outperforming GANMOSurv, SATMOSurv and GCGCN. More importantly, CoFormerSurv achieves superior performance compared to all other methods. On the LUAD and LIHC datasets in particular, CoFormerSurv shows a clear improvement over the second-best method, FGCNSurv: it achieves mean C-index values of 0.644 and 0.714 on LUAD and LIHC, respectively, surpassing FGCNSurv by 2.1% and 2%. These results demonstrate that CoFormerSurv can effectively use the collaborative Transformer to comprehensively extract complementary information across different omics for improving multi-omics survival analysis.

To further evaluate the performance of the CoFormerSurv method, we used Kaplan-Meier (K-M) curves to validate its effectiveness in risk stratification of cancer patients. Specifically, we first stratified the BRCA test set samples into high-risk and low-risk groups based on the model’s predictions. Subsequently, we plotted Kaplan-Meier survival curves and performed a log-rank test to analyze the statistical differences in survival distributions between the two groups. The more significant the difference in survival curves between the risk groups, the stronger the model’s discriminative ability in risk stratification. Fig 2 presents the Kaplan-Meier survival curves for the high-risk and low-risk groups predicted by various Cox-PH model-based methods on the BRCA dataset, with corresponding log-rank test p-values. The results suggest that deep learning-based survival analysis methods exhibit superior discriminative capability in stratifying low-risk and high-risk groups compared to traditional machine learning approaches. For example, the log-rank test p-values generated by DeepSurv (1.19e-07 for gene expression and 1.93e-06 for microRNA expression) indicate more significant statistical differences than those obtained by RSF (2.01e-02 and 5.15e-04, respectively). Moreover, compared to single-omics approaches, multi-omics methods yield statistically more significant results. Among the multi-omics methods, FGCNSurv and CoFormerSurv achieve the best and second-best performance, respectively, at an extremely significant statistical level (1.83e-12 and 1.89e-12), significantly outperforming HFBSurv, GCGCN, GANMOSurv and SATMOSurv. This demonstrates that CoFormerSurv, with its collaborative Transformer architecture, and FGCNSurv, with its dually fused graph convolutional network, can fully leverage the complementary information across different omics data, thereby efficiently identifying high- and low-risk patient groups with highly significant survival differences.
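The risk stratification analysis described above can be illustrated with a small pure-Python sketch of a two-group log-rank test. The text does not state how the risk threshold was chosen, so the median split in `split_by_median_risk` is an assumption, and both function names are hypothetical. The p-value uses the chi-square distribution with one degree of freedom, whose tail probability equals erfc(sqrt(x/2)).

```python
import math


def split_by_median_risk(risk_scores):
    """Label each patient 1 (high risk) or 0 (low risk) using the median score."""
    ordered = sorted(risk_scores)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])
    return [1 if r > median else 0 for r in risk_scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; groups[i] is 0 (low risk) or 1 (high risk)."""
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)   # patients still at risk just before time t
        n1 = sum(at_risk)  # of those, patients in the high-risk group
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these groups")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 degree of freedom
    return chi2, p_value
```

A smaller p-value indicates a more significant separation between the survival curves of the two groups, which is how the values in Fig 2 are compared.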

Fig 2. Kaplan–Meier survival curves for high-risk and low-risk groups predicted by various methods with Cox-PH model on BRCA dataset, with corresponding log-rank test p-values.

https://doi.org/10.1371/journal.pcbi.1013875.g002

Ablation study

We conducted ablation studies across all cancer datasets to identify the sources of performance improvements, thereby validating the effectiveness of the collaborative Transformer architecture in multi-omics survival analysis tasks. We constructed the following model variants by ablating key components to systematically analyze each module’s impact on model performance:

SGT: using single-omics features and a single-omics graph instead of multi-omics features Z^(c) and graph G_A to construct the inter-sample graph Transformer for learning more expressive single-omics features in survival analysis;
MGT: directly combining single-omics features Z⁽¹⁾ and Z⁽²⁾ instead of multi-omics combinatorial features Z^(c) to construct the inter-sample graph Transformer for learning more expressive multi-omics features in survival analysis, without the inter-omics Transformer module;
CrossT: using the combinatorial features Z^(c) from the inter-omics Transformer for survival analysis, without the inter-sample graph module;
CrossT-SEGCN: using GCN with spatial encoding to aggregate multi-omics features extracted from the inter-omics Transformer for learning more expressive sample embeddings, without self-attention module in inter-sample graph Transformer;
CrossT-SA: using self-attention mechanism to aggregate multi-omics features from the inter-omics Transformer for learning more expressive sample embeddings, without incorporating the topological structure of the multi-omics fused graph in inter-sample graph Transformer.

Table 3 reports the C-index values of the CoFormerSurv method and its variants obtained by ablating key components on all cancer datasets. In this table, SGT⁽¹⁾ and SGT⁽²⁾ represent the SGT method on gene expression data and microRNA expression data, respectively. The last column summarizes the average C-index values of the CoFormerSurv model and its ablation variants across all cancer datasets. From the results presented in Table 3, we observe that MGT (multi-omics inter-sample graph Transformer) outperforms SGT (single-omics inter-sample graph Transformer) on gene/microRNA expression data, which demonstrates the effectiveness of multi-omics data integration. These results also show that CoFormerSurv is superior to its variants MGT and CrossT. Specifically, CoFormerSurv achieves an average C-index value of 0.679 across all cancer datasets, corresponding to performance gains of 1.49% and 1.04% over MGT and CrossT, respectively. This improvement highlights the critical role of the inter-omics Transformer and the inter-sample graph Transformer in enhancing the performance of the CoFormerSurv model. Moreover, CoFormerSurv exhibits better performance than CrossT-SEGCN and CrossT-SA. Compared to these two variants, CoFormerSurv can more accurately capture inter-sample neighborhood relationships by incorporating the structural information of the multi-omics fused graph into the self-attention mechanism to aggregate interaction features from the inter-omics Transformer and thereby learn more expressive embeddings. In summary, CoFormerSurv can effectively aggregate interaction features extracted from the inter-omics Transformer with neighborhood relations learned from the inter-sample graph Transformer in the collaborative Transformer framework, to generate more informative and discriminative multi-omics features for survival analysis.

Table 3. The C-index values of CoFormerSurv method and its variants obtained by ablating key components.

SGT⁽¹⁾ and SGT⁽²⁾ represent SGT methods on gene expression data and microRNA expression data, respectively. The last column summarizes the average C-index values of CoFormerSurv model and corresponding variants across all cancer datasets.

https://doi.org/10.1371/journal.pcbi.1013875.t003

Feature analysis

We further investigate the hidden nodes of CoFormerSurv model to identify molecular markers that significantly impact cancer patient survival. The specific procedure for molecular marker selection is as follows: First, we computed the Pearson correlation coefficients between the raw gene expression data and the feature representations from each hidden node in the output layer of the feature extraction network. Subsequently, we selected the top-ranked genes for each hidden node by their absolute correlation value. Fig 3 presents the Pearson correlation coefficients between feature representations at different hidden nodes and expression levels of selected gene markers in breast cancer training samples.
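The marker selection procedure described above can be sketched in NumPy as follows. The function `top_genes_per_node` and its arguments are hypothetical names; `expr` stands for the raw gene expression matrix (samples by genes) and `hidden` for the hidden-node outputs of the feature extraction network (samples by nodes). The number of genes kept per node is not stated in the text, so `top_k` is an assumed parameter, and every column is assumed to have non-zero variance.

```python
import numpy as np


def top_genes_per_node(expr, hidden, gene_names, top_k=5):
    """Rank genes by absolute Pearson correlation with each hidden node."""
    expr = np.asarray(expr, dtype=float)      # shape: samples x genes
    hidden = np.asarray(hidden, dtype=float)  # shape: samples x hidden nodes

    # Z-score both matrices; the mean product of z-scores is the Pearson coefficient.
    expr_z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    hidden_z = (hidden - hidden.mean(axis=0)) / hidden.std(axis=0)
    corr = expr_z.T @ hidden_z / expr.shape[0]  # shape: genes x hidden nodes

    selected = {}
    for node in range(corr.shape[1]):
        # Sort genes by decreasing absolute correlation with this node.
        order = np.argsort(-np.abs(corr[:, node]))[:top_k]
        selected[node] = [(gene_names[g], float(corr[g, node])) for g in order]
    return corr, selected
```

The returned `corr` matrix corresponds to the kind of gene-by-node correlation heatmap shown in Fig 3, while `selected` lists the top-ranked genes for each hidden node.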

Fig 3. Pearson correlation coefficients between the expression levels of the identified genes and the feature representations from each hidden node in the output layer of the feature extraction network.

https://doi.org/10.1371/journal.pcbi.1013875.g003

Many of the identified gene signatures exhibit oncogenic relevance to breast carcinogenesis and significantly correlate with patient survival outcomes. The EDIL3 gene exerts multifaceted pro-tumorigenic functions within the tumor microenvironment. EDIL3 [36] is reported to enhance cellular invasion and accelerate lung metastasis by activating the integrin-FAK signaling pathway in breast cancer cells. The CCR4 gene, a key chemokine receptor involved in modulating immune homeostasis, is closely associated with tumor growth and metastasis. Research [37] has shown that CCR4 gene expression varies significantly among human breast cancer cell lines, with particularly marked overexpression in those exhibiting high metastatic potential. The LAMP3 gene encodes a lysosomal membrane protein involved in lysosomal function and autophagy-related processes. In breast cancer cell lines, the LAMP3 gene [38] exhibits differential expression, with its expression levels induced in a manner dependent on oxygen concentration. The FOXA1 gene [39] binds to condensed chromatin, facilitating the recruitment of other transcription factors, and plays a pivotal role in breast cancer pathogenesis. FOXA1 is also frequently mutated in ER+ breast cancer. The LINC00504 gene is a potentially relevant lncRNA in breast oncogenesis. Patients with higher median expression of LINC00504 exhibited a higher survival probability in the basal-like subtype [40]. The B3GNT5 gene plays an essential role in synthesizing lacto- and neolacto-series glycans on glycolipids, and these carbohydrate structures are essential for embryonic development. B3GNT5 gene expression was significantly upregulated in basal-subtype breast cancer cell lines, and its elevated expression levels show significant correlations with larger tumor size and poorer survival rates [41]. The SCN4B gene encodes a protein that not only serves as an important regulatory subunit of sodium channels but also plays a key role in suppressing cancer metastasis. Research [42] has shown that low expression of the SCN4B gene in breast cancer biopsy tissues is significantly associated with the occurrence of high-grade primary tumors and metastatic tumors.

Performance of CoFormerSurv method on different multi-omics data types

We further evaluated the CoFormerSurv method on different types of multi-omics data, such as gene expression coupled with DNA methylation, as well as microRNA expression with DNA methylation. We preprocessed the DNA methylation data using the same strategy mentioned above, removing noise-prone variables with low inter-sample variability to balance information retention and computational efficiency. Reducing the feature dimensionality lets downstream analyses focus on the most informative biological signals and helps to limit overfitting. We ultimately retained the 6000 features with the highest variance, a common practice in the field that balances information preservation with model robustness and suits downstream multi-omics integration analyses. Because the DNA methylation training data for ovarian cancer (OV) contained too few samples for statistical analysis, this cancer type was excluded from the study. The final experiments were conducted on multi-omics datasets from seven different cancer types, with detailed evaluation results presented in Figs 4 and 5 and Tables 4 and 5.
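
The variance-based filtering step described above can be illustrated with a minimal NumPy sketch. The function name select_top_variance_features and the toy data are hypothetical and are not taken from the CoFormerSurv implementation; the sketch only assumes a samples-by-features matrix and keeps the columns with the largest variance across samples.

```python
import numpy as np

def select_top_variance_features(X, n_keep=6000):
    """Keep the n_keep columns of X (samples x features) with the largest variance."""
    X = np.asarray(X, dtype=float)
    n_keep = min(n_keep, X.shape[1])
    variances = X.var(axis=0)
    # Indices of the n_keep highest-variance columns, restored to original column order.
    top = np.sort(np.argsort(variances)[::-1][:n_keep])
    return X[:, top], top

rng = np.random.default_rng(0)
# Toy matrix: 20 samples, 50 features whose spread grows with the column index.
X_demo = rng.normal(size=(20, 50)) * np.linspace(0.1, 5.0, 50)
X_small, kept = select_top_variance_features(X_demo, n_keep=10)
print(X_small.shape, kept)
```

In practice the selection would be fitted on the training split only and the same column indices reused for the test split, so that no information from held-out samples enters the preprocessing.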

Fig 4. The C-index values of comparative methods with gene expression and/or DNA methylation data.

RSF, DeepSurv, DeepHit, and AGGSurv are single-omics methods with gene expression or DNA methylation data. The remaining methods are multi-omics approaches, utilizing both DNA methylation and gene expression data.

https://doi.org/10.1371/journal.pcbi.1013875.g004

Fig 5. The C-index values of various methods with DNA methylation and/or microRNA expression data.

RSF, DeepSurv, DeepHit, and AGGSurv are single-omics methods with DNA methylation or microRNA expression data. The remaining methods are multi-omics approaches, utilizing both DNA methylation and microRNA expression data.

https://doi.org/10.1371/journal.pcbi.1013875.g005

Table 4. The AUC values of CoFormerSurv method and existing methods with gene expression and/or DNA methylation data.

The upper section displays the AUC values from single-omics methods using either gene expression or DNA methylation data, while the lower section presents the corresponding results of multi-omics methods.

https://doi.org/10.1371/journal.pcbi.1013875.t004

Table 5. The AUC values of CoFormerSurv method and existing methods with DNA methylation and/or microRNA expression data.

The upper section displays the AUC values from single-omics methods using either DNA methylation or microRNA expression data, while the lower section presents the corresponding results of multi-omics methods.

https://doi.org/10.1371/journal.pcbi.1013875.t005

The experimental results demonstrate that multi-omics methods significantly outperform single-omics methods across various types of datasets. Specifically, on multi-omics data with gene expression and DNA methylation, the FGCNSurv method achieves an average C-index of 0.665 across all cancer datasets. This represents an improvement of 3.7% over the DeepSurv method trained on gene expression data alone, and of 7.1% over DeepSurv trained solely on DNA methylation data. Notably, the CoFormerSurv method performs best among the multi-omics methods. For example, it achieves average C-index values of 0.681 and 0.660 across all cancer datasets on the two types of multi-omics data, representing improvements of 4.2% and 4.0% over the HFBSurv method. Further analysis reveals notable performance variations of the CoFormerSurv method across different types of multi-omics data. The method performs better on multi-omics data that includes gene expression and DNA methylation than on data that includes DNA methylation and microRNA expression. Specifically, the improvement of CoFormerSurv over FGCNSurv is not particularly pronounced on multi-omics data with DNA methylation and microRNA expression. In contrast, on data that includes gene expression and DNA methylation, CoFormerSurv outperforms the second-best method, FGCNSurv, with an average C-index improvement of 2.6% across all cancer datasets. In summary, these results indicate that the CoFormerSurv method effectively leverages the collaborative Transformer to comprehensively extract complementary information across different omics, thereby enhancing multi-omics survival analysis.

Discussion

Multi-omics data encompasses multidimensional molecular information, including genes, microRNA, and DNA methylation. The integration of multi-omics data enables a systematic exploration of the interactions and functional mechanisms among biological components, providing robust technical support for cancer subtype identification, drug target discovery, and optimization of clinical decision-making systems.

Multi-omics-based survival analysis methods can more comprehensively uncover the heterogeneity and complexity of diseases, which is of significant importance for the advancement of precision medicine. In this work, we propose the CoFormerSurv method, an innovative collaborative Transformer framework for multi-omics survival analysis. CoFormerSurv includes an inter-omics Transformer, an inter-sample graph Transformer, and a Cox proportional hazards model, with the two kinds of Transformer architectures working together to form our collaborative Transformer framework. On one hand, CoFormerSurv identifies higher-order interaction features in multi-omics data through the inter-omics Transformer to quantify the complex relationships within the data; on the other hand, it encodes structural information from the fused multi-omics graph into the Transformer architecture via the inter-sample graph Transformer to learn more expressive sample embeddings. Within the collaborative Transformer framework, CoFormerSurv aggregates the interaction features extracted by the inter-omics Transformer with the neighborhood relations learned by the inter-sample graph Transformer, generating more informative and discriminative multi-omics features for survival analysis with the Cox proportional hazards model. Compared with single-Transformer approaches and state-of-the-art alternatives, our collaborative Transformer achieves superior survival prediction performance on multiple real-world datasets by simultaneously exploring complementary information from both inter-omics and cross-sample perspectives. Furthermore, the majority of key genes identified through hidden-layer feature analysis show significant correlations with cancer patient survival rates.

Although the CoFormerSurv method has made significant progress in multi-omics survival analysis, there remains considerable room for further optimization and exploration. First, the proportional hazards assumption underlying CoFormerSurv often does not hold in real-world scenarios. Enabling the model to directly learn the probability distribution of patient survival time may yield further performance improvements. Second, CoFormerSurv could integrate multi-modal data, such as genomic data and histopathological images, to achieve a more comprehensive and accurate prediction of patient survival outcomes. Additionally, cancer subtype classification with multi-omics data has recently garnered widespread attention in oncology research. In the future, we will explore how to leverage the collaborative Transformer architecture for multi-omics-based cancer subtype classification, thereby advancing the precision of molecular classification and the development of personalized medicine.

Materials and methods

Inter-omics transformer

The essence of multi-omics integration is to uncover cross-omics interactions to fully exploit their complementary information for downstream tasks. For example, contrastive learning [43] can identify discriminative patterns to improve survival prediction [44] by capturing notable cross-correlations across different omics. In this work, we use a Transformer architecture [27], namely inter-omics Transformer, to learn multiple meaningful feature interactions among multiple omics by multi-head self-attention mechanism. Transformer architecture with self-attention mechanism has demonstrated remarkable success across multiple domains, including computer vision tasks such as image segmentation [45] and classification [46], as well as machine translation [27] in natural language processing. The core idea of the self-attention mechanism is to dynamically learn a set of key weights that allocate importance to different parts of the input data. This operational principle exhibits intrinsic similarities with human visual cognition: when processing complex visual information, the human brain instinctively suppresses irrelevant background noise and focuses attentional resources on key areas relevant to the current task. This biomimetic attention allocation mechanism enables the model to autonomously capture the most discriminative features within the data, thereby enhancing the efficiency and accuracy of information processing.

The inter-omics Transformer determines which key features from different omics can be fused to generate meaningful combinatorial features for survival analysis. Following the architectural approach proposed in [47], we use the same matrix for keys and queries in the inter-omics Transformer architecture to reduce the number of learnable parameters. Specifically, for pre-processed gene expression data X⁽¹⁾ and microRNA expression data X⁽²⁾, we first project them into a latent space through a single-layer fully-connected network, obtaining compact low-dimensional feature representations. Subsequently, we derive query and value representations for gene and microRNA via learnable parameter matrices as below:

We then compute the cross-omics feature representations through an attention-weighted aggregation of the value vectors. Specifically, for each omics type, the contextualized feature is derived as:

(3)

where the attention weights are obtained by softmax normalization:

(4)

Here, these attention weights are computed using a scaled dot-product operation, defined by the inner product of the query representations and scaled by d^c (the dimensionality of the query representations), effectively capturing the correlation among different omics features.

To capture diverse and meaningful feature interactions, we employ a multi-head learning strategy to independently extract distinct combinatorial features for each omics type. Subsequently, we use a fully connected network to compress these different combinatorial features into an information-rich multi-omics representation, formulated as:

(5)

where the fully connected network consists of a learnable parameter matrix followed by an activation function, and the combinatorial features from all heads are concatenated along the feature dimension.
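
The computation described in this subsection can be sketched in NumPy as follows. This is an illustrative reading rather than the authors' implementation: it assumes that each patient contributes one latent vector per omics type (treated as a token), that one matrix W_qk is shared by queries and keys as stated above, that scores are divided by the square root of the query dimension (the standard scaled dot-product convention), and that ReLU is the activation. All names (inter_omics_head, inter_omics_transformer, W_qk, W_v, W_out) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inter_omics_head(H, W_qk, W_v):
    """One attention head over omics tokens.
    H: (n_samples, n_omics, d) latent features; W_qk, W_v: (d, d_c)."""
    Q = H @ W_qk                      # shared projection used as query and key
    V = H @ W_v                       # value representations
    d_c = Q.shape[-1]
    scores = Q @ Q.transpose(0, 2, 1) / np.sqrt(d_c)   # (n, n_omics, n_omics)
    alpha = softmax(scores, axis=-1)  # attention weights across omics
    return alpha @ V, alpha           # contextualized features per omics type

def inter_omics_transformer(H, heads, W_out):
    """Concatenate the outputs of all heads and compress them with one dense layer."""
    n = H.shape[0]
    ctx = [inter_omics_head(H, W_qk, W_v)[0].reshape(n, -1) for W_qk, W_v in heads]
    concat = np.concatenate(ctx, axis=1)     # concatenation along the feature dimension
    return np.maximum(concat @ W_out, 0.0)   # ReLU activation (assumed)

rng = np.random.default_rng(0)
n, n_omics, d, d_c, n_heads, d_out = 5, 2, 8, 4, 3, 6
H = rng.normal(size=(n, n_omics, d))         # e.g. gene and microRNA latent features
heads = [(rng.normal(size=(d, d_c)), rng.normal(size=(d, d_c))) for _ in range(n_heads)]
W_out = rng.normal(size=(n_heads * n_omics * d_c, d_out))
_, alpha = inter_omics_head(H, *heads[0])
fused = inter_omics_transformer(H, heads, W_out)
print(fused.shape)
```

Sharing W_qk makes the score matrix symmetric before the softmax, which is one way to read the parameter-saving choice attributed to [47].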

Inter-sample graph transformer

Although the inter-omics Transformer effectively integrates multi-omics feature information for individual patients, it does not fully account for the potential relationships among patients. The inter-sample graph Transformer encodes structural information of the fused graph from multiple omics into the Transformer architecture [28,35] to more effectively explore neighborhood relations among multi-omics samples. By aggregating interaction features extracted from the inter-omics Transformer with neighborhood relations learned from the inter-sample graph Transformer, our proposed collaborative Transformer framework can learn more expressive sample embeddings for survival analysis. We first present the Transformer without encoding the graph structure and then describe how these two Transformer architectures are integrated to work collaboratively. The core attention mechanism of the Transformer is defined as:

(6)

where Q and V denote the query and value representations, which are obtained from the input features through learnable parameter matrices, Z represents multi-omics features that incorporate node-level information, and d corresponds to the dimension of the query representations.

The output of the Transformer is permutation-equivariant with respect to the input samples and ignores the adjacency relations among them. Traditional GCN-based approaches typically assign equal or pre-defined weights to all neighboring nodes, relying on a static aggregation mechanism that struggles to adaptively differentiate the importance of individual nodes. To address these issues, we incorporate the topological structure of the multi-omics fused graph into the Transformer architecture. This design enables the model to dynamically focus on semantically similar samples and effectively filter out irrelevant local connections, thereby overcoming the limitations of fixed aggregation schemes. We employ the following strategy to construct a fused graph from multiple omics. First, given gene expression data X⁽¹⁾, we construct a K-NN graph and use the exponential similarity kernel to define the adjacency matrix as follows:

(7)

Here the distance between two patients is the Euclidean distance, and the neighborhood of patient i is the set of its k-nearest neighbors. A normalization term is introduced to normalize pairwise distances and is empirically set as the median of all squared Euclidean distances across patients. A scaling hyperparameter is configured as 0.3/0.2 for the gene/microRNA graph. Similarly, we obtain the K-NN graph based on microRNA expression data X⁽²⁾, with A⁽²⁾ representing its adjacency matrix. Then we fuse the two graphs by taking the union of their edges and averaging their adjacency matrices, resulting in a unified graph G_A with adjacency matrix A. To more effectively characterize the topological structure of the multi-omics fused graph, we perform a spectral transformation on the adjacency matrix A to obtain the corresponding convolution matrix, as detailed below:

(8)

where the normalization uses a diagonal degree matrix whose entries are the row sums of the corresponding adjacency matrix.
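
The graph construction above can be sketched as follows. Several details are assumptions made for illustration and are not confirmed by the text: the kernel is taken to be exp(-squared distance / (mu * eps)) with eps the median squared distance, an edge is kept when either patient lists the other among its k nearest neighbors, and the spectral transformation is taken to be the GCN-style symmetric normalization with self-loops. The names knn_exp_adjacency and fuse_and_normalize are hypothetical.

```python
import numpy as np

def knn_exp_adjacency(X, k, mu):
    """K-NN graph weighted by an exponential kernel of squared Euclidean distance."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    off_diag = ~np.eye(n, dtype=bool)
    eps = np.median(sq[off_diag])             # normalizer: median squared distance
    masked = np.where(off_diag, sq, np.inf)   # exclude each patient from its own neighbors
    nbrs = np.argsort(masked, axis=1)[:, :k]  # k nearest neighbors per patient
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-sq[rows, cols] / (mu * eps))
    return np.maximum(A, A.T)                 # assumed symmetrization

def fuse_and_normalize(A1, A2):
    """Average two adjacency matrices (union of edges) and normalize symmetrically."""
    A = 0.5 * (A1 + A2)
    A_tilde = A + np.eye(A.shape[0])          # assumed self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return A, A_hat

rng = np.random.default_rng(0)
X_gene = rng.normal(size=(12, 30))            # toy gene expression features
X_mirna = rng.normal(size=(12, 15))           # toy microRNA expression features
A1 = knn_exp_adjacency(X_gene, k=3, mu=0.3)
A2 = knn_exp_adjacency(X_mirna, k=3, mu=0.2)
A, A_hat = fuse_and_normalize(A1, A2)
print(A_hat.shape)
```

Averaging the two matrices keeps every edge present in either graph while down-weighting edges supported by only one omics type.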

Subsequently, we encode the spatial information between samples in the multi-omics fused graph G_A into the traditional Transformer to bias the attention scores. More precisely, we enhance the attention kernel using the corresponding convolution matrix on the fused graph G_A as follows:

(9)

where N_i denotes the set of neighboring patients for patient i in the fused graph G_A. We leverage the calibrated attention weights to aggregate the value representations V, obtaining multi-omics features Z that incorporate node-level information. These features are then integrated through a fully connected network to learn more expressive representations of the samples, formulated as:

(10)

where W₁ and f₁ denote the learnable parameter matrix and activation function, respectively.
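
A minimal sketch of graph-biased attention is given below. The exact form of the calibrated kernel is not recoverable from the text here, so the sketch assumes one plausible reading: the exponential dot-product kernel is multiplied by the entries of the normalized convolution matrix, attention is restricted to the neighbors of each patient in the fused graph, and the weights are renormalized over that neighborhood. The function graph_biased_attention and the parameter names W_q, W_v and W_1 (with ReLU standing in for f₁) are hypothetical.

```python
import numpy as np

def graph_biased_attention(X, A_hat, W_q, W_v, W_1):
    """Self-attention over patients with weights biased by a normalized graph matrix."""
    Q = X @ W_q                                # shared query/key representations
    V = X @ W_v                                # value representations
    d = Q.shape[1]
    mask = A_hat > 0                           # neighbors (including the patient itself)
    scores = np.where(mask, Q @ Q.T / np.sqrt(d), -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    kernel = np.exp(scores) * A_hat            # graph-calibrated attention kernel
    weights = kernel / kernel.sum(axis=1, keepdims=True)  # normalize over neighbors
    Z = weights @ V                            # features with node-level information
    return np.maximum(Z @ W_1, 0.0), weights   # dense layer with ReLU (assumed)

rng = np.random.default_rng(0)
n, d_in, d, d_out = 6, 5, 4, 3
# Toy fused graph: a ring over six patients, with self-loops and symmetric normalization.
idx = np.arange(n)
ring = np.zeros((n, n))
ring[idx, (idx + 1) % n] = 1.0
ring = np.maximum(ring, ring.T) + np.eye(n)
deg = ring.sum(axis=1)
A_hat = ring / np.sqrt(np.outer(deg, deg))
X = rng.normal(size=(n, d_in))
out, weights = graph_biased_attention(
    X, A_hat,
    rng.normal(size=(d_in, d)), rng.normal(size=(d_in, d)), rng.normal(size=(d, d_out)),
)
print(out.shape)
```

Unlike a fixed GCN aggregation, the weight given to each neighbor here depends on the learned similarity between the two patients as well as on the graph edge.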

Survival analysis with Cox-PH model

Survival analysis focuses on modeling the distribution of survival time T. The statistical distribution characteristics of survival time T can be fully described by its corresponding survival function S(t) and hazard function h(t). S(t) is defined as the probability that an individual survives beyond time t:

$$S(t) = P(T > t) = \int_{t}^{\infty} f(s)\,ds,$$

where f(s) denotes the probability density function of the survival time. The hazard function h(t) represents the instantaneous event occurrence rate at time t, conditional on survival up to that time. Formally, it is defined as:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}.$$

For a specified parametric form of the survival time distribution, we can estimate the probability distribution of patients’ survival times by maximizing the complete likelihood function for both censored and uncensored observations, as follows:

$$L = \prod_{i=1}^{n} f(t_i)^{\delta_i}\, S(t_i)^{1-\delta_i},$$

where t_i is the observed time of patient i and δ_i indicates whether the event was observed (δ_i = 1) or censored (δ_i = 0).

The Cox-PH model [48] formulates the hazard function as a multiplicative relationship between a baseline hazard and covariate effects:

$$h(t \mid x) = h_0(t)\exp(\beta^{\top} x),$$

where the baseline hazard function h_0(t) has no restrictions and β denotes the coefficient vector to be estimated. Due to its flexibility and interpretability, the Cox-PH model has become one of the most widely used methods in survival analysis.

Based on these more expressive sample representations, the loss of our multi-omics survival analysis with the Cox-PH model is

(11)

We minimize the loss function to learn the parameters of the collaborative Transformer and of the Cox-PH model. Once the parameter estimates are obtained, the risk function based on patients’ multi-omics data can be further estimated through the Breslow estimator [49].
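
For illustration, the standard negative log partial likelihood of the Cox-PH model and the Breslow estimate of the baseline cumulative hazard can be sketched in NumPy as follows. This is a generic textbook formulation, not the authors' code: risk stands for the linear predictor computed from the learned sample representations, ties are handled in the Breslow manner, and the function names are hypothetical.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood; risk is the linear predictor per patient."""
    risk, time, event = (np.asarray(a, dtype=float) for a in (risk, time, event))
    # at_risk[i, j] is True when patient j is still at risk at the time of patient i.
    at_risk = time[None, :] >= time[:, None]
    m = risk.max()                                   # log-sum-exp shift for stability
    log_denom = m + np.log((np.exp(risk - m)[None, :] * at_risk).sum(axis=1))
    return -np.sum(event * (risk - log_denom))       # only observed events contribute

def breslow_baseline_cumhaz(risk, time, event):
    """Breslow estimate of the baseline cumulative hazard at each distinct event time."""
    risk, time, event = (np.asarray(a, dtype=float) for a in (risk, time, event))
    event_times = np.unique(time[event == 1])
    exp_risk = np.exp(risk)
    increments = np.array([
        np.sum((time == t) & (event == 1)) / exp_risk[time >= t].sum()
        for t in event_times
    ])
    return event_times, np.cumsum(increments)

# Toy check: three patients with equal risk scores and no censoring.
risk = np.zeros(3)
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 1])
loss = cox_neg_log_partial_likelihood(risk, time, event)
times, H0 = breslow_baseline_cumhaz(risk, time, event)
print(loss, times, H0)
```

In the toy case the risk sets contain 3, 2 and 1 patients, so the loss equals log 3 + log 2 + log 1 = log 6 and the cumulative baseline hazard increases by 1/3, 1/2 and 1 at the three event times. A patient-specific survival curve then follows as exp(-H0(t) · exp(risk)).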

In summary, constructing the CoFormerSurv model for multi-omics survival analysis requires training four core components: a feature extraction layer that generates compact, low-dimensional feature representations from single-omics data; an inter-omics Transformer that identifies multiple meaningful feature interactions across multi-omics data; an inter-sample graph Transformer that encodes the structural information of the fused graph from multiple omics into the Transformer architecture to aggregate the multi-omics features extracted by the inter-omics Transformer; and a Cox-PH model for the final survival analysis.

Supporting information

S1 Text. Fig A illustrates the variation in C-index values of the CoFormerSurv method with the hyperparameter K (the neighborhood size for graph construction).

Fig B presents an overview of the overall architecture of the CoFormerSurv model for integrating three omics data types. Table A shows a comparison of the time and space complexity across different methods, including CoFormerSurv. Table B compares the C-index values of the CoFormerSurv method and existing methods with gene expression and/or copy number variation data. Table C displays the C-index values of various methods on three types of omics data including gene expression, microRNA expression and DNA methylation. Table D reports the C-index values of the CoFormerSurv method across different dimensionalities for the feature representation z. Tables E–F list the p-values from significance tests for the C-index and AUC of the CoFormerSurv method and existing state-of-the-art methods on gene expression and/or microRNA expression data.

https://doi.org/10.1371/journal.pcbi.1013875.s001

(PDF)

References

1. Dey T, Lipsitz SR, Cooper Z, Trinh Q-D, Krzywinski M, Altman N. Survival analysis-time-to-event data and censoring. Nat Methods. 2022;19(8):906–8. pmid:35927476
2. Louis DN, Perry A, Reifenberger G, von Deimling A, Figarella-Branger D, Cavenee WK, et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. 2016;131(6):803–20. pmid:27157931
3. Aalen O, Borgan O, Gjessing H. Survival and event history analysis: a process point of view. Springer Science & Business Media; 2008.
4. Lee ET, Wang J. Statistical methods for survival data analysis. John Wiley & Sons; 2003.
5. Ishwaran H, Gerds TA, Kogalur UB, Moore RD, Gange SJ, Lau BM. Random survival forests for competing risks. Biostatistics. 2014;15(4):757–73. pmid:24728979
6. Khan FM, Zubek VB. Support vector regression for censored data (SVRc): a novel tool for survival analysis. In: 2008 Eighth IEEE International Conference on Data Mining; 2008. p. 863–8.
7. Zhang C, Cai Y, Lin G, Shen C. Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020. p. 12203–13.
8. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT. vol. 1. Minneapolis, Minnesota; 2019. p. 4171–86.
9. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18(1):24. pmid:29482517
10. Norman PA, Li W, Jiang W, Chen BE. deepAFT: A nonlinear accelerated failure time model with artificial neural network. Stat Med. 2024;43(19):3689–701. pmid:38894557
11. Lee C, Zame W, Yoon J, Van der Schaar M. DeepHit: a deep learning approach to survival analysis with competing risks. AAAI. 2018;32(1).
12. Nagpal C, Li X, Dubrawski A. Deep survival machines: fully parametric survival regression and representation learning for censored data with competing risks. IEEE J Biomed Health Inform. 2021;25(8):3163–75. pmid:33460387
13. Wang Y, Klijn JGM, Zhang Y, Sieuwerts AM, Look MP, Yang F, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365(9460):671–9. pmid:15721472
14. Aguirre-Gamboa R, Trevino V. SurvMicro: assessment of miRNA-based prognostic signatures for cancer clinical outcomes by multivariate survival analysis. Bioinformatics. 2014;30(11):1630–2. pmid:24519378
15. Goeman JJ. L1 penalized estimation in the Cox proportional hazards model. Biom J. 2010;52(1):70–84. pmid:19937997
16. Ching T, Zhu X, Garmire LX. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput Biol. 2018;14(4):e1006076. pmid:29634719
17. Hao J, Kim Y, Mallavarapu T, Oh JH, Kang M. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genomics. 2019;12(Suppl 10):189. pmid:31865908
18. Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res. 2018;24(6):1248–59. pmid:28982688
19. Tong L, Mitchel J, Chatlin K, Wang MD. Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC Med Inform Decis Mak. 2020;20(1):225. pmid:32933515
20. Kim S, Kim K, Choe J, Lee I, Kang J. Improved survival analysis by learning shared genomic information from pan-cancer data. Bioinformatics. 2020;36(Suppl_1):i389–98. pmid:32657401
21. Ling Y, Liu Z, Xue J-H. Survival analysis of high-dimensional data with graph convolutional networks and geometric graphs. IEEE Trans Neural Netw Learn Syst. 2024;35(4):4876–86. pmid:35862325
22. Wang Y, Zhang Z, Chai H, Yang Y. Multi-omics cancer prognosis analysis based on graph convolution network. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2021. p. 1564–8. https://doi.org/10.1109/bibm52615.2021.9669797
23. Li R, Wu X, Li A, Wang M. HFBSurv: hierarchical multimodal fusion with factorized bilinear models for cancer survival prediction. Bioinformatics. 2022;38(9):2587–94. pmid:35188177
24. Wang C, Guo J, Zhao N, Liu Y, Liu X, Liu G, et al. A cancer survival prediction method based on graph convolutional network. IEEE Trans Nanobioscience. 2020;19(1):117–26. pmid:31443039
25. Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3):333–7. pmid:24464287
26. Wen G, Li L. FGCNSurv: dually fused graph convolutional network for multi-omics survival prediction. Bioinformatics. 2023;39(8):btad472. pmid:37522887
27. Vaswani A. Attention is all you need. Advances in Neural Information Processing Systems. 2017.
28. Ying C, Cai T, Luo S, Zheng S, Ke G, He D, et al. Do transformers really perform badly for graph representation? Advances in Neural Information Processing Systems. 2021;34:28877–88.
29. Lian J, Deng J, Hui ES, Koohi-Moghadam M, She Y, Chen C, et al. Early stage NSCLS patients’ prognostic prediction with multi-information using transformer and graph neural network model. Elife. 2022;11:e80547. pmid:36194194
30. Wang Z, Sun J. Survtrace: transformers for survival analysis with competing events. In: Proceedings of the 13th ACM international conference on bioinformatics, computational biology and health informatics; 2022. p. 1–9.
31. Tang L, Diao S, Li C, He M, Ru K, Qin W. Global contextual representation via graph-transformer fusion for hepatocellular carcinoma prognosis in whole-slide images. Comput Med Imaging Graph. 2024;115:102378. pmid:38640621
32. Zhu Y, Qiu P, Ji Y. TCGA-assembler: open-source software for retrieving and processing TCGA data. Nat Methods. 2014;11(6):599–600. pmid:24874569
33. Kalakoti Y, Yadav S, Sundar D. SurvCNN: a discrete time-to-event cancer survival estimation framework using image representations of omics data. Cancers (Basel). 2021;13(13):3106. pmid:34206288
34. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv preprint 2017.
35. Chen D, O’Bray L, Borgwardt K. Structure-aware transformer for graph representation learning. In: International conference on machine learning. PMLR; 2022. p. 3469–89.
36. Lee J-E, Moon P-G, Cho Y-E, Kim Y-B, Kim I-S, Park H, et al. Identification of EDIL3 on extracellular vesicles involved in breast cancer cell invasion. J Proteomics. 2016;131:17–28. pmid:26463135
37. Li J-Y, Ou Z-L, Yu S-J, Gu X-L, Yang C, Chen A-X, et al. The chemokine receptor CCR4 promotes tumor growth and lung metastasis in breast cancer. Breast Cancer Res Treat. 2012;131(3):837–48. pmid:21479551
38. Nagelkerke A, Mujcic H, Bussink J, Wouters BG, van Laarhoven HWM, Sweep FCGJ, et al. Hypoxic regulation and prognostic value of LAMP3 expression in breast cancer. Cancer. 2011;117(16):3670–81. pmid:21319150
39. Arruabarrena-Aristorena A, Maag JLV, Kittane S, Cai Y, Karthaus WR, Ladewig E, et al. FOXA1 mutations reveal distinct chromatin profiles and influence therapeutic response in breast cancer. Cancer Cell. 2020;38(4):534-550.e9. pmid:32888433
40. Mathias C, Groeneveld CS, Trefflich S, Zambalde EP, Lima RS, Urban CA, et al. Novel lncRNAs co-expression networks identifies LINC00504 with oncogenic role in luminal a breast cancer cells. Int J Mol Sci. 2021;22(5):2420. pmid:33670895
41. Miao Z, Cao Q, Liao R, Chen X, Li X, Bai L, et al. Elevated transcription and glycosylation of B3GNT5 promotes breast cancer aggressiveness. J Exp Clin Cancer Res. 2022;41(1):169. pmid:35526049
42. Bon E, Driffort V, Gradek F, Martinez-Caceres C, Anchelin M, Pelegrin P, et al. SCN4B acts as a metastasis-suppressor gene preventing hyperactivation of cell migration in breast cancer. Nat Commun. 2016;7:13648. pmid:27917859
43. Gui J, Chen T, Zhang J, Cao Q, Sun Z, Luo H, et al. A survey on self-supervised learning: algorithms, applications, and future trends. IEEE Trans Pattern Anal Mach Intell. 2024;46(12):9052–71. pmid:38885108
44. Cheerla A, Gevaert O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics. 2019;35(14):i446–54. pmid:31510656
45. Xiao H, Li L, Liu Q, Zhu X, Zhang Q. Transformers in medical image segmentation: a review. Biomedical Signal Processing and Control. 2023;84:104791.
46. Chen CFR, Fan Q, Panda R. Crossvit: cross-attention multi-scale vision transformer for image classification. In: Proceedings of the IEEE/CVF international conference on computer vision; 2021. p. 357–66.
47. Tsai Y-HH, Bai S, Yamada M, Morency L-P, Salakhutdinov R. Transformer dissection: an unified understanding for transformer’s attention via the lens of Kernel. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. https://doi.org/10.18653/v1/d19-1443
48. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1972;34(2):187–202.
49. Darroch J. Discussion of paper by D R Cox. International Statistical Review/Revue Internationale de Statistique. 1984;52(1):26–8.

[ref1] 1. Dey T, Lipsitz SR, Cooper Z, Trinh Q-D, Krzywinski M, Altman N. Survival analysis-time-to-event data and censoring. Nat Methods. 2022;19(8):906–8. pmid:35927476
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Louis DN, Perry A, Reifenberger G, von Deimling A, Figarella-Branger D, Cavenee WK, et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. 2016;131(6):803–20. pmid:27157931
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Aalen O, Borgan O, Gjessing H. Survival and event history analysis: a process point of view. Springer Science & Business Media; 2008.

[ref4] 4. Lee ET, Wang J. Statistical methods for survival data analysis. John Wiley & Sons; 2003.

[ref5] 5. Ishwaran H, Gerds TA, Kogalur UB, Moore RD, Gange SJ, Lau BM. Random survival forests for competing risks. Biostatistics. 2014;15(4):757–73. pmid:24728979
View Article
PubMed/NCBI
Google Scholar

[12] View Article

[13] PubMed/NCBI

[14] Google Scholar

[ref6] 6. Khan FM, Zubek VB. Support vector regression for censored data (SVRc): a novel tool for survival analysis. In: 2008 Eighth IEEE International Conference on Data Mining; 2008. p. 863–8.

[ref7] 7. Zhang C, Cai Y, Lin G, Shen C. Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020. p. 12203–13.

[ref8] 8. Kenton JDMWC, Toutanova LK. Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of naacL-HLT. vol. 1. Minneapolis, Minnesota; 2019. p. 2.

[ref9] 9. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18(1):24. pmid:29482517
View Article
PubMed/NCBI
Google Scholar

[19] View Article

[20] PubMed/NCBI

[21] Google Scholar

[ref10] 10. Norman PA, Li W, Jiang W, Chen BE. deepAFT: A nonlinear accelerated failure time model with artificial neural network. Stat Med. 2024;43(19):3689–701. pmid:38894557
View Article
PubMed/NCBI
Google Scholar

[23] View Article

[24] PubMed/NCBI

[25] Google Scholar

[ref11] 11. Lee C, Zame W, Yoon J, Van der Schaar M. DeepHit: a deep learning approach to survival analysis with competing risks. AAAI. 2018;32(1).
View Article
Google Scholar

[27] View Article

[28] Google Scholar

[ref12] 12. Nagpal C, Li X, Dubrawski A. Deep survival machines: fully parametric survival regression and representation learning for censored data with competing risks. IEEE J Biomed Health Inform. 2021;25(8):3163–75. pmid:33460387

[ref13] 13. Wang Y, Klijn JGM, Zhang Y, Sieuwerts AM, Look MP, Yang F, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365(9460):671–9. pmid:15721472

[ref14] 14. Aguirre-Gamboa R, Trevino V. SurvMicro: assessment of miRNA-based prognostic signatures for cancer clinical outcomes by multivariate survival analysis. Bioinformatics. 2014;30(11):1630–2. pmid:24519378

[ref15] 15. Goeman JJ. L1 penalized estimation in the Cox proportional hazards model. Biom J. 2010;52(1):70–84. pmid:19937997

[ref16] 16. Ching T, Zhu X, Garmire LX. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput Biol. 2018;14(4):e1006076. pmid:29634719

[ref17] 17. Hao J, Kim Y, Mallavarapu T, Oh JH, Kang M. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genomics. 2019;12(Suppl 10):189. pmid:31865908

[ref18] 18. Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res. 2018;24(6):1248–59. pmid:28982688

[ref19] 19. Tong L, Mitchel J, Chatlin K, Wang MD. Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC Med Inform Decis Mak. 2020;20(1):225. pmid:32933515

[ref20] 20. Kim S, Kim K, Choe J, Lee I, Kang J. Improved survival analysis by learning shared genomic information from pan-cancer data. Bioinformatics. 2020;36(Suppl_1):i389–98. pmid:32657401

[ref21] 21. Ling Y, Liu Z, Xue J-H. Survival analysis of high-dimensional data with graph convolutional networks and geometric graphs. IEEE Trans Neural Netw Learn Syst. 2024;35(4):4876–86. pmid:35862325

[ref22] 22. Wang Y, Zhang Z, Chai H, Yang Y. Multi-omics cancer prognosis analysis based on graph convolution network. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2021. p. 1564–8. https://doi.org/10.1109/bibm52615.2021.9669797

[ref23] 23. Li R, Wu X, Li A, Wang M. HFBSurv: hierarchical multimodal fusion with factorized bilinear models for cancer survival prediction. Bioinformatics. 2022;38(9):2587–94. pmid:35188177

[ref24] 24. Wang C, Guo J, Zhao N, Liu Y, Liu X, Liu G, et al. A cancer survival prediction method based on graph convolutional network. IEEE Trans Nanobioscience. 2020;19(1):117–26. pmid:31443039

[ref25] 25. Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3):333–7. pmid:24464287

[ref26] 26. Wen G, Li L. FGCNSurv: dually fused graph convolutional network for multi-omics survival prediction. Bioinformatics. 2023;39(8):btad472. pmid:37522887

[ref27] 27. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.

[ref28] 28. Ying C, Cai T, Luo S, Zheng S, Ke G, He D, et al. Do transformers really perform badly for graph representation? Advances in Neural Information Processing Systems. 2021;34:28877–88.

[ref29] 29. Lian J, Deng J, Hui ES, Koohi-Moghadam M, She Y, Chen C, et al. Early stage NSCLS patients’ prognostic prediction with multi-information using transformer and graph neural network model. Elife. 2022;11:e80547. pmid:36194194

[ref30] 30. Wang Z, Sun J. SurvTRACE: transformers for survival analysis with competing events. In: Proceedings of the 13th ACM international conference on bioinformatics, computational biology and health informatics; 2022. p. 1–9.

[ref31] 31. Tang L, Diao S, Li C, He M, Ru K, Qin W. Global contextual representation via graph-transformer fusion for hepatocellular carcinoma prognosis in whole-slide images. Comput Med Imaging Graph. 2024;115:102378. pmid:38640621

[ref32] 32. Zhu Y, Qiu P, Ji Y. TCGA-assembler: open-source software for retrieving and processing TCGA data. Nat Methods. 2014;11(6):599–600. pmid:24874569

[ref33] 33. Kalakoti Y, Yadav S, Sundar D. SurvCNN: a discrete time-to-event cancer survival estimation framework using image representations of omics data. Cancers (Basel). 2021;13(13):3106. pmid:34206288

[ref34] 34. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv preprint 2017.

[ref35] 35. Chen D, O’Bray L, Borgwardt K. Structure-aware transformer for graph representation learning. In: International conference on machine learning. PMLR; 2022. p. 3469–89.

[ref36] 36. Lee J-E, Moon P-G, Cho Y-E, Kim Y-B, Kim I-S, Park H, et al. Identification of EDIL3 on extracellular vesicles involved in breast cancer cell invasion. J Proteomics. 2016;131:17–28. pmid:26463135

[ref37] 37. Li J-Y, Ou Z-L, Yu S-J, Gu X-L, Yang C, Chen A-X, et al. The chemokine receptor CCR4 promotes tumor growth and lung metastasis in breast cancer. Breast Cancer Res Treat. 2012;131(3):837–48. pmid:21479551

[ref38] 38. Nagelkerke A, Mujcic H, Bussink J, Wouters BG, van Laarhoven HWM, Sweep FCGJ, et al. Hypoxic regulation and prognostic value of LAMP3 expression in breast cancer. Cancer. 2011;117(16):3670–81. pmid:21319150

[ref39] 39. Arruabarrena-Aristorena A, Maag JLV, Kittane S, Cai Y, Karthaus WR, Ladewig E, et al. FOXA1 mutations reveal distinct chromatin profiles and influence therapeutic response in breast cancer. Cancer Cell. 2020;38(4):534-550.e9. pmid:32888433

[ref40] 40. Mathias C, Groeneveld CS, Trefflich S, Zambalde EP, Lima RS, Urban CA, et al. Novel lncRNAs co-expression networks identifies LINC00504 with oncogenic role in luminal a breast cancer cells. Int J Mol Sci. 2021;22(5):2420. pmid:33670895

[ref41] 41. Miao Z, Cao Q, Liao R, Chen X, Li X, Bai L, et al. Elevated transcription and glycosylation of B3GNT5 promotes breast cancer aggressiveness. J Exp Clin Cancer Res. 2022;41(1):169. pmid:35526049

[ref42] 42. Bon E, Driffort V, Gradek F, Martinez-Caceres C, Anchelin M, Pelegrin P, et al. SCN4B acts as a metastasis-suppressor gene preventing hyperactivation of cell migration in breast cancer. Nat Commun. 2016;7:13648. pmid:27917859

[ref43] 43. Gui J, Chen T, Zhang J, Cao Q, Sun Z, Luo H, et al. A survey on self-supervised learning: algorithms, applications, and future trends. IEEE Trans Pattern Anal Mach Intell. 2024;46(12):9052–71. pmid:38885108

[ref44] 44. Cheerla A, Gevaert O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics. 2019;35(14):i446–54. pmid:31510656

[ref45] 45. Xiao H, Li L, Liu Q, Zhu X, Zhang Q. Transformers in medical image segmentation: a review. Biomedical Signal Processing and Control. 2023;84:104791.

[ref46] 46. Chen CFR, Fan Q, Panda R. CrossViT: cross-attention multi-scale vision transformer for image classification. In: Proceedings of the IEEE/CVF international conference on computer vision; 2021. p. 357–66.

[ref47] 47. Tsai Y-HH, Bai S, Yamada M, Morency L-P, Salakhutdinov R. Transformer dissection: an unified understanding for transformer’s attention via the lens of Kernel. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. https://doi.org/10.18653/v1/d19-1443

[ref48] 48. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1972;34(2):187–202.

[ref49] 49. Darroch J. Discussion of paper by D R Cox. International Statistical Review/Revue Internationale de Statistique. 1984;52(1):26–8.

Supporting information

S1 Text. Fig A illustrates the variation in C-index values of the CoFormerSurv method with the hyperparameter K (the neighborhood size for graph construction).
