
A contrastive adversarial encoder for multi-omics data integration

Abstract

Early and accurate cancer detection is crucial for effective treatment, prognosis, and the advancement of precision medicine. Analyzing omics data is vital in cancer research. While using a single type of omics data provides a limited perspective, integrating multiple omics modalities allows for a more comprehensive understanding of cancer. Current deep models struggle to achieve efficient dimensionality reduction while preserving global information and integrating multi-omics data. This often results in feature redundancy or information loss, overlooking the synergies among different modalities. This paper proposes a contrastive adversarial encoder (CAEncoder) for multi-omics data integration to address this challenge. The proposed model combines a Vision Transformer (ViT) and a CycleGAN, trained in an end-to-end contrastive manner. The ViT is the encoder, utilizing self-attention, while the CycleGAN employs adversarial learning to ensure more discriminative and invariant latent space embeddings. Contrastive adversarial training improves representation quality by preventing information loss, eliminating redundancy, and capturing the synergies among different omics modalities. To ensure contrastive adversarial training, a composite loss function is used, consisting of a weighted combination of Adversarial Loss (Hinge Loss), Cycle Consistency Loss, and Triplet Margin Loss. The Adversarial Loss and Cycle Consistency Loss provide feedback from the CycleGAN, ensuring effective adversarial learning. Meanwhile, the Triplet Margin Loss promotes contrastive learning by pulling similar samples together and pushing dissimilar samples apart in the latent space. The performance of the CAEncoder is evaluated on downstream classification tasks, including both binary and multi-class classifications of five different cancer types. The results show that the model achieved a classification accuracy of up to 93.33% and an F1 score of 92.81%, outperforming existing advanced models. 
These findings demonstrate the potential of our method to enhance precision medicine for cancer through improved multi-omics data integration.

1 Introduction

Cancer continues to be one of the most critical health challenges globally. In 2022, nearly 20 million new cancer cases were reported worldwide, resulting in 9.7 million deaths related to the disease. It is estimated that approximately one in five people will be diagnosed with cancer at some point in their lives, with one in nine men and one in twelve women expected to die from it [1]. Early cancer detection has traditionally relied on conventional machine learning algorithms and single-omics data [2-4]. However, single-omics data often falls short of capturing essential information from various biological layers. In contrast, integrating multi-omics data with deep learning has shown significant improvements over single-omics modalities [5-11].

The current research focuses on integrating various omics modalities to extract combined information for a more effective analysis of this critical disease. Wang et al. [12] utilize a transformer with multi-head self-attention and graph convolutional networks (GCN) to integrate these multi-omics modalities. Their results indicate an accuracy of 83.0% for Alzheimer’s classification and 86.7% for breast cancer classification. Lan et al. [13] proposed an integration model called DeepKEGG, which leverages biological hierarchical modules in the local connections of nodes to improve interpretability. This model also includes a pathway self-attention mechanism to explore correlations between different samples. Additionally, Zheng et al. [14] introduced a method called GCFANet, which processes multimodal omics data through global and cross-modal feature aggregation, feature confidence learning, and a GCN branch. Experimental results demonstrate that this method effectively enhances the classification performance of multi-omics data [15]. Furthermore, Li et al. [16] introduced a novel end-to-end multi-omics Graph Neural Network (GNN) framework for cancer classification, utilizing heterogeneous multilayer graphs to integrate both intra-omics and inter-omics connections. For breast cancer subset classification, Huang et al. [17] proposed a deep-learning framework called DSCCN. This method conducts differential analysis on multi-omics expression data to identify differentially expressed genes and employs sparse canonical correlation analysis to extract highly correlated features among these genes. These features are then trained separately using a multi-task deep learning neural network to predict breast cancer subtypes.

Zhu et al. [18] proposed a supervised deep learning method called the Geometric Graph Neural Network (GGNN). This approach integrates genomic geometric features and protein interaction pathway information into the deep learning model. The Denoised Multi-Omics Integration Framework [19] consists of two key components: a distribution-based feature denoising algorithm (FSD)–aimed at reducing data dimensionality, and a multi-omics integration framework (AttentionMOI)–designed for predicting cancer prognosis and identifying cancer subtypes. The results demonstrated that this model performed significantly well across 15 cancers in the TCGA database. The moBRCA-net framework [20] addresses the challenge of high-dimensional data in breast cancer classification. By integrating multi-omics data, it utilizes a feature selection module and a self-attention module to capture the relative importance of each omics modality. Deep Centroid [21] addresses challenges in omics data classification, including high-dimensional data, limited sample sizes, and source bias. Yan et al. [22] developed a hierarchical multi-level Graph Neural Network (GNN) approach that utilizes multi-omics data, gene regulatory networks, and pathway information to extract discriminative features, thereby improving the accuracy of survival risk predictions. AUTOSurv [23] utilizes a specially designed Variational Autoencoder (VAE) for the dimensionality reduction of multi-omics data. This model has demonstrated significant performance in prognosis prediction across multiple independent datasets when compared to alternative strategies and machine learning methods. Guo et al. [24] utilize network embedding technology to integrate gene co-expression data, somatic mutation data, and clinical information. By combining the struc2vec model with the random survival forest (RSF) model, they successfully predicted both long-term and short-term survival outcomes for patients with lung adenocarcinoma (LUAD).

Multi-omics data integration models have demonstrated significant improvements in cancer analysis compared to single-omics models. However, these multi-omics models still face challenges in effectively capturing synergistic features from different modalities. This limitation undermines the full potential of data integration in cancer research. Additionally, multi-omics models often prioritize stronger modalities at the expense of weaker ones, which diminishes the benefits of joint learning and negatively impacts performance in downstream tasks. Furthermore, the imbalanced nature of the data affects the overall efficiency of these models.

This paper presents a novel multi-omics integration model for cancer classification. The framework includes two main components: 1) an encoder, which utilizes a transformer to map multi-omics data to a reduced latent space, and 2) a CycleGAN that provides feedback to the encoder, enabling it to learn discriminative features and enhance generalization. The model is trained in a supervised contrastive manner, which helps bring similar modalities closer together while distancing dissimilar ones in the latent space. By employing contrastive learning, the model manages to relatively effectively mitigate the data imbalance, ensuring that all modalities are taken into account and thereby learning the synergies across them. Finally, the classification is performed in the latent space. The results indicate a significant improvement compared to current state-of-the-art methods.

2 Proposed model

The proposed model (see Fig 1) comprises two main modules: the encoder and the CycleGAN [25]. The encoder is the vision transformer (ViT) model [26], denoted as E, which maps the high-dimensional multi-omics modalities X into a reduced latent space Z, where the input dimension nx is significantly greater than the latent dimension nz. At the same time, the CycleGAN enhances the performance of the transformer encoder by providing feedback to integrate multi-omics information, extract discriminative features, and reduce dimensionality.

Fig 1. The block diagram of the proposed model.

Here, X and X̂ represent the original and synthesized multi-omics data, while Z and Ẑ denote the latent reduced space. The acronyms SN, MLP, ViT, and FFN stand for Switch Normalization, Multi-Layer Perceptron, Vision Transformer, and Feedforward Network, respectively. The symbols D1 and D2 refer to Discriminator 1 and Discriminator 2, while G1 and G2 represent Generator 1 and Generator 2. The symbol ⊕ indicates component-wise addition.

https://doi.org/10.1371/journal.pone.0333134.g001

Both modules are trained in an end-to-end manner, where the CycleGAN provides gradient-based feedback to the ViT encoder via adversarial and cycle-consistency losses. This joint optimization enables the encoder to learn not only from contrastive objectives but also from the reconstruction feedback, which enhances its robustness and generalization. During inference, only the ViT encoder is used to extract low-dimensional representations for downstream classification tasks.

2.1 ViT encoder

The encoder E is composed of 8 blocks, each consisting of multi-head attention (MHA), Switch Normalization (SN), and a Multi-layer Perceptron (MLP). The multi-head self-attention projects multi-omics data into subspaces, calculates attention weights based on the significance of each position, and aggregates the outputs to produce the final attention output. The attention weight for a position j relative to all positions k in the ith head is computed as:

\alpha_{jk}^{i} = \frac{\exp\big((Q_i)_j (K_i)_k^{\top} / \sqrt{d_k}\big)}{\sum_{k'} \exp\big((Q_i)_j (K_i)_{k'}^{\top} / \sqrt{d_k}\big)} \quad (1)

where Q_i = X_i W_Q^i, K_i = X_i W_K^i, X_i is the multi-omics modality, W_Q^i is the query weight matrix, and W_K^i is the key weight matrix. So, the output of the ith attention head is given by:

\mathrm{head}_i = \alpha^{i} V_i \quad (2)

where V_i = X_i W_V^i is the value matrix. Finally, the outputs from all attention heads are concatenated, \mathrm{MHA}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h), to create the final multi-head attention output.
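As a concrete illustration, Eqs 1 and 2 can be sketched in NumPy. The dimensions below are illustrative, not the model's actual sizes:

```python
import numpy as np

def softmax(s, axis=-1):
    # Numerically stable softmax over the chosen axis.
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, W_q, W_k, W_v):
    """One self-attention head (Eqs 1-2): alpha = softmax(Q K^T / sqrt(d_k)),
    head = alpha V, with Q = X W_q, K = X W_k, V = X W_v."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    alpha = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)  # each row sums to 1
    return alpha @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))        # 4 positions, 16 input features
W_q = rng.normal(size=(16, 8))
W_k = rng.normal(size=(16, 8))
W_v = rng.normal(size=(16, 8))
head = attention_head(X, W_q, W_k, W_v)   # shape (4, 8)
```

In the full encoder, the outputs of several such heads would be concatenated to form the MHA output.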

The normalized output from the final multi-head attention layer is passed through a feedforward network (FFN) that consists of three layers. This network progressively transforms the output into a reduced latent vector z. The first two layers of the FFN utilize the Leaky ReLU activation function, while the final layer is linear. Typically, transformers use a class token, denoted as CLS, for classification purposes; however, in this case, the CLS token is not employed because the goal is to map the high-dimensional data into a reduced latent space.

Here, the Switch Normalization (SN) [27] is used to improve both the training stability and expressive capability of the encoder. Unlike fixed normalization techniques, SN dynamically selects the most suitable normalization strategy based on the input characteristics and training conditions. Trainable coefficients combine Batch Normalization (BN) and Layer Normalization (LN) to realize Switch Normalization. Given an input feature x, the SN output is computed as:

\mathrm{SN}(x) = \lambda_{\mathrm{BN}} \cdot \mathrm{BN}(x) + \lambda_{\mathrm{LN}} \cdot \mathrm{LN}(x) \quad (3)

where \lambda_{\mathrm{BN}} and \lambda_{\mathrm{LN}} are trainable parameters that adaptively balance the contributions of BN and LN during training. This dynamic adjustment allows SN to optimize the model’s adaptability to varying data distributions, enhancing training stability and generalization performance. The experiments demonstrate that SN effectively stabilizes training by mitigating the sensitivity to batch size variations and distribution shifts.
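Eq 3 can be sketched as follows. Normalizing the two trainable coefficients through a softmax (so they form a convex combination) is an assumption borrowed from the original Switch Normalization formulation:

```python
import numpy as np

def switch_norm(x, lam_bn=0.0, lam_ln=0.0, eps=1e-5):
    """Eq 3 sketch: SN(x) = w_bn * BN(x) + w_ln * LN(x). The trainable scalars
    lam_bn / lam_ln are mapped to convex weights via softmax (assumed form).
    x has shape (batch, features)."""
    w = np.exp([lam_bn, lam_ln])
    w = w / w.sum()
    bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)                         # batch statistics
    ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)  # per-sample statistics
    return w[0] * bn + w[1] * ln

x = np.random.default_rng(1).normal(size=(8, 4))
out = switch_norm(x)
```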

Moreover, although the primary objective of the ViT encoder is driven by contrastive loss, its parameters are also influenced by the adversarial and cycle-consistency losses propagated from the CycleGAN. Since both generators G1 and G2 operate on the latent representation z, the feedback from their respective losses flows back to the encoder. This integrated training mechanism allows the ViT encoder to refine its feature extraction by leveraging both discriminative and generative signals.

Overall, the encoder E is composed of 8 Transformer blocks, each with 8 attention heads. To project the high-dimensional input into a representation suitable for Transformer encoding, a linear projection layer is first applied to map the input to a 1024-dimensional embedding space. Following the attention mechanism, a multi-layer perceptron (MLP) consisting of three fully connected layers is used, with output dimensions of 1024 → 512 → 256. The first two layers employ the Leaky ReLU activation function, and the final layer is linear.
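The projection head described above can be sketched as a plain forward pass. Biases are omitted and the random weights are purely illustrative:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def ffn_head(x, W1, W2, W3):
    """Sketch of the three-layer MLP head: output sizes 1024 -> 512 -> 256,
    Leaky ReLU on the first two layers, linear output (biases omitted)."""
    return leaky_relu(leaky_relu(x @ W1) @ W2) @ W3

rng = np.random.default_rng(2)
W1 = rng.normal(size=(1024, 1024)) * 0.01
W2 = rng.normal(size=(1024, 512)) * 0.01
W3 = rng.normal(size=(512, 256)) * 0.01
z = ffn_head(rng.normal(size=(1, 1024)), W1, W2, W3)   # latent vector, shape (1, 256)
```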

2.2 CycleGAN architecture

The proposed CycleGAN architecture consists of two generators, G1 and G2, and two discriminators, D1 and D2. This framework enables bidirectional translation between high-dimensional multi-omics data and their corresponding low-dimensional latent representations.

  • Generator G1 takes a latent representation z, obtained from the encoder E, and generates a synthetic multi-omics modality x̂ that approximates the original high-dimensional data x.
  • Generator G2 takes the original multi-omics data x as input and reconstructs a latent representation ẑ that should resemble the true latent vector z.
  • Discriminator D1 attempts to distinguish real high-dimensional data x from the generated data x̂.
  • Discriminator D2 aims to differentiate between the true latent vector z and the reconstructed vector ẑ.

The objectives are as follows:

  • Generator G1 is trained to ensure that the generated data x̂ = G1(z) is indistinguishable from the real multi-omics data x.
  • Generator G2 is trained to ensure that the reconstructed latent representation ẑ = G2(x) closely approximates the true latent vector z.
  • Discriminator D1 is trained to assign a high score to real samples and a low score to generated ones:

\mathcal{L}_{D_1} = \mathbb{E}_{x \sim p_X}\big[\max(0, 1 - D_1(x))\big] + \mathbb{E}_{z \sim p_Z}\big[\max(0, 1 + D_1(G_1(z)))\big] \quad (4)

  • Discriminator D2 is trained similarly to distinguish real and generated latent vectors:

\mathcal{L}_{D_2} = \mathbb{E}_{z \sim p_Z}\big[\max(0, 1 - D_2(z))\big] + \mathbb{E}_{x \sim p_X}\big[\max(0, 1 + D_2(G_2(x)))\big] \quad (5)

As shown in Eqs 4 and 5, the discriminators aim to maximize the prediction scores for real samples while minimizing them for generated ones, thereby guiding the generators to produce realistic outputs.

Originally, CycleGAN was developed for image generation using 2D convolution. However, in our case, the input multi-omics data is a long vector. Therefore, we have adapted CycleGAN to use 1D convolutions over feature vectors.

1D convolutions operate along the feature vector, enabling the model to capture dependencies across different omics features. The 1D convolution operation can be expressed as:

y[i] = \sum_{j=1}^{k} w[j] \, x[i + j - 1] + b \quad (6)

where x \in \mathbb{R}^{n} is the input feature vector, w \in \mathbb{R}^{k} is the convolution kernel, b is the bias term, and k is the kernel size. Furthermore, the generators G1 and G2 are designed to map input feature vectors into output vectors using a combination of linear layers and 1D convolutions. The discriminators D1 and D2 also operate on feature vectors, ensuring that adversarial training remains robust and effective for structured data. These modifications allow CycleGAN to model bidirectional mappings between different multi-omics modalities while preserving the advantages of adversarial training.
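Eq 6 amounts to sliding a kernel along the feature vector. A minimal sketch (valid-mode, stride 1):

```python
import numpy as np

def conv1d(x, w, b=0.0):
    """Eq 6 sketch: y[i] = sum_j w[j] * x[i+j] + b, 'valid' 1D convolution
    (technically cross-correlation, as in most deep learning frameworks)."""
    k = len(w)
    return np.array([np.dot(w, x[i:i + k]) for i in range(len(x) - k + 1)]) + b

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.0, -1.0])
y = conv1d(x, w)    # [1-3, 2-4] = [-2.0, -2.0]
```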

Since all generated and reconstructed data in CycleGAN rely on the latent vectors z produced by the encoder E, the encoder is directly updated through the gradients of adversarial and cycle-consistency losses. This design effectively couples CycleGAN with the encoder, enhancing the encoder’s feature learning capability beyond what contrastive learning alone can provide.

The entire model, integrating the ViT encoder with CycleGAN, is optimized using a supervised contrastive learning approach. This contrastive mechanism enables us to bring similar points closer together in the latent space while pushing dissimilar points further apart. Additionally, it facilitates the understanding of synergies among different modalities, ultimately enhancing the performance of downstream classification tasks.

After transforming the high-dimensional multi-omics data X into the corresponding latent space Z, the classification task is then conducted within the latent space Z.

2.3 Loss functions

The model is trained end-to-end by integrating three types of losses: contrastive loss, adversarial loss, and cycle consistency loss. The objective is to train the encoder using a contrastive adversarial approach, effectively mapping high-dimensional data into a compact latent space. This process ultimately enhances the performance of downstream classification tasks.

Contrastive Loss: The contrastive loss [28] optimizes the encoder E to reduce the distance between similar samples while increasing the distance between dissimilar samples within the latent space z.

\mathcal{L}_{con} = \frac{1}{N} \sum_{i=1}^{N} \max\big(0, \; \|E(x_a^i) - E(x_p^i)\|_2 - \|E(x_a^i) - E(x_n^i)\|_2 + m\big) \quad (7)

where E represents the encoder, x_a represents anchor samples, x_p represents positive samples, x_n represents negative samples, N is the batch size, and m is the margin used in the contrastive loss.
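The triplet margin term of Eq 7 can be sketched directly on latent vectors (the encoder is elided here; the inputs stand for already-encoded triplets):

```python
import numpy as np

def triplet_margin_loss(z_a, z_p, z_n, m=1.0):
    """Eq 7 sketch: mean over the batch of max(0, ||a-p|| - ||a-n|| + m),
    with Euclidean distances in the latent space and margin m."""
    d_ap = np.linalg.norm(z_a - z_p, axis=1)
    d_an = np.linalg.norm(z_a - z_n, axis=1)
    return np.maximum(0.0, d_ap - d_an + m).mean()

anchor   = np.zeros((2, 3))
positive = np.zeros((2, 3))          # identical to the anchor
negative = np.full((2, 3), 10.0)     # far from the anchor
loss = triplet_margin_loss(anchor, positive, negative)   # triplet already satisfied
```

When the positive sits closer to the anchor than the negative by more than the margin, the loss is zero and no gradient flows for that triplet.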

Adversarial loss: The Hinge Loss [27] is used as an adversarial loss because it offers better stability and faster convergence compared to traditional cross-entropy loss. This approach is particularly effective for handling high-dimensional data, as it more effectively manages the adversarial dynamics between the generator and discriminator.

The generator G1 produces synthetic high-dimensional multi-omics data x̂ from low-dimensional latent vectors z. Meanwhile, the discriminator D1 tries to differentiate between real high-dimensional multi-omics data x and the synthetic multi-omics data x̂. Consequently, the adversarial loss can be defined as follows:

\mathcal{L}_{G_1} = -\mathbb{E}_{z \sim p_Z}\big[D_1(G_1(z))\big] \quad (8)

In a similar manner, the generator G2 transforms high-dimensional multi-omics data x into a low-dimensional latent vector ẑ, while the discriminator D2 aims to differentiate between z and ẑ.

\mathcal{L}_{G_2} = -\mathbb{E}_{x \sim p_X}\big[D_2(G_2(x))\big] \quad (9)

The objective of the discriminator D1 is to minimize its output for generated data, pushing it closer to -1. This enables D1 to effectively distinguish between real high-dimensional multi-omics data x and the generated data G1(z).

\mathcal{L}_{D_1} = \mathbb{E}_{x \sim p_X}\big[\max(0, 1 - D_1(x))\big] + \mathbb{E}_{z \sim p_Z}\big[\max(0, 1 + D_1(G_1(z)))\big] \quad (10)

The discriminator D2 similarly minimizes its output for generated latent vectors, pushing it as close to -1 as possible, which effectively differentiates between z and ẑ.

\mathcal{L}_{D_2} = \mathbb{E}_{z \sim p_Z}\big[\max(0, 1 - D_2(z))\big] + \mathbb{E}_{x \sim p_X}\big[\max(0, 1 + D_2(G_2(x)))\big] \quad (11)

where x_a, x_p, and x_n represent the anchor, positive, and negative samples, respectively, for the high-dimensional multi-omics data. Similarly, z_a, z_p, and z_n denote the latent low-dimensional vectors corresponding to these samples. The symbols p_X and p_Z refer to the real distributions of high-dimensional and low-dimensional data, respectively, and \mathbb{E}_{x \sim p_X} and \mathbb{E}_{z \sim p_Z} denote the expectations over them. Lastly, \max(0, 1 \mp D(\cdot)) is the standard Hinge Loss form, which drives the discriminator's score toward +1 for real samples and -1 for generated ones.
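The hinge objectives above can be sketched numerically. This follows the standard hinge-GAN formulation described in the text, applied to batches of discriminator scores:

```python
import numpy as np

def discriminator_hinge(scores_real, scores_fake):
    """Discriminator hinge loss (form of Eqs 10-11): push real scores
    above +1 and generated scores below -1."""
    return (np.maximum(0.0, 1.0 - scores_real).mean()
            + np.maximum(0.0, 1.0 + scores_fake).mean())

def generator_hinge(scores_fake):
    """Generator hinge loss (form of Eqs 8-9): raise the discriminator's
    score on generated samples."""
    return -scores_fake.mean()

d_loss = discriminator_hinge(np.array([2.0, 1.5]), np.array([-2.0, -1.1]))  # margins satisfied
g_loss = generator_hinge(np.array([-2.0, -1.0]))
```

Once real scores exceed +1 and fake scores fall below -1, the discriminator loss saturates at zero, which is what gives the hinge formulation its stability.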

Cycle Consistency Loss: The cycle consistency loss is employed to ensure that the generator’s output can be accurately mapped back to the original input, thereby maintaining data consistency. This is especially crucial when working with high-dimensional multi-omics data, as it helps preserve the complex structure and biological significance of the data, preventing the generated high-dimensional output from losing its original characteristics. Furthermore, cycle consistency loss indirectly enhances the feature extraction capability and training stability of the Transformer model by minimizing feature loss and ensuring data coherence. The cycle consistency loss for the generator G1 is defined as follows:

\mathcal{L}_{cyc_1} = \mathbb{E}_{z \sim p_Z}\big[\|G_2(G_1(z)) - z\|_1\big] \quad (12)

The cycle consistency loss for generator G2 is given by:

\mathcal{L}_{cyc_2} = \mathbb{E}_{x \sim p_X}\big[\|G_1(G_2(x)) - x\|_1\big] \quad (13)

where the L1 norm, denoted as \|\cdot\|_1, is used to compute the absolute error between the reconstructed data and the original data.

Total Loss: The total loss is a weighted combination of the contrastive loss \mathcal{L}_{con}; the cycle losses \mathcal{L}_{cyc_1} and \mathcal{L}_{cyc_2}; and the adversarial losses \mathcal{L}_{G_1} and \mathcal{L}_{G_2}.

\mathcal{L}_{total} = \mathcal{L}_{con} + \alpha \big(\mathcal{L}_{cyc_1} + \mathcal{L}_{cyc_2}\big) + \beta \big(\mathcal{L}_{G_1} + \mathcal{L}_{G_2}\big) \quad (14)

where α and β are weights used to balance the various losses. The weights of the encoder are therefore adjusted based on the gradient of the total loss, \partial \mathcal{L}_{total} / \partial W, where W represents the trainable parameters of the encoder E. Contrastive learning improves the distinction between positive and negative samples, while the adversarial loss and cycle consistency loss of the GAN provide feedback to the encoder E, enhancing feature extraction and resulting in better generalization.
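The composite objective can be sketched as a weighted sum of scalar terms; the cycle term is shown as a mean absolute (L1) reconstruction error:

```python
import numpy as np

def cycle_l1(original, reconstructed):
    """Cycle-consistency term (Eqs 12-13 sketch): mean absolute (L1) error
    between a tensor and its round-trip reconstruction."""
    return np.abs(original - reconstructed).mean()

def total_loss(l_con, l_cyc1, l_cyc2, l_g1, l_g2, alpha=1.0, beta=1.0):
    """Eq 14 sketch: contrastive term plus weighted cycle-consistency
    and adversarial terms."""
    return l_con + alpha * (l_cyc1 + l_cyc2) + beta * (l_g1 + l_g2)

l_cyc = cycle_l1(np.zeros(4), np.ones(4))                              # 1.0
total = total_loss(1.0, l_cyc, l_cyc, 0.5, 0.5, alpha=0.5, beta=1.0)   # 1 + 0.5*2 + 1*1
```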

In this integrated setup, the ViT encoder benefits not only from contrastive discrimination but also from reconstruction-based supervision, as the gradients from both the generators and discriminators in CycleGAN are backpropagated through the encoder. This unified feedback loop improves both representation quality and model robustness.

3 Experiments

This section discusses the datasets, the training and hyper-parameter settings of the model, and the quantitative results in detail.

3.1 Datasets

To illustrate the effectiveness of the CAEncoder, we utilized three cancer datasets from TCGA [29] and ROSMAP. Dataset-1 [30] is sourced from the TCGA repository and is referred to as 4-BRCA. This dataset includes multi-omics data such as Copy Number Variation (CNV), mRNA, and Reverse Phase Protein Array (RPPA) data. It encompasses four subtypes of breast cancer: Basal-like, Her2-enriched, Luminal A, and Luminal B, with a total of 511 samples. Dataset-2 [13] combines Alzheimer’s binary classification data from ROSMAP and BRCA five-class data from TCGA. It includes various modalities such as mRNA, DNA methylation, and miRNA. This dataset contains 169 samples from Alzheimer’s disease (AD) patients and 182 samples from normal controls (NC), while the five-class BRCA dataset comprises 875 samples. Dataset-3 [31] consists of data from TCGA, covering four cancer types: Prostate Adenocarcinoma (PRAD) with 250 samples, Breast Invasive Carcinoma (BRCA) with 211 samples, Bladder Urothelial Carcinoma (BLCA) with 402 samples, and Liver Hepatocellular Carcinoma (LIHC) with 354 samples. It features three modalities: mRNA, Single Nucleotide Variants (SNV), and miRNA.

Data Preprocessing: To ensure the quality and consistency of multi-omics data, categorical variables in each omics type (e.g., copy number variation (CNV), mRNA, and reverse phase protein array (RPPA)) are converted into numerical variables. All features are normalized to have a mean of 0 and a standard deviation of 1, which helps maintain consistency across different omics datasets. Furthermore, we incorporate all features from each omics type into the model, allowing it to fully capture the complex biological relationships present in the multi-omics data. To address the inherent imbalance in multi-omics data across different cancer types, we constructed balanced sets of positive and negative sample pairs for each cancer type, with the proportions reflecting their respective sample sizes. The encoder is trained in a contrastive manner, aiming to group similar points closer together in the latent space while pushing dissimilar points farther apart. Contrastive learning requires the samples to be divided into three categories: anchor, positive, and negative.

Anchor, positive and negative samples generation: We consider a multi-omics dataset comprising N samples, each containing data from M distinct modalities. For a given modality m (where 1 ≤ m ≤ M), the samples can be described as \{(x_i^m, y_i)\}_{i=1}^{N}, where x_i^m is the feature vector for the i-th sample and y_i indicates its cancer subtype. An anchor sample is formed by combining feature vectors from m selected modalities of the same patient and subtype: x_a = [x_i^1, x_i^2, \ldots, x_i^m]. For instance, for the i-th patient classified as Luminal-A breast cancer, x_a = [x_i^{CNV}, x_i^{mRNA}, x_i^{RPPA}]. A positive sample x_p is created by selecting m modalities from different patients who share the same subtype as the anchor, so that every modality carries the anchor's label. For example, if patients i, j, and k all have the Luminal-A subtype, the positive sample could be x_p = [x_j^{CNV}, x_k^{mRNA}, x_j^{RPPA}]. Conversely, a negative sample x_n is generated by selecting at least one modality from a patient with a different class label. It is similarly represented, but at least one of the labels must differ from that of the anchor. For instance, if y_l differs from the anchor's subtype, then a possible negative sample could be x_n = [x_j^{CNV}, x_l^{mRNA}, x_j^{RPPA}]. Drawing modalities from different patients for sample selection introduces variability and adds complexity to the training, often resulting in positive samples that are more dissimilar to the anchor despite sharing the same label. This approach enhances the model’s ability to discern subtle differences between cancer subtypes and improves robustness by exposing it to challenging examples, ultimately assisting in better generalization and reducing the risk of overfitting.
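The sampling scheme above can be sketched as follows. The dictionary layout, function name, and toy values are illustrative assumptions, not the authors' implementation:

```python
import random

def make_triplet(samples, anchor_label, modalities, rng=random):
    """Builds (anchor, positive, negative) feature lists.
    `samples` maps subtype label -> list of patients, each a dict
    {modality_name: feature_vector}. The positive draws every modality from
    patients with the anchor's label (possibly different patients); the
    negative swaps in at least one modality from a different label."""
    same = samples[anchor_label]
    other_labels = [lab for lab in samples if lab != anchor_label]
    anchor = [same[0][m] for m in modalities]                # one patient, all modalities
    positive = [rng.choice(same)[m] for m in modalities]     # same subtype, mixed patients
    negative = list(positive)
    swap = rng.randrange(len(modalities))                    # modality to replace
    other_patient = rng.choice(samples[rng.choice(other_labels)])
    negative[swap] = other_patient[modalities[swap]]
    return anchor, positive, negative

toy = {
    "LuminalA": [{"mRNA": [0.1], "CNV": [0.2]}, {"mRNA": [0.3], "CNV": [0.4]}],
    "Basal":    [{"mRNA": [9.0], "CNV": [8.0]}],
}
a, p, n = make_triplet(toy, "LuminalA", ["mRNA", "CNV"])
```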

3.2 Training and hyper-parameters setting

To ensure reliable and statistically significant results, we adopted a 5-fold cross-validation protocol across all datasets. Each dataset was randomly split into five folds, with 80% used for training and 20% for testing in each iteration. This process was repeated five times using different random seeds to ensure generalizable performance. The encoder was trained using the Adam optimizer with a learning rate of . We applied gradient clipping (1.0) to prevent gradient explosion and incorporated L2 regularization (weight decay) to mitigate overfitting. The model was trained for 100 epochs with a batch size of 64.
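The fold construction described above can be sketched as a minimal index-splitting routine (an illustration of the protocol, not the authors' code):

```python
import numpy as np

def five_fold_indices(n_samples, seed=0):
    """5-fold split sketch: shuffle once per seed, then yield
    (train_idx, test_idx) pairs giving an 80/20 split per fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for i in range(5):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, test

splits = list(five_fold_indices(100, seed=42))   # 5 (train, test) index pairs
```

Repeating this with different seeds, as the paper describes, simply means calling the routine with a new `seed` per repetition.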

Experiments were conducted on a system with an Intel Core i7-10700K CPU at 3.80GHz, paired with an NVIDIA GeForce RTX 3080 GPU. It has 32GB of DDR4 RAM and a 1TB SSD for fast data processing. The operating system is Ubuntu 20.04 LTS, and TensorFlow 2.5.0 was used for deep learning, with all scripts executed in Python 3 for compatibility with the latest libraries.

Our proposed model consists of a Transformer encoder and a CycleGAN framework. The computational complexity of the Transformer encoder follows Vaswani et al. [32] and can be expressed as:

\mathcal{O}\big(L \cdot (n_t^2 \, d_t + n_t \, d_t^2)\big) \quad (15)

where L is the number of layers, nt is the number of samples, and dt is the feature dimension.

The computational complexity of the CycleGAN framework can be analyzed based on the complexity of convolutional networks [25,33-35]. For a single convolutional layer, the complexity is given by:

\mathcal{O}(n_c \cdot k_c \cdot d_c) \quad (16)

where nc is the number of samples, kc is the convolutional kernel size, and dc is the feature dimension.

Overall, the combined computational complexity remains manageable for our dataset, allowing for efficient model training within a reasonable time frame.

3.3 Results

Classifier selection: The CAEncoder maps high-dimensional multi-omics data into a reduced latent space, making it challenging to evaluate the effectiveness of the proposed encoder directly. Therefore, we assess its performance based on downstream task classification. To accomplish this, we experimented with various classifiers, including Random Forest (RF), k-nearest Neighbors (K-NN), Decision Tree (DT), and Gradient Boosting Classifier (GBC), which were trained in the reduced latent space.

Table 1 presents the classification results of various classifiers operating in the reduced latent space. The experiments indicate that the Random Forest (RF) classifier outperforms other classifiers in effectively utilizing this latent representation, achieving the highest performance. This effectiveness arises from its ensemble learning strategy, which combines predictions from multiple decision trees. This approach reduces model variance, enhances robustness to noise, and minimizes the risk of overfitting. Furthermore, RF trains multiple sub-models on different subsets of features, thereby leveraging complementary information from various modalities to boost classification performance. Consequently, we have chosen RF as the final classifier for our model in the subsequent experiments.

Table 1. The performance of various classifiers on Dataset-1.

Initially, the dataset is transformed into a reduced latent space z using the proposed encoder E, after which classification is performed in this reduced space.

https://doi.org/10.1371/journal.pone.0333134.t001

The performance of CAEncoder across various modalities: Fig 2 displays the performance of the CAEncoder model on Dataset-1, showcasing various combinations of modalities: one-modality, two-modalities, and three-modalities. The CAEncoder model first transforms the high-dimensional data into a latent space, after which classification is conducted using a Random Forest (RF) algorithm with 15 estimators. The figure illustrates that combining more modalities results in improved performance. This demonstrates that the proposed encoder effectively learns the synergies among different modalities, enhancing generalization.

Fig 2. Comparison of classification performance on various combinations of the multi-omics modalities.

https://doi.org/10.1371/journal.pone.0333134.g002

Results on Dataset-1: Table 2 presents a comparison of the classification performance between our proposed CAEncoder model and traditional machine learning classifiers, including Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN), using Dataset-1. In this study, we combined three modalities—CNV, RPPA, and mRNA—for classification purposes. To reduce dimensionality and extract independent features, we applied Principal Component Analysis (PCA) before training the aforementioned classifiers in the reduced feature space. We determine the number of retained principal components by examining the cumulative explained variance ratio, as outlined in [36]. We select components that account for at least 95% of the total variance. This threshold is chosen to balance dimensionality reduction with information retention. Our goal is to reduce the dimensionality of the data for improved computational efficiency while preserving the essential information needed to maintain model performance. The results indicate that the proposed CAEncoder model outperforms the traditional machine learning approaches.
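The 95%-variance component selection used for the PCA baselines can be sketched with an SVD-based PCA (a sketch of the rule; the actual pipeline presumably used a standard PCA implementation):

```python
import numpy as np

def n_components_95(X, threshold=0.95):
    """Number of principal components whose cumulative explained-variance
    ratio first reaches `threshold` (SVD-based PCA sketch)."""
    Xc = X - X.mean(axis=0)                      # center the features
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values
    ratio = s**2 / (s**2).sum()                  # explained-variance ratios
    return int(np.searchsorted(np.cumsum(ratio), threshold) + 1)

# Toy data: one dominant direction, so a single component suffices.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 1)) @ np.ones((1, 5)) * 10 + rng.normal(size=(50, 5)) * 0.01
```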

Additionally, Table 2 compares the CAEncoder model with the Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis (MoGCN) [30]. The MoGCN is a deep learning model that also integrates the modalities CNV, RPPA, and mRNA from Dataset-1 for classification. The results for MoGCN [30] are not recreated here but have been cited from the original paper. The findings demonstrate that the CAEncoder model outperforms MoGCN in terms of ACC and F1 scores by 3.51% and 2.65%, respectively.

Results on Dataset-2: In this section, we compare the classification performance of the proposed CAEncoder model with several state-of-the-art deep learning approaches, including MOGONET [37], MODILM [38], HyperTMO [5], and MOCAT [31]. MOGONET [37] integrates multi-omics data using graph convolutional networks, enabling patient classification and biomarker identification. MODILM [38] enhances classification accuracy for complex diseases by synthesizing significant and complementary information from various single-omics datasets. HyperTMO [5] is a multi-omics integration framework specifically designed for patient classification. It utilizes a hypergraph convolutional network to construct hypergraph structures that represent associations between samples in single-omics data. Evidence extraction is performed via the hypergraph convolutional network, allowing for the integration of multi-omics information at an evidence level. The Multi-Omics Integration Framework with Auxiliary Classifiers-enhanced Autoencoders (MOCAT) [31] effectively leverages intra- and inter-omics information. It employs attention mechanisms combined with confidence learning to improve feature representation and ensure trustworthy predictions. The results from these approaches are cited directly from their respective papers and have not been regenerated. Table 3 presents the results of the proposed method alongside the state-of-the-art methods on Dataset-2. The results indicate that the proposed CAEncoder outperforms all the SOTA methods in terms of accuracy and F1 scores.

Results on Dataset-3: Table 4 compares the performance of CAEncoder and DeepKEGG [13] across all four cancer types included in Dataset-3. DeepKEGG is an interpretable multi-omics data integration method designed to predict cancer recurrence and identify biomarkers. It features a biological hierarchical module that establishes local connections between neuron nodes, enhancing the model’s interpretability by illustrating the relationships among genes, miRNAs, and pathways. Additionally, it includes a pathway self-attention module, which analyzes the correlations between different samples and generates potential pathway feature representations that improve the model’s prediction performance. The results indicate that CAEncoder outperforms DeepKEGG on all four cancer types.
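All of the comparisons above are reported in terms of accuracy (ACC) and F1 score. For reference, a minimal pure-Python sketch of how these metrics are computed from true and predicted labels; macro-averaged F1 is assumed here, since the compared papers do not all state the averaging scheme:

```python
def accuracy(y_true, y_pred):
    # fraction of samples whose predicted label matches the true label
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    # unweighted mean of per-class F1 scores
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```

Macro averaging weights every class equally, which matters for multi-class cancer-type classification where class sizes differ.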

3.4 Ablation study

The proposed model, CAEncoder, primarily consists of an encoder (a Vision Transformer) and a CycleGAN, and it is trained using a contrastive approach. To assess the effectiveness of each component, we conducted ablation experiments (refer to Table 5).
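The composite objective that drives this training, a weighted combination of hinge adversarial loss, cycle consistency loss, and triplet margin loss, can be sketched as follows. The weights `w_adv`, `w_cyc`, `w_tri` and the margin are illustrative assumptions, not the values used in the paper:

```python
import numpy as np

def hinge_adv_loss(d_real, d_fake):
    # discriminator hinge loss: real scores pushed above +1, fake below -1
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def cycle_consistency_loss(x, x_rec):
    # L1 penalty between an input and its round-trip reconstruction
    return np.mean(np.abs(x - x_rec))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    # pull anchor toward positive, push it from negative, up to a margin
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

def composite_loss(d_real, d_fake, x, x_rec, anc, pos, neg,
                   w_adv=1.0, w_cyc=10.0, w_tri=1.0):
    # weighted sum of the three terms described for CAEncoder
    return (w_adv * hinge_adv_loss(d_real, d_fake)
            + w_cyc * cycle_consistency_loss(x, x_rec)
            + w_tri * triplet_margin_loss(anc, pos, neg))
```

The adversarial and cycle terms carry the CycleGAN feedback, while the triplet term implements the contrastive pull/push in the latent space.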

In the first experiment, we kept both the encoder and the CycleGAN intact but bypassed the contrastive loss, resulting in a model variant named CAEncoder_notCL. This experiment was designed to isolate the contribution of contrastive learning. The results showed that omitting contrastive learning caused a substantial drop in the model’s performance.

In the second experiment, we removed the CycleGAN while retaining the other components, creating a model variant referred to as CAEncoder_RC. The results showed that removing the CycleGAN likewise degraded classification performance, demonstrating that each component of the proposed model plays a crucial role.

These ablation experiments provide compelling evidence of how the CAEncoder learns more discriminative and generalizable representations. Specifically, the substantial performance drop observed in CAEncoder_notCL confirms the pivotal role of contrastive learning in enhancing feature separability within the latent space. Similarly, the reduced accuracy and F1-score in CAEncoder_RC highlight the contribution of CycleGAN in preserving modality-specific details and preventing the loss of global structural information. Together, these components synergistically improve the quality of learned representations, leading to better generalization on downstream classification tasks.
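Conceptually, the two ablation variants correspond to zeroing individual terms of the full training objective. A minimal self-contained sketch of this bookkeeping; the weights are illustrative assumptions, not the paper's values:

```python
# Illustrative loss-term weighting for each model variant in the ablation study.
# The per-term loss values (l_adv, l_cyc, l_tri) would come from a forward pass.
VARIANTS = {
    "CAEncoder":       {"w_adv": 1.0, "w_cyc": 10.0, "w_tri": 1.0},  # full model
    "CAEncoder_notCL": {"w_adv": 1.0, "w_cyc": 10.0, "w_tri": 0.0},  # contrastive loss bypassed
    "CAEncoder_RC":    {"w_adv": 0.0, "w_cyc": 0.0,  "w_tri": 1.0},  # CycleGAN removed
}

def total_loss(variant, l_adv, l_cyc, l_tri):
    w = VARIANTS[variant]
    return w["w_adv"] * l_adv + w["w_cyc"] * l_cyc + w["w_tri"] * l_tri

# For CAEncoder_notCL the triplet term contributes nothing to the objective:
print(total_loss("CAEncoder_notCL", l_adv=0.4, l_cyc=0.1, l_tri=0.7))
```

Framing the ablations this way makes explicit that each variant trains the same architecture minus one source of gradient feedback.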

4 Conclusion

This study introduces CAEncoder, a novel multi-omics integration model for cancer classification. The framework effectively captures the synergies among different omics modalities and comprehensively processes complex information. Through the feedback mechanisms of CycleGAN, CAEncoder learns to distinguish different data distributions, resulting in improved generalization. Contrastive learning encourages the model to capture the relationships among modalities, thereby enhancing data integration. The encoder maps the high-dimensional data into a reduced latent space, where classification is subsequently performed. We evaluated the proposed model on several datasets, and the results demonstrate that it outperforms state-of-the-art methods. In future work, we plan to extend multi-omics data integration to biomarker detection and survival prediction using self-supervised learning. Furthermore, recognizing the importance of interpretability in deep learning models, we plan to explore self-attention weight analysis and feature attribution methods to elucidate the contributions of different omics features to classification decisions, thereby enhancing the model’s transparency and interpretability.

Acknowledgments

Financial Disclosure: The authors extend their appreciation to Umm Al-Qura University, Saudi Arabia, for funding this research work through grant number 25UQU4310136GSSR04.

References

  1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–63. pmid:38572751
  2. Abbasi EY, Deng Z, Ali Q, Khan A, Shaikh A, Reshan MSA, et al. A machine learning and deep learning-based integrated multi-omics technique for leukemia prediction. Heliyon. 2024;10(3):e25369. pmid:38352790
  3. Gokhale M, Mohanty SK, Ojha A. GeneViT: gene vision transformer with improved DeepInsight for cancer classification. Comput Biol Med. 2023;155:106643. pmid:36803792
  4. Zhang T-H, Hasib MM, Chiu Y-C, Han Z-F, Jin Y-F, Flores M, et al. Transformer for Gene Expression Modeling (T-GEM): an interpretable deep learning model for gene expression-based phenotype predictions. Cancers (Basel). 2022;14(19):4763. pmid:36230685
  5. Wang H, Lin K, Zhang Q, Shi J, Song X, Wu J, et al. HyperTMO: a trusted multi-omics integration framework based on hypergraph convolutional network for patient classification. Bioinformatics. 2024;40(4):btae159. pmid:38530977
  6. Peelen M, Bagheriye L, Kwisthout J. Cancer subtype identification through integrating inter and intra dataset relationships in multi-omics data. IEEE Access. 2024;12:27768–83.
  7. Ren Y, Gao Y, Du W, Qiao W, Li W, Yang Q, et al. Classifying breast cancer using multi-view graph neural network based on multi-omics data. Front Genet. 2024;15:1363896. pmid:38444760
  8. Qattous H, Azzeh M, Ibrahim R, Abed Al-Ghafer I, Al Sorkhy M, Alkhateeb A. PaCMAP-embedded convolutional neural network for multi-omics data integration. Heliyon. 2023;10(1):e23195. pmid:38163104
  9. Raj A, Petreaca RC, Mirzaei G. Multi-omics integration for liver cancer using regression analysis. Curr Issues Mol Biol. 2024;46(4):3551–62. pmid:38666952
  10. Duan X, Ding X, Zhao Z. Multi-omics integration with weighted affinity and self-diffusion applied for cancer subtypes identification. J Transl Med. 2024;22(1):79. pmid:38243340
  11. Braytee A, He S, Tang S, Sun Y, Jiang X, Yu X, et al. Identification of cancer risk groups through multi-omics integration using autoencoder and tensor analysis. Sci Rep. 2024;14(1):11263. pmid:38760420
  12. Wang J, Liao N, Du X, Chen Q, Wei B. A semi-supervised approach for the integration of multi-omics data based on transformer multi-head self-attention mechanism and graph convolutional networks. BMC Genomics. 2024;25(1):86. pmid:38254021
  13. Lan W, Liao H, Chen Q, Zhu L, Pan Y, Chen Y-PP. DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform. 2024;25(3):bbae185. pmid:38678587
  14. Zheng X, Wang M, Huang K, Zhu E. Global and cross-modal feature aggregation for multi-omics data classification and application on drug response prediction. Information Fusion. 2024;102:102077.
  15. Ouyang D, Liang Y, Li L, Ai N, Lu S, Yu M, et al. Integration of multi-omics data using adaptive graph learning and attention mechanism for patient classification and biomarker identification. Comput Biol Med. 2023;164:107303. pmid:37586201
  16. Li B, Nabavi S. A multimodal graph neural network framework for cancer molecular subtype classification. BMC Bioinformatics. 2024;25(1):27. pmid:38225583
  17. Huang Y, Zeng P, Zhong C. Classifying breast cancer subtypes on multi-omics data via sparse canonical correlation analysis and deep learning. BMC Bioinformatics. 2024;25(1):132. pmid:38539064
  18. Zhu J, Oh JH, Simhal AK, Elkin R, Norton L, Deasy JO, et al. Geometric graph neural networks on multi-omics data to predict cancer survival outcomes. Comput Biol Med. 2023;163:107117. pmid:37329617
  19. Pang J, Liang B, Ding R, Yan Q, Chen R, Xu J. A denoised multi-omics integration framework for cancer subtype classification and survival prediction. Brief Bioinform. 2023;24(5):bbad304. pmid:37594302
  20. Choi JM, Chae H. moBRCA-net: a breast cancer subtype classification framework based on multi-omics attention neural networks. BMC Bioinformatics. 2023;24(1):169. pmid:37101124
  21. Xie K, Hou Y, Zhou X. Deep centroid: a general deep cascade classifier for biomedical omics data classification. Bioinformatics. 2024;40(2):btae039. pmid:38305432
  22. Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform. 2024;25(3):bbae184. pmid:38670157
  23. Jiang L, Xu C, Bai Y, Liu A, Gong Y, Wang Y-P, et al. Autosurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. NPJ Precis Oncol. 2024;8(1):4. pmid:38182734
  24. Guo D, Wang Y, Chen J, Liu X. Integration of multi-omics data for survival prediction of lung adenocarcinoma. Comput Methods Programs Biomed. 2024;250:108192. pmid:38701699
  25. Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. p. 2223–32.
  26. Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint. 2020. https://arxiv.org/abs/2010.11929
  27. Heng Y, Yinghua M, Khan FG, Khan A, Hui Z. HLSNC-GAN: medical image synthesis using hinge loss and switchable normalization in CycleGAN. IEEE Access. 2024;12:55448–64.
  28. Schroff F, Kalenichenko D, Philbin J. FaceNet: a unified embedding for face recognition and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015. p. 815–23. https://doi.org/10.1109/cvpr.2015.7298682
  29. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45(10):1113–20. pmid:24071849
  30. Li X, Ma J, Leng L, Han M, Li M, He F, et al. MoGCN: a multi-omics integration method based on graph convolutional network for cancer subtype analysis. Front Genet. 2022;13:806842. pmid:35186034
  31. Yao X, Jiang X, Luo H, Liang H, Ye X, Wei Y, et al. MOCAT: multi-omics integration with auxiliary classifiers enhanced autoencoder. BioData Min. 2024;17(1):9. pmid:38444019
  32. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.
  33. Canziani A, Paszke A, Culurciello E. An analysis of deep neural network models for practical applications. arXiv preprint. 2016. https://arxiv.org/abs/1605.07678
  34. Lucic M, Kurach K, Michalski M, Gelly S, Bousquet O. Are GANs created equal? A large-scale study. arXiv preprint. 2017. https://arxiv.org/abs/1711.10337
  35. Thompson NC, Greenewald K, Lee K, Manso GF. The computational limits of deep learning. arXiv preprint. 2020. https://arxiv.org/abs/2007.05558
  36. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans A Math Phys Eng Sci. 2016;374(2065):20150202. pmid:26953178
  37. Wang T, Shao W, Huang Z, Tang H, Zhang J, Ding Z, et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun. 2021;12(1):3445. pmid:34103512
  38. Zhong Y, Peng Y, Lin Y, Chen D, Zhang H, Zheng W, et al. MODILM: towards better complex diseases classification using a novel multi-omics data integration learning model. BMC Med Inform Decis Mak. 2023;23(1):82. pmid:37147619