Abstract
Circular RNA (circRNA), a class of RNA molecules that has attracted widespread attention, is widely recognized as a potential biomarker for many diseases. In recent years, significant progress has been made in the study of the associations between circRNAs and diseases. However, traditional experimental methods are often inefficient and costly, making computational models an effective alternative. Nevertheless, existing computational methods still face challenges such as data sparsity and the difficulty of confirming negative samples, which limit the accuracy of predictions. To address these challenges, a novel computational method, namely MVHGCN, is proposed based on multi-view and graph convolutional networks to predict potential associations between circRNAs and diseases. MVHGCN first constructs a heterogeneous graph and generates feature descriptors by integrating multiple databases. Then it extracts different connection views of circRNAs and diseases through meta-paths, maximizing the utilization of known association information, and aggregates deep feature information through graph convolutional networks. Finally, a multi-layer perceptron (MLP) is used to predict the association scores. The experimental results show that MVHGCN significantly outperforms existing methods on benchmark datasets under 5-fold cross-validation. This research provides an effective new approach to studying the associations between circRNAs and diseases, capable of alleviating the problem of data sparsity and accurately identifying potential associations.
Author summary
Circular RNA has garnered significant attention due to its unique structure and potential role in regulating gene expression. It can interact with microRNAs, preventing the degradation of messenger RNAs and influencing a network of competing RNAs. However, traditional experimental methods are often inefficient and costly, making computational models an effective alternative. Nevertheless, existing computational methods still face challenges such as data sparsity and the difficulty of confirming negative samples, which limit the accuracy of predictions. To address these issues, we propose a novel method called MVHGCN, which leverages heterogeneous graphs and graph convolutional networks to predict associations between circRNAs and diseases. This approach integrates diverse data views and uses deep learning to provide accurate predictions. MVHGCN significantly outperforms existing methods, demonstrating its potential to advance disease research and the development of therapeutic targets. The results underscore the importance of improving prediction models in the study of circRNA-disease relationships, ultimately contributing to more effective disease diagnosis and treatment strategies.
Citation: Miao Y, Tang X, Wang C, Sun Z, Wang G, Huang S (2025) MVHGCN: Predicting circRNA-disease associations with multi-view heterogeneous graph convolutional neural networks. PLoS Comput Biol 21(6): e1013225. https://doi.org/10.1371/journal.pcbi.1013225
Editor: Quan Zou, University of Electronic Science and Technology, CHINA
Received: April 16, 2025; Accepted: June 10, 2025; Published: June 19, 2025
Copyright: © 2025 Miao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The source code and datasets are available on https://github.com/Maoqwq/MVHGCN.git.
Funding: This work is supported in part by funds from the National Key Research and Development Program of China (Grant no. 2024YFC3405900 and 2024YFC3405902); the National Natural Science Foundation of China (NSF: # 62301139); and the Fundamental Research Funds for the Central Universities. All the funders played a role in the study design, data collection and analysis, decision to publish, and preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Circular RNA (circRNA) is a type of RNA molecule that is widely present across various organisms [1] and has received widespread attention in recent years. Existing studies have demonstrated that once a circRNA is transported to the cytoplasm, it can act as a microRNA (miRNA) sponge [2]. By competitively binding to miRNAs [3], circRNA inhibits miRNA-mediated degradation of messenger RNA (mRNA) targets [4], thus regulating the function of the competing endogenous RNA (ceRNA) network [5]. Compared to linear long non-coding RNAs (lncRNAs), circRNA is characterized by a covalently closed circular structure. This structure endows circRNA with greater stability in vivo than lncRNA, conferring enhanced resistance to nucleases and exoribonucleases [6] and potentially making circRNA a valuable biomarker for diseases [7]. Consequently, the discovery of new associations between circRNAs and diseases can provide crucial insights for disease research and the development of drug targets [8].
Several tools have been proposed to predict circRNA-disease associations [9], which can be broadly categorized into three types: network-based methods, machine learning-based methods, and deep learning-based methods. Network-based methods predict associations by constructing a circRNA-disease association network and then applying random walk or message propagation on it. For instance, KATZHCDA [10] employed a heterogeneous network constructed from circRNA expression profiles, disease phenotype similarity, and Gaussian interaction profile kernel similarity, together with the KATZ algorithm, to predict associations. RWRKNN [11] integrated the random walk with restart algorithm with the k-Nearest Neighbor (KNN) algorithm to predict the associations between circRNAs and diseases. Machine learning-based methods construct circRNA and disease features to enrich association information and then use machine learning algorithms to predict these associations. iCircDA-MF [12] performed non-negative matrix factorization on the association matrix to compute association scores. NMFCDA [13] predicted associations between circRNAs and diseases by integrating randomized neural network pseudoinverse learning with non-negative matrix factorization. Deep learning-based methods employ deep learning techniques to extract latent features for predicting associations [14]. Lan et al. [15] employed a graph attention network (GAT), which aggregates information through multiple layers of propagation, followed by a Multi-Layer Perceptron (MLP) for prediction. GMNN2CD [16] integrated graph autoencoder and variational inference within a graph Markov neural network framework, leveraging feature inference (GNNq) and label propagation (GNNp) trained alternately via a variational EM algorithm to predict circRNA-disease associations. Wu et al. [17] proposed a method based on a Transformer for knowledge representation learning and attention propagation layers to obtain high-quality embeddings, followed by an MLP for predicting associations. MSMCDA [18] integrates shared units and attention mechanisms to fuse similarity and meta-path networks for circRNA-disease association prediction, enhanced by contrastive learning and followed by an MLP classifier.
Although the aforementioned methods have demonstrated excellent performance, two general issues remain: (1) Current databases contain a relatively limited number of circRNA-disease associations, leading to a data sparsity problem that severely restricts prediction accuracy. (2) Verified circRNA-disease non-associations are typically difficult to obtain, making it challenging to determine negative samples.
To address these two problems, a novel computational method, namely MVHGCN, is proposed to predict circRNA-disease associations. It consists of four main components. Firstly, MVHGCN computes feature descriptors of circRNAs and diseases from known circRNA-disease associations. Secondly, it acquires association views with different connectivity patterns from a heterogeneous graph through meta-paths. Then, these diverse connectivity views are fused, and deep information is aggregated through a graph convolutional network to obtain the final representations. Finally, an MLP is employed to obtain the association scores between circRNAs and diseases. To verify the effectiveness of MVHGCN, it is compared to six benchmark methods: GMNN2CD [16], MLNGCF [19], AE-RF [20], CircWalk [21], KGETCDA [17], and KGRACDA [22]. The experimental results demonstrate that MVHGCN outperforms these methods. Specifically, for the key metric AUC, taking Dataset 1 as an example, MVHGCN achieves the highest value of 99.3%, which is 4.6% higher than GMNN2CD, 29.3% higher than MLNGCF, 19.2% higher than AE-RF, 13.8% higher than CircWalk, 33.2% higher than KGETCDA, and 12.2% higher than KGRACDA.
Materials and methods
The framework of MVHGCN
The overall workflow of MVHGCN is shown in Fig 1. Initially, multi-source data are integrated to construct a large-scale heterogeneous graph, from which a circRNA-disease association matrix is obtained. Subsequently, four feature matrices, the disease semantic similarity matrix (DSS), the disease GIP kernel similarity matrix (DGS), the circRNA functional similarity matrix (CFS), and the circRNA GIP kernel similarity matrix (CGS), are computed from the circRNA-disease association matrix. These matrices are then fused to obtain the feature descriptor of circRNA and disease. Next, in the heterogeneous graph, different association views are acquired based on the distinct connection patterns between circRNAs and diseases using meta-paths. Thereafter, the association views are aggregated using global and local view aggregation to obtain the final representation. Finally, the prediction layer of an MLP is employed to output association scores.
Constructing feature descriptors of circRNA and disease
To extract high-quality initial features accurately, four matrices (DSS, DGS, CFS, and CGS) were calculated from the circRNA-disease association matrix. These similarity matrices were integrated into a feature descriptor for subsequent analysis.
Disease semantic similarity (DSS).
Based on existing research [23], disease semantic similarity is calculated using Medical Subject Headings (MeSH), which are provided by the National Center for Biotechnology Information (NCBI). In the MeSH database, the association information between diseases is represented by a Directed Acyclic Graph (DAG) denoted as $DAG(d) = (d, A_d, R_d)$, where node $d$ represents a disease, $A_d$ is the set of ancestor nodes of node $d$ (including node $d$ itself), and $R_d$ is the set of relationships associated with node $d$. DSS is calculated by two methods. In the first method, the contribution of disease $j$ to disease $i$ is calculated by:
$$D1_i(j) = \begin{cases} 1, & j = i \\ \max\{\Delta \cdot D1_i(j') \mid j' \in \text{children of } j\}, & j \neq i \end{cases}$$
where $\Delta$ represents the semantic contribution factor between diseases, which is typically set to 0.5. The semantic value $DV1$ for disease $i$ is calculated by:
$$DV1(i) = \sum_{j \in A_i} D1_i(j)$$
In the DAG, the greater the number of nodes shared between diseases, the higher their similarity. The semantic similarity between disease $m$ and disease $n$, denoted as $DSS1(m, n)$, is calculated as:
$$DSS1(m, n) = \frac{\sum_{t \in A_m \cap A_n} \left(D1_m(t) + D1_n(t)\right)}{DV1(m) + DV1(n)}$$
The second method posits that the number of diseases plays a crucial role in disease contribution, and diseases that appear fewer times in the DAG may be more significant. It is calculated as:
$$D2_i(j) = -\log\left(\frac{num(DAGs(j))}{num(diseases)}\right)$$
where $num(DAGs(j))$ is the number of occurrences of disease $j$ in its associated subgraph, and $num(diseases)$ is the total number of diseases in DAGs. Then, with $DV2(i) = \sum_{j \in A_i} D2_i(j)$, the semantic similarity between disease $m$ and disease $n$, denoted as $DSS2(m, n)$, is calculated as:
$$DSS2(m, n) = \frac{\sum_{t \in A_m \cap A_n} \left(D2_m(t) + D2_n(t)\right)}{DV2(m) + DV2(n)}$$
Ultimately, $DSS(m, n) = \frac{DSS1(m, n) + DSS2(m, n)}{2}$.
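The first similarity method can be illustrated on a small example. The following is a minimal sketch, assuming a hypothetical five-node DAG (the node names and the `PARENTS` mapping are invented for illustration and are not real MeSH data) and the contribution factor of 0.5 mentioned above. Each ancestor keeps the largest contribution over all paths that reach it.

```python
# Hypothetical toy DAG: child -> list of parents (not real MeSH data).
PARENTS = {
    "root": [],
    "neoplasms": ["root"],
    "digestive_neoplasms": ["neoplasms"],
    "colorectal_cancer": ["digestive_neoplasms"],
    "liver_cancer": ["digestive_neoplasms"],
}
DELTA = 0.5  # semantic contribution factor, as stated in the text


def contributions(disease):
    """Return {node: contribution to `disease`} for the disease and its ancestors."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in PARENTS[node]:
            value = DELTA * contrib[node]
            # Keep the maximum contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(m, n):
    """Shared-ancestor contributions divided by the sum of both semantic values."""
    cm, cn = contributions(m), contributions(n)
    shared = set(cm) & set(cn)
    numerator = sum(cm[t] + cn[t] for t in shared)
    return numerator / (sum(cm.values()) + sum(cn.values()))


sim = semantic_similarity("colorectal_cancer", "liver_cancer")
```

In this toy graph the two cancers share three ancestors, so their similarity is well above zero but below the self-similarity of 1.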
Disease GIP kernel similarity (DGS).
Due to the sparsity of disease semantic similarity data, not all diseases have semantic similarity information, which limits the comprehensive representation of disease features. The Gaussian Interaction Profile Kernel Similarity (GIPKS) [16] is therefore additionally employed to compute disease similarities and obtain more comprehensive information. The GIP kernel similarity between disease $d_i$ and disease $d_j$ is calculated as:
$$DGS(d_i, d_j) = \exp\left(-\gamma_d \lVert A(d_i) - A(d_j) \rVert^2\right), \qquad \gamma_d = \frac{1}{\frac{1}{n_d} \sum_{k=1}^{n_d} \lVert A(d_k) \rVert^2}$$
where $\gamma_d$ represents the bandwidth parameter, $n_d$ denotes the total number of diseases, and $A(d_i)$ and $A(d_j)$ correspond to the i-th and j-th columns of the circRNA-disease association matrix, respectively.
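A GIP kernel of this kind is straightforward to compute with NumPy. The sketch below is illustrative only: the 4 x 3 association matrix is invented, and the bandwidth is assumed to be the reciprocal of the mean squared norm of the interaction profiles, which is a common normalisation and may differ from the authors' exact setting.

```python
import numpy as np


def gip_kernel(profiles):
    """GIP kernel between rows of `profiles` (one interaction profile per row)."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Assumed bandwidth: 1 / (mean squared norm of the profiles).
    gamma = 1.0 / sq_norms.mean()
    # Pairwise squared Euclidean distances via the expansion of ||x - y||^2.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Hypothetical association matrix: 4 circRNAs (rows) x 3 diseases (columns).
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [0, 0, 1]])

DGS = gip_kernel(A.T)  # disease profiles are the columns of A
CGS = gip_kernel(A)    # circRNA profiles are the rows of A
```

The same function serves both similarity matrices; only the orientation of the association matrix changes.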
CircRNA functional similarity (CFS).
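Based on the hypothesis that similar diseases have similar circRNAs [17], the CFS can be calculated as:
$$CFS(c_i, c_j) = \frac{\sum_{d \in D_i} \max_{d' \in D_j} DSS(d, d') + \sum_{d \in D_j} \max_{d' \in D_i} DSS(d, d')}{|D_i| + |D_j|}$$
where $c_i$ and $c_j$ represent the i-th and j-th circRNA, respectively, and $D_i$ and $D_j$ denote the sets of diseases associated with these two circRNAs.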
CircRNA GIP kernel similarity (CGS).
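Similar to DGS, the CGS can be calculated as:
$$CGS(c_i, c_j) = \exp\left(-\gamma_c \lVert A(c_i) - A(c_j) \rVert^2\right), \qquad \gamma_c = \frac{1}{\frac{1}{n_c} \sum_{k=1}^{n_c} \lVert A(c_k) \rVert^2}$$
where $\gamma_c$ represents the bandwidth parameter, $n_c$ denotes the total number of circRNAs, and $A(c_i)$ and $A(c_j)$ indicate the i-th and j-th rows of the circRNA-disease association matrix, respectively.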
Constructing feature descriptors.
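If there exists a DSS between two diseases, the disease similarity is defined as the DSS between these two diseases. If there is no semantic similarity, it is defined as their DGS, which is calculated as:
$$DS(d_i, d_j) = \begin{cases} DSS(d_i, d_j), & \text{if } DSS(d_i, d_j) \neq 0 \\ DGS(d_i, d_j), & \text{otherwise} \end{cases}$$
The definition of circRNA similarity is similar to that of disease similarity. If two circRNAs exhibit functional similarity, their circRNA similarity is defined as the CFS between them. If there is no functional similarity, it is defined as their CGS, which is calculated as:
$$CS(c_i, c_j) = \begin{cases} CFS(c_i, c_j), & \text{if } CFS(c_i, c_j) \neq 0 \\ CGS(c_i, c_j), & \text{otherwise} \end{cases}$$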
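The fallback rule for both similarity matrices amounts to an element-wise selection. A minimal sketch, assuming that a missing semantic or functional similarity is encoded as 0 (an assumption for this example) and using invented 3 x 3 matrices:

```python
import numpy as np


def fuse_similarity(primary, fallback):
    """Use `primary` where it is non-zero, otherwise fall back to the GIP value."""
    primary = np.asarray(primary, dtype=float)
    fallback = np.asarray(fallback, dtype=float)
    return np.where(primary != 0, primary, fallback)


# Hypothetical values: disease 3 has no semantic similarity to diseases 1 and 2.
DSS = np.array([[1.0, 0.4, 0.0],
                [0.4, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
DGS = np.array([[1.0, 0.7, 0.2],
                [0.7, 1.0, 0.3],
                [0.2, 0.3, 1.0]])

DS = fuse_similarity(DSS, DGS)
```

The same call with CFS and CGS as arguments would give the circRNA similarity.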
A deep autoencoder [24] is an unsupervised learning model based on neural networks, commonly used for tasks such as dimensionality reduction, feature extraction, and data denoising. After the disease similarity and circRNA similarity are calculated, a deep autoencoder is used to unify their dimensionality and obtain the initial feature descriptors for diseases and circRNAs, respectively.
Constructing associated views
Meta-paths [25] are generally used to describe the relationships between nodes of different types, through which potential information among different nodes in heterogeneous graphs can be mined. A meta-path $P$ is defined as a path $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, abbreviated as $A_1 A_2 \cdots A_{l+1}$, which describes the composite relationship $R = R_1 \circ R_2 \circ \cdots \circ R_l$ between node types $A_1$ and $A_{l+1}$. $A$ and $R$ represent the sets of node and edge types, respectively, in the heterogeneous graph, and $\circ$ denotes the composition operator for relationships.
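Composing relations along a meta-path corresponds to multiplying the adjacency matrices of its edge types, which gives the number of path instances between each source and target node. The sketch below is illustrative: the circRNA-miRNA-disease path and the small matrices are hypothetical examples and are not claimed to be among the meta-paths selected by the authors.

```python
import numpy as np

# Hypothetical toy graph: 3 circRNAs, 2 miRNAs, 2 diseases.
A_cm = np.array([[1, 0],
                 [1, 1],
                 [0, 1]])   # circRNA-miRNA associations
A_md = np.array([[1, 1],
                 [0, 1]])   # miRNA-disease associations


def metapath_counts(*adjacencies):
    """Multiply adjacency matrices along a meta-path to count path instances."""
    result = adjacencies[0]
    for adj in adjacencies[1:]:
        result = result @ adj
    return result


P_cmd = metapath_counts(A_cm, A_md)   # path counts for circRNA-miRNA-disease
view_cmd = (P_cmd > 0).astype(int)    # bipartite circRNA-disease view
```

Each resulting view contains only circRNA and disease nodes, while still reflecting the structure of the intermediate node types.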
To accurately obtain the association information between circRNAs and diseases, meta-paths are utilized to extract various connections between circRNAs and diseases from the heterogeneous graphs. These meta-paths are used to construct different association views within the heterogeneous graph, allowing for a multi-view analysis of the interactions between circRNAs and diseases. Each association view is a bipartite graph that contains only circRNAs and diseases, and simultaneously extracts the structural information of the heterogeneous graph. For MVHGCN, 10 meta-paths are selected to analyze the association relationships, which are detailed as:
Given the complexity of the constructed heterogeneous graph structure and the sharp increase in the number of associations between circRNAs and diseases as the length of a meta-path increases, DPRel [26], a meta-path-based relevance measurement method specifically designed for heterogeneous networks, is used in MVHGCN to calculate the association scores between circRNAs and diseases. This approach effectively filters out circRNA-disease pairs with low association scores, thereby enhancing the accuracy and effectiveness of the association analysis. Traditional methods for homogeneous networks often fail to preserve diverse semantic information. DPRel, however, not only retains such information but also effectively computes the relevance between objects along the paths, so it is applicable to relevance measurement between nodes of the same type as well as of different types. Specifically, given a meta-path $P = (A_1 A_2 \cdots A_{l+1})$, the DPRel relevance between a source object and a target object is defined as:
where represents the number of paths connecting ali and b(i + 1)j, deg(ali) and deg(b(i + 1)j) are the degrees of nodes ali and b(i + 1)j in the heterogeneous graph, respectively.
Associated views aggregation
To embed high-order features from multiple correlated views into node embeddings, an Associated View Aggregation Mechanism (AVAM) is constructed. It consists of two key components: Global View Aggregation (GVA) and Local View Aggregation (LVA). The GVA is built on the deep aggregation mechanism [27]. LVA is established to focus on the information aggregation among nodes within each view, further enhancing the richness of learned embeddings.
Global View Aggregation (GVA).
As a composite view of multiple perspectives, GVA reveals different correlation patterns at a deeper level. By adaptively learning the importance of each correlation view, GVA achieves better performance on information aggregation. Based on the generated correlation views, GVA distinguishes their importance and weights the correlation views by a set of learnable weighting parameters $\beta_i$:
$$A = \sum_{i=1}^{N} \beta_i A_i$$
where $N$ represents the number of associated views, and $A_i$ is the adjacency matrix of the i-th associated view. Subsequently, the aggregated matrix is input into a simplified graph convolutional network for convolution operations, during which no nonlinear activation functions are employed:
$$E^{(1)} = A \cdot E^{(0)} \cdot W^{(1)}$$
where $E^{(0)}$ represents the node embedding matrix, and $W^{(1)}$ denotes a learnable weight matrix. Consequently, a single-layer GCN [28] is used to effectively learn node representations that incorporate interaction information from all associated views. To capture deeper relational information, this concept is extended to $l$ layers:
$$E^{(l)} = A \cdot E^{(l-1)} \cdot W^{(l)}$$
Considering the over-smoothing problem in graph neural networks, only two convolutional layers are used in MVHGCN. To fully capture all interaction information across various depth-related views, the outputs of each layer are fused to obtain the final node representations:
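The global aggregation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the view weights are fixed rather than learned, the matrices are random toy data, and the layer outputs are fused by averaging, which is one plausible choice since the fusion operator is not specified here.

```python
import numpy as np


def global_view_aggregation(views, view_weights, features, layer_weights):
    """Fuse views, apply linear GCN layers, and average the layer outputs."""
    # Weighted sum of the view adjacency matrices.
    fused = sum(w * adj for w, adj in zip(view_weights, views))
    hidden, outputs = features, []
    for weight in layer_weights:
        # Simplified GCN layer: propagation and projection, no activation.
        hidden = fused @ hidden @ weight
        outputs.append(hidden)
    # Assumption: fuse the per-layer outputs by averaging.
    return np.mean(outputs, axis=0)


rng = np.random.default_rng(0)
n_nodes, dim = 5, 4
views = [rng.integers(0, 2, size=(n_nodes, n_nodes)).astype(float) for _ in range(2)]
view_weights = np.array([0.7, 0.3])          # fixed here, learnable in practice
features = rng.normal(size=(n_nodes, dim))   # initial node embeddings
layer_weights = [rng.normal(size=(dim, dim)) for _ in range(2)]  # two layers

E_global = global_view_aggregation(views, view_weights, features, layer_weights)
```

Keeping the layer count at two mirrors the choice described above for limiting over-smoothing.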
Local View Aggregation (LVA).
LVA learns the importance of each associated view separately to generate the respective node representations, so that it can focus on the information among nodes within each view. Specifically, every view is input to a simplified graph convolutional network for convolution operations:
$$E_i^{(l)} = A_i \cdot E_i^{(l-1)} \cdot W_i^{(l)}$$
where $A_i$ is the adjacency matrix of the i-th view, $E_i^{(l)}$ represents the node embedding matrix for the i-th view, and $W_i^{(l)}$ denotes the learnable weight matrix for the i-th view. Additionally, all parameters in the GCN are shared across all views.
The final node representations are obtained by averaging the last layer outputs of each view with Eglobal, as shown below:
where represents the embedding of the i–th circRNA in the m-th view.
Predicting associations by MLP
Based on the aforementioned process, the final representations of circRNAs and diseases are obtained. $E_{c_i}$ and $E_{d_j}$, the embedding representations for circRNA $c_i$ and disease $d_j$, are concatenated and input to an MLP to obtain their association score:
$$\hat{y}_{ij} = \sigma\left(W \cdot [E_{c_i} \,\Vert\, E_{d_j}] + b\right)$$
where $\sigma$, $W$, and $b$ represent the activation function, learnable weight matrix, and bias, respectively, and $\Vert$ denotes concatenation. ReLU is chosen as the activation function for hidden layers, and the sigmoid function is chosen for the output layer.
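The scoring step can be sketched as a small forward pass. The layer sizes, random weights, and embeddings below are hypothetical and chosen only for illustration; the structure (concatenation, ReLU hidden layer, sigmoid output) follows the description above.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mlp_score(e_circ, e_dis, layers):
    """Concatenate the two embeddings and pass them through an MLP."""
    hidden = np.concatenate([e_circ, e_dis])
    for weight, bias in layers[:-1]:
        hidden = relu(weight @ hidden + bias)      # hidden layers use ReLU
    weight, bias = layers[-1]
    return float(sigmoid(weight @ hidden + bias)[0])  # output layer uses sigmoid


rng = np.random.default_rng(1)
dim = 4  # hypothetical embedding size
e_circ, e_dis = rng.normal(size=dim), rng.normal(size=dim)
layers = [
    (0.1 * rng.normal(size=(8, 2 * dim)), np.zeros(8)),  # hidden layer
    (0.1 * rng.normal(size=(1, 8)), np.zeros(1)),        # output layer
]

score = mlp_score(e_circ, e_dis, layers)
```

The returned value lies between 0 and 1 and can be read as an association score for the pair.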
Contrastive learning has demonstrated its superiority in various graph learning tasks. Therefore, InfoNCE [29] (Information Noise Contrastive Estimation), a loss function specifically designed for contrastive learning tasks, is adopted. Its primary goal is to maximize the mutual information between positive pairs while minimizing it for negative pairs. In this study, this means enhancing the representation of circRNAs and diseases that are truly associated while suppressing those that are not. The InfoNCE loss function is defined as follows:
where $N$ represents the number of circRNAs, $P_i$ and $N_i$ are the positive and negative sample sets for the i-th circRNA, respectively, and the temperature $\tau$ is set to 0.1 by default.
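One common form of an InfoNCE-style loss with positive and negative sets per circRNA can be sketched as follows. This is an assumed formulation for illustration (summing exponentiated scores over each set inside the logarithm); the exact form and the similarity function used by the authors may differ. The scores and masks are toy values.

```python
import numpy as np


def info_nce(scores, pos_mask, neg_mask, tau=0.1):
    """InfoNCE-style loss averaged over circRNAs (rows of `scores`).

    Each row must contain at least one positive entry.
    """
    logits = scores / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    pos = (exp * pos_mask).sum(axis=1)   # sum over positive diseases
    neg = (exp * neg_mask).sum(axis=1)   # sum over negative diseases
    return float(-np.mean(np.log(pos / (pos + neg))))


# Hypothetical scores for 2 circRNAs x 2 diseases; the diagonal holds positives.
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
pos_mask = np.eye(2, dtype=bool)
neg_mask = ~pos_mask

loss_good = info_nce(scores, pos_mask, neg_mask)          # positives score high
loss_bad = info_nce(scores[:, ::-1], pos_mask, neg_mask)  # positives score low
```

A small temperature such as 0.1 sharpens the contrast, so well-separated positives yield a loss close to zero.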
Evaluation metrics
Four evaluation criteria, Accuracy (ACC), Area Under the Receiver Operating Characteristic Curve (AUC), Area Under the Precision-Recall Curve (AUPR), and F1-score, are used to evaluate the performance of MVHGCN and benchmark methods. These criteria are widely used to evaluate machine learning models for classification because of their comprehensive reflection of accuracy, robustness, and stability. TP represents the number of true positive samples, FP denotes the number of false positive samples, TN indicates the number of true negative samples, and FN stands for the number of false negative samples. AUC represents the area under the ROC curve, which plots the True Positive Rate (TPR) against the False Positive Rate (FPR). AUPR denotes the area under the precision-recall (PR) curve, which depicts the relationship between Precision (Pre) and Recall (Rec). These evaluation criteria are defined as:
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
$$Pre = \frac{TP}{TP + FP}$$
$$Rec = TPR = \frac{TP}{TP + FN}$$
$$FPR = \frac{FP}{FP + TN}$$
$$F1\text{-}score = \frac{2 \times Pre \times Rec}{Pre + Rec}$$
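As a quick illustration of the threshold-based criteria, the sketch below computes them from confusion-matrix counts. The counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute ACC, Pre, Rec (TPR), FPR, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)          # recall equals the true positive rate
    fpr = fp / (fp + tn)
    f1 = 2 * pre * rec / (pre + rec)
    return {"ACC": acc, "Pre": pre, "Rec": rec, "FPR": fpr, "F1": f1}


# Hypothetical counts for a balanced test fold of 200 samples.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

AUC and AUPR are obtained by sweeping the decision threshold and integrating the resulting (FPR, TPR) and (Rec, Pre) points, respectively.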
Results
Datasets
MVHGCN is evaluated on three heterogeneous datasets integrated from 10 databases to validate its effectiveness. These heterogeneous datasets are built by five different associations (circRNA-disease associations, circRNA-miRNA associations, lncRNA-disease associations, miRNA-disease associations, and lncRNA-miRNA associations). The primary distinction among the three datasets lies in the categorization of disease types: Dataset 1 includes only cancer-related diseases, Dataset 2 includes only non-cancer-related diseases, and Dataset 3 encompasses both cancer and non-cancer-related diseases. Specifically for Dataset 1, circRNA-disease associations are obtained from Circ2Disease [30], CircR2Disease [31], circR2Cancer [32], and Lnc2Cancer [33]. circRNA-miRNA associations are obtained from circR2Cancer and circBank [34]. lncRNA-disease associations are constructed from Lnc2Cancer and LncRNADisease. miRNA-disease associations are generated from circR2Cancer, Circ2Disease, HMDD [35], mir2disease [36], and miRCancer [37]. lncRNA-miRNA associations come from lncRNASNP2 [38]. Similarly, Dataset 2 and Dataset 3 are built using the same strategy. The difference from Dataset 1 lies in the source of the circRNA-disease associations: Dataset 2 obtains associations between circRNAs and non-cancer diseases from Circ2Disease, CircR2Disease, circR2Cancer, and Lnc2Cancer, while Dataset 3 derives its circRNA-disease associations from LncRNADisease [39]. The number of circRNA, diseases, miRNA, lncRNA, and their five associations is shown in Table 1.
All three constructed datasets contain an equal number of positive and negative samples. Based on the assumption that similar circRNAs often have similar diseases, we use a stratified filtering method when constructing negative samples. Specifically, for a given circRNA i, we first determine whether other circRNAs share the same associated diseases based on different association views. If multiple circRNAs share common disease associations with circRNA i, diseases not linked to these circRNAs are selected and paired with circRNA i to generate negative samples. However, this method usually yields only a small number of negative samples. Therefore, the negative sample construction strategy proposed by Wei et al. [12] is utilized to balance the number of positive and negative samples. Specifically, according to the circRNA similarity CS, k dissimilar circRNAs are selected to form a set: $CS_i = \{c_1, c_2, \ldots, c_k\}$. Then all diseases associated with the circRNAs in $CS_i$ are selected as a set: $CD = \{d_1, d_2, \ldots, d_n\}$, where n is the number of associated diseases. Finally, diseases from $CD$ are used to construct negative samples.
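The dissimilarity-based step of this strategy can be sketched as follows. The association matrix, similarity matrix, and value of k are invented for illustration, and excluding the known positives of circRNA i from the candidates is an assumption added here to keep the example consistent.

```python
import numpy as np


def sample_negatives(i, A, CS, k):
    """Pair circRNA i with diseases linked to its k least similar circRNAs."""
    # Indices of the k circRNAs least similar to circRNA i (excluding i itself).
    order = [j for j in np.argsort(CS[i]) if j != i]
    dissimilar = order[:k]
    # Diseases associated with at least one of the dissimilar circRNAs.
    candidates = np.flatnonzero(A[dissimilar].sum(axis=0) > 0)
    # Assumption: skip diseases already known to be associated with circRNA i.
    return [(i, int(d)) for d in candidates if A[i, d] == 0]


# Hypothetical data: 4 circRNAs x 3 diseases and a circRNA similarity matrix.
A = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]])
CS = np.array([[1.0, 0.9, 0.1, 0.2],
               [0.9, 1.0, 0.3, 0.2],
               [0.1, 0.3, 1.0, 0.8],
               [0.2, 0.2, 0.8, 1.0]])

negatives = sample_negatives(0, A, CS, k=2)
```

Here circRNA 0 is least similar to circRNAs 2 and 3, so their diseases become its negative candidates.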
Performance of MVHGCN on three datasets
The prediction performance of MVHGCN is evaluated by 5-fold cross-validation on each of the three datasets. The ROC curves and PR curves obtained for each fold are shown in Fig 2. The ACC, AUC, AUPR, and F1-score of each fold are shown in Table 2. For the key metrics AUC and AUPR, MVHGCN exhibits relatively stable performance with high average values across the three datasets, with an average AUC value of 99.42% and an average AUPR value of 99.28% on Dataset 1. However, for the metrics ACC and F1-score, the model shows fluctuations, which might be due to the limited data, as illustrated in Fig 2(c).
Fig 2. (a) ROC curves, (b) PR curves, (c) box plots.
Comparison with six benchmark methods
To further validate the effectiveness of MVHGCN, it is compared with six benchmark methods: GMNN2CD [16], MLNGCF [19], AE-RF [20], CircWalk [21], KGETCDA [17], and KGRACDA [22]. AUC is selected as the primary comparison metric, and the ROC curves for each method are shown in Fig 3, with other metric values presented in Table 3. Across the three datasets, MVHGCN achieves the highest AUC values of 0.993, 0.986, and 0.984, respectively, indicating that MVHGCN exhibits excellent generalization ability and classification performance across different datasets. The AUC values of the other benchmark methods are all significantly lower than those of MVHGCN, particularly on Dataset 1 and Dataset 2, where the performance gap is most pronounced. Although GMNN2CD shows relatively stable performance across all datasets, with AUC values around 0.881, it still does not perform as well as MVHGCN. MVHGCN achieves average ACC, AUPR, and F1-score values of 0.955, 0.987, and 0.955 across the three datasets, outperforming the second-best existing method by 0.109, 0.18, and 0.18, respectively. The other methods achieve relatively lower ACC, AUC, AUPR, and F1-score values, further demonstrating the superior discriminative ability of MVHGCN on various datasets. This disparity indicates that MVHGCN is more robust, that multi-view feature extraction is effective in alleviating data sparsity, and that a reasonable negative sampling strategy improves performance.
Effectiveness of associated views in MVHGCN
To investigate the impact of different views on model prediction performance, 11 views (view1 to view11), each defined by the meta-path views it combines, are evaluated by 5-fold cross-validation on the three datasets, and the results are shown in Fig 4. Taking Dataset 1 as an example, the performance of view4, view5, view8, and view9 does not improve significantly after incorporating additional meta-path views on top of $P_{cd}$, indicating that not all views enhance prediction performance. This may be due to the introduction of irrelevant information or noise by certain views. Ultimately, we select the effective meta-path views to construct view11, which demonstrates excellent performance on all three datasets.
Performance comparison of different loss functions in MVHGCN
To evaluate the impact of loss functions on model performance, 5-fold cross-validation is used on the three datasets to compare InfoNCE with BPR [40], Hinge [41], and BCE [42]. The results are shown in Fig 5. InfoNCE performs best in terms of AUC and AUPR, showing a strong ability to distinguish between positive and negative samples, especially on Dataset 1 and Dataset 3. Although Hinge does not perform as well as InfoNCE, it still shows commendable results. BCE exhibits the most balanced performance on Dataset 1, achieving the highest values for accuracy and F1-score. In contrast, BPR's overall performance is inferior to that of the other loss functions, particularly on Dataset 1 and Dataset 3, where its accuracy and AUC show notable gaps. Overall, InfoNCE outperforms the other loss functions.
Performance comparison of different GNNs in MVHGCN
To comprehensively evaluate the effectiveness of different Graph Neural Networks (GNNs), we replace the GCN [28] in MVHGCN with GAT [43], LGCN [44], and GraphSAGE [45], respectively, for comparison. Fig 6 presents the AUC results obtained through 5-fold cross-validation, and Fig 7 shows their ROC curves.
We observe that GCN and GAT exhibit similar performance, but because of GAT's longer computational time, GCN is chosen as the final model. Compared to the other models, GCN exhibits higher performance across all metrics. Taking Dataset 1 as an example, GCN achieves 0.8% higher ACC than LGCN and 0.9% higher than GraphSAGE. In terms of AUC, GCN demonstrates a 2.1% higher value than LGCN and 0.9% higher than GraphSAGE. For AUPR, GCN shows 1.5% higher values than LGCN and 1.3% higher than GraphSAGE. Finally, for the F1-score, GCN achieves 1.9% higher than LGCN and 1.7% higher than GraphSAGE.
Parameter analysis
The impact of parameters in MVHGCN on performance is explored in this section, with a particular focus on the temperature parameter in InfoNCE and the embedding dimensionality. The temperature parameter adjusts the sensitivity of similarity measurements in contrastive learning. We evaluate the effects of different temperatures (t) and dimensionalities on the AUC value through 5-fold cross-validation on the three datasets, as shown in Fig 8. The results show a slight performance decline as the temperature (t) increases. For Dataset 1, performance improves significantly when the dimensionality increases from 64 to 256 but plateaus or slightly declines beyond 512. Similar trends are observed on Dataset 2 and Dataset 3, where performance gains diminish at higher dimensionality. Overall, moderate dimensionality (256 or 512) achieves the best balance. Thus, the temperature is set to 0.1, and the embedding dimensionality is set to 256 in MVHGCN.
Discussion
MVHGCN improves prediction accuracy by alleviating data sparsity and precisely selecting negative samples. Its methodology can be summarized in three key points. Firstly, MVHGCN employs a meta-path strategy to extract multiple relational views from heterogeneous graphs, capturing the diverse associations between circRNAs and diseases. This strategy not only enriches the associations between circRNAs and diseases but also improves the accuracy of negative sample selection. Secondly, by aggregating different views through graph convolutional networks, MVHGCN integrates various association patterns into more expressive embedding representations. Finally, in constructing and utilizing multiple associations between circRNAs and diseases, MVHGCN focuses on the interaction effects between different connection types on the target associations, ensuring that the model can effectively capture and express these complex interaction features.
To further validate the predictive performance of the model, we scored circRNA-disease pairs on Dataset 1 and confirmed the predictions by searching the literature in the PubMed database. Table 4 and Table 5 list the validation literature. Colorectal cancer (CRC) and hepatocellular carcinoma (HC) are malignant tumors with high global incidence and mortality rates, and previous studies have demonstrated that circRNAs play critical roles in their occurrence and progression [46,47]. Therefore, we selected colorectal cancer and hepatocellular carcinoma as validation targets for evaluating the predictive performance. The results showed that among the top 10 candidate circRNAs with the highest scores, 5 have been validated in the literature.
Although the results are satisfactory, the method still has some limitations. On one hand, due to the inconsistent naming conventions, expression patterns, and descriptive focuses across different circRNA databases, it is still challenging to achieve uniformity, which affects the model’s generalization capability. On the other hand, the proposed method relies on known data to mine unknown associations and is unable to identify new associations between novel circRNAs and diseases. Additionally, although the method can uncover unknown associations, it lacks the ability to interpret their biological significance. Therefore, in future research, we will explore the use of representative circRNA databases and integrate RNA sequences, gene expression information, and functional roles to construct a regulatory network with greater biological meaning.
Conclusion
CircRNAs have played a significant role in the diagnosis of various diseases, particularly cancers, neurological disorders, and cardiovascular diseases. Accurately predicting the potential associations between circRNAs and diseases offers new perspectives for disease diagnosis and treatment strategies. However, existing methods have faced challenges in predicting circRNA-disease associations because of data sparsity and the difficulty in identifying negative samples. To address these issues, a novel prediction model, MVHGCN, is proposed based on multi-view and graph convolutional networks to predict potential circRNA-disease associations. Experimental results on the three datasets demonstrate that MVHGCN is effective in predicting circRNA-disease associations, providing strong support for related disease research and clinical applications.
References
- 1. Gruner H, Cortés-López M, Cooper DA, Bauer M, Miura P. CircRNA accumulation in the aging mouse brain. Sci Rep. 2016;6:38907. pmid:27958329
- 2. Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013;495(7441):333–8. pmid:23446348
- 3. Niu M, Wang C, Chen Y, Zou Q, Qi R, Xu L. CircRNA identification and feature interpretability analysis. BMC Biol. 2024;22(1):44. pmid:38408987
- 4. Cao C, Li M, Wang C, Xu L, Zou Q, Wang Y, et al. DGCLCMI: a deep graph collaboration learning method to predict circRNA-miRNA interactions. BMC Biol. 2025;23(1):104. pmid:40264118
- 5. Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, et al. Natural RNA circles function as efficient microRNA sponges. Nature. 2013;495(7441):384–8. pmid:23446346
- 6. Cao C, Wang C, Dai Q, Zou Q, Wang T. CRBPSA: CircRNA-RBP interaction sites identification using sequence structural attention model. BMC Biol. 2024;22(1):260. pmid:39543602
- 7. Ashwal-Fluss R, Meyer M, Pamudurti NR, Ivanov A, Bartok O, Hanan M, et al. circRNA biogenesis competes with pre-mRNA splicing. Mol Cell. 2014;56(1):55–66. pmid:25242144
- 8. Tian Y, Zou Q, Wang C, Jia C. MAMLCDA: a meta-learning model for predicting circRNA-disease association based on MAML combined with CNN. IEEE J Biomed Health Inf. 2024.
- 9. Niu M, Chen Y, Wang C, Zou Q, Xu L. Computational approaches for circRNA-disease association prediction: a review. Front Comput Sci. 2025;19(4):194904.
- 10. Fan C, Lei X, Wu F-X. Prediction of CircRNA-disease associations using KATZ model based on heterogeneous networks. Int J Biol Sci. 2018;14(14):1950–9. pmid:30585259
- 11. Lei X, Bian C. Integrating random walk with restart and k-Nearest Neighbor to identify novel circRNA-disease association. Sci Rep. 2020;10(1):1943. pmid:32029856
- 12. Wei H, Liu B. iCircDA-MF: identification of circRNA-disease associations based on matrix factorization. Brief Bioinform. 2020;21(4):1356–67. pmid:31197324
- 13. Wang L, You ZH, Zhou X, Yan X, Li HY, Huang YA. NMFCDA: combining randomization-based neural network with non-negative matrix factorization for predicting CircRNA-disease association. Appl Soft Comput. 2021;110:107629.
- 14. Niu M, Wang C, Zhang Z, Zou Q. A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation. BMC Biol. 2024;22(1):24. pmid:38281919
- 15. Lan W, Dong Y, Chen Q, Zheng R, Liu J, Pan Y. KGANCDA: predicting circRNA-disease associations based on knowledge graph attention network. Brief Bioinform. 2022;23(1):bbab494.
- 16. Niu M, Zou Q, Wang C. GMNN2CD: identification of circRNA-disease associations based on variational inference and graph Markov neural networks. Bioinformatics. 2022;38(8):2246–53. pmid:35157027
- 17. Wu J, Ning Z, Ding Y, Wang Y, Peng Q, Fu L. KGETCDA: an efficient representation learning framework based on knowledge graph encoder from transformer for predicting circRNA-disease associations. Brief Bioinform. 2023;24(5):bbad292. pmid:37587836
- 18. Zhang X, Zou Q, Niu M, Wang C. Predicting circRNA-disease associations with shared units and multi-channel attention mechanisms. Bioinformatics. 2025;41(3):btaf088. pmid:40045181
- 19. Wu Q, Deng Z, Zhang W, Pan X, Choi K-S, Zuo Y, et al. MLNGCF: circRNA-disease associations prediction with multilayer attention neural graph-based collaborative filtering. Bioinformatics. 2023;39(8):btad499. pmid:37561093
- 20. Deepthi K, Jereesh AS. Inferring potential CircRNA-disease associations via deep autoencoder-based classification. Mol Diagn Ther. 2021;25(1):87–97. pmid:33156515
- 21. Kouhsar M, Kashaninia E, Mardani B, Rabiee HR. CircWalk: a novel approach to predict CircRNA-disease association based on heterogeneous network representation learning. BMC Bioinformatics. 2022;23(1):331. pmid:35953785
- 22. Wang Y, Ma M, Xie Y, Peng Q, Lyu H, Sun H, et al. KGRACDA: a model based on knowledge graph from recursion and attention aggregation for CircRNA-disease association prediction. IEEE/ACM Trans Comput Biol Bioinform. 2024.
- 23. Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50. pmid:20439255
- 24. Chicco D, Sadowski P, Baldi P. Deep autoencoder neural networks for gene ontology annotation predictions. In: Proceedings of the 5th ACM conference on bioinformatics, computational biology, and health informatics; 2014. p. 533–40. https://doi.org/10.1145/2649387.2649442
- 25. Fu X, Zhang J, Meng Z, King I. MAGNN: metapath aggregated graph neural network for heterogeneous graph embedding. In: Proceedings of the web conference 2020; 2020. p. 2331–41.
- 26. Gupta M, Kumar P, Bhasker B. DPRel: a meta-path based relevance measure for mining heterogeneous networks. Inf Syst Front. 2019;21:979–95.
- 27. Fu C, Zheng G, Huang C, Yu Y, Dong J. Multiplex heterogeneous graph neural network with behavior pattern modeling. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2023. p. 482–94.
- 28. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint 2016. https://arxiv.org/abs/1609.02907
- 29. Oord Avd, Li Y, Vinyals O. Representation learning with contrastive predictive coding. arXiv preprint 2018. https://arxiv.org/abs/1807.03748
- 30. Yao D, Zhang L, Zheng M, Sun X, Lu Y, Liu P. Circ2Disease: a manually curated database of experimentally validated circRNAs in human disease. Sci Rep. 2018;8(1):11018. pmid:30030469
- 31. Fan C, Lei X, Fang Z, Jiang Q, Wu F-X. CircR2Disease: a manually curated database for experimentally supported circular RNAs associated with various diseases. Database (Oxford). 2018;2018:bay044. pmid:29741596
- 32. Lan W, Zhu M, Chen Q, Chen B, Liu J, Li M, et al. CircR2Cancer: a manually curated database of associations between circRNAs and cancers. Database (Oxford). 2020;2020:baaa085. pmid:33181824
- 33. Gao Y, Shang S, Guo S, Li X, Zhou H, Liu H, et al. Lnc2Cancer 3.0: an updated resource for experimentally supported lncRNA/circRNA cancer associations and web tools based on RNA-seq and scRNA-seq data. Nucleic Acids Res. 2021;49(D1):D1251–8. pmid:33219685
- 34. Liu M, Wang Q, Shen J, Yang BB, Ding X. Circbank: a comprehensive database for circRNA with standard nomenclature. RNA Biol. 2019;16(7):899–905. pmid:31023147
- 35. Huang Z, Shi J, Gao Y, Cui C, Zhang S, Li J, et al. HMDD v3.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res. 2019;47(D1):D1013–7. pmid:30364956
- 36. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37(Database issue):D98-104. pmid:18927107
- 37. Xie B, Ding Q, Han H, Wu D. miRCancer: a microRNA-cancer association database constructed by text mining on literature. Bioinformatics. 2013;29(5):638–44. pmid:23325619
- 38. Miao Y-R, Liu W, Zhang Q, Guo A-Y. lncRNASNP2: an updated database of functional SNPs and mutations in human and mouse lncRNAs. Nucleic Acids Res. 2018;46(D1):D276–80. pmid:29077939
- 39. Bao Z, Yang Z, Huang Z, Zhou Y, Cui Q, Dong D. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 2019;47(D1):D1034–7. pmid:30285109
- 40. Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint 2012. https://arxiv.org/abs/1205.2618
- 41. Jin J, Fu K, Zhang C. Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Trans Intell Transp Syst. 2014;15(5):1991–2000.
- 42. Cai Z, Liu S, Wang G, Ge Z, Zhang X, Huang D. Align-DETR: improving DETR with simple IoU-aware BCE loss. arXiv preprint 2023. https://arxiv.org/abs/2304.07527
- 43. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv preprint 2017. https://arxiv.org/abs/1710.10903
- 44. Gao H, Wang Z, Ji S. Large-scale learnable graph convolutional networks. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining; 2018. p. 1416–24.
- 45. Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Adv Neural Inf Process Syst. 2017;30.
- 46. Yu J, Xu Q-G, Wang Z-G, Yang Y, Zhang L, Ma J-Z, et al. Circular RNA cSMARCA5 inhibits growth and metastasis in hepatocellular carcinoma. J Hepatol. 2018;68(6):1214–27. pmid:29378234
- 47. Long F, Lin Z, Li L, Ma M, Lu Z, Jing L, et al. Comprehensive landscape and future perspectives of circular RNAs in colorectal cancer. Mol Cancer. 2021;20(1):26. pmid:33536039