AutoEdge-CCP: A novel approach for predicting cancer-associated circRNAs and drugs based on automated edge embedding

The unique expression patterns of circRNAs linked to the advancement and prognosis of cancer underscore their considerable potential as valuable biomarkers. Repurposing existing drugs for new indications can significantly reduce the cost of cancer treatment. Computational prediction of circRNA-cancer and drug-cancer relationships is crucial for precise cancer therapy. However, prior computational methods fail to analyze the interactions among circRNAs, drugs, and cancer at the systematic level. It is essential to propose a method that uncovers more valuable information for cancer-centered multi-association prediction. In this paper, we present a novel computational method, AutoEdge-CCP, to unveil cancer-associated circRNAs and drugs. We abstract the complex relationships between circRNAs, drugs, and cancer into a multi-source heterogeneous network. In this network, each molecule is represented by two types of information: the intrinsic attribute information of molecular features, and the link information explicitly modeled by autoGNN, which searches for information across both the intra-layer and inter-layer levels of a message passing neural network. Its strong performance in multi-scenario applications and case studies establishes AutoEdge-CCP as a potent and promising association prediction tool.


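The semantic similarity between cancers $d_i$ and $d_j$ is computed from their directed acyclic graphs (DAGs):

$$
SS(d_i, d_j) = \frac{\sum_{t \in T_{d_i} \cap T_{d_j}} \left( D_{d_i}(t) + D_{d_j}(t) \right)}{\sum_{t \in T_{d_i}} D_{d_i}(t) + \sum_{t \in T_{d_j}} D_{d_j}(t)} \tag{1}
$$

where $T_d$ is the set of nodes in $DAG_d$, and $D_d(t)$ represents the semantic contribution value of cancer $t$ to cancer $d$ in $DAG_d$, with $D_d(d) = 1$ and $D_d(t) = \max\{\mu \cdot D_d(t') \mid t' \in \text{children of } t\}$ for $t \neq d$. $\mu = 0.5$ is the semantic contribution factor [3].

Since the directed acyclic graphs do not cover all diseases in the MeSH database, we adopted the Gaussian interaction profile (GIP) similarity approach to obtain a more comprehensive measure of disease similarity. We calculate the GIP similarity of cancers under the molecular landscape assumption that cancers with similar pathological characteristics are likely to be linked with similar circRNAs or drugs, and vice versa [4]. The heterogeneous interaction network contains three types of associations: circRNA-cancer, circRNA-drug, and drug-cancer. Therefore, for each type of node, we extend the GIP similarity model to the two association matrices that involve it.

Based on the circRNA-cancer adjacency matrix, the GIP similarity $GD_1(d_i, d_j)$ between cancers $d_i$ and $d_j$ can be calculated as follows:

$$
GD_1(d_i, d_j) = \exp\left( -\theta_{d_1} \left\| V_1(d_i) - V_1(d_j) \right\|^2 \right)
$$

where $V_1(d_i)$ represents the interaction profile that encodes the association of cancer $d_i$ with each circRNA, corresponding to the $i$-th column of the circRNA-cancer adjacency matrix, a binary vector of length $n$; $\theta_{d_1}$ denotes the bandwidth parameter of the GIP kernel, and $n$ is the number of circRNAs.

Based on the drug-cancer adjacency matrix, the GIP similarity $GD_2(d_i, d_j)$ between cancers $d_i$ and $d_j$ can be calculated in the same way:

$$
GD_2(d_i, d_j) = \exp\left( -\theta_{d_2} \left\| V_2(d_i) - V_2(d_j) \right\|^2 \right)
$$

where $V_2(d_i)$ and $\theta_{d_2}$ have the same meaning as $V_1(d_i)$ and $\theta_{d_1}$, and $m$ is the number of drugs.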
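As an illustration of the GIP similarity described above, the following minimal Python sketch computes pairwise kernel values for cancers from a toy circRNA-cancer adjacency matrix. The matrix values, the function name `gip_similarity`, and the bandwidth normalization (the reciprocal of the mean squared norm of the interaction profiles, a common choice for GIP kernels) are assumptions made for this example and are not taken from the method itself.

```python
import math


def gip_similarity(profiles, bandwidth_scale=1.0):
    """Pairwise GIP kernel similarity between binary interaction profiles.

    profiles: list of equal-length lists, one interaction profile per node.
    bandwidth_scale: hypothetical tuning factor for the bandwidth (assumption).
    """
    squared_norms = [sum(v * v for v in p) for p in profiles]
    mean_squared_norm = sum(squared_norms) / len(profiles)
    if mean_squared_norm == 0:
        raise ValueError("all interaction profiles are empty")
    # Assumed normalization: bandwidth = scale / mean squared profile norm.
    theta = bandwidth_scale / mean_squared_norm

    size = len(profiles)
    similarity = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(profiles[i], profiles[j])
            )
            similarity[i][j] = math.exp(-theta * squared_distance)
    return similarity


# Toy adjacency matrix: rows are circRNAs, columns are cancers (made-up values).
adjacency = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
# Each cancer's interaction profile is one column of the adjacency matrix.
cancer_profiles = [list(column) for column in zip(*adjacency)]
gd1 = gip_similarity(cancer_profiles)
```

The same function could be applied to the columns of a drug-cancer matrix to obtain the second cancer GIP similarity, or to the rows of either matrix to obtain similarities for circRNAs or drugs.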
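Finally, to fully exploit the advantages of all available information, we obtain the cancer attribute feature representation as follows:

$$
S(d_i, d_j) =
\begin{cases}
SS(d_i, d_j), & \text{if } d_i \text{ and } d_j \text{ have semantic similarity} \\
\dfrac{GD_1(d_i, d_j) + GD_2(d_i, d_j)}{2}, & \text{otherwise}
\end{cases}
$$

where $SS(d_i, d_j)$ is the semantic similarity between cancers $d_i$ and $d_j$, and $GD_1(d_i, d_j)$ and $GD_2(d_i, d_j)$ are their GIP similarities based on the circRNA-cancer and drug-cancer adjacency matrices, respectively.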
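One plausible reading of this fusion step can be sketched in Python, assuming that the semantic similarity is used wherever the DAG provides one and that the mean of the two GIP similarities is used otherwise. The function name `fuse_similarity`, the use of `None` to mark missing semantic values, and all numbers below are hypothetical choices for this example.

```python
def fuse_similarity(primary, gip_a, gip_b):
    """Fuse a partially available similarity with two GIP similarities.

    primary[i][j] is None where no primary (e.g. semantic) similarity exists;
    in that case the mean of the two GIP values is used instead (assumption).
    """
    size = len(primary)
    fused = []
    for i in range(size):
        row = []
        for j in range(size):
            if primary[i][j] is not None:
                row.append(primary[i][j])
            else:
                row.append((gip_a[i][j] + gip_b[i][j]) / 2)
        fused.append(row)
    return fused


# Made-up values for three cancers; the third has no DAG-based similarity.
semantic = [
    [1.0, 0.5, None],
    [0.5, 1.0, None],
    [None, None, None],
]
gip_from_circrna = [
    [1.0, 0.75, 0.25],
    [0.75, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]
gip_from_drug = [
    [1.0, 0.25, 0.75],
    [0.25, 1.0, 0.25],
    [0.75, 0.25, 1.0],
]
fused = fuse_similarity(semantic, gip_from_circrna, gip_from_drug)
```

Under the same assumption, the function could be reused for circRNAs and drugs by passing their functional or chemical structure similarity as the primary matrix.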
CircRNA-CircRNA Similarity
The circRNA attribute feature representation is obtained by combining circRNA GIP similarity and functional similarity. We construct the circRNA functional similarity model under the assumption that two circRNAs associated with more semantically similar groups of diseases are more functionally similar [5,6]. Thus, the functional similarity $FS(c_i, c_j)$ between circRNA $c_i$ and circRNA $c_j$ can be calculated from the semantic similarity between the cancer sets $D_i$ and $D_j$, where $D_i$ and $D_j$ are the cancer sets related to circRNA $c_i$ and circRNA $c_j$, respectively. Based on the same principle as the cancer GIP similarity, we calculate the GIP similarities $GC_1(c_i, c_j)$ and $GC_2(c_i, c_j)$ of the circRNAs based on the circRNA-cancer and circRNA-drug adjacency matrices.
Thus, the fused circRNA attribute feature representation can be expressed as follows:

$$
S(c_i, c_j) =
\begin{cases}
FS(c_i, c_j), & \text{if } c_i \text{ and } c_j \text{ have functional similarity} \\
\dfrac{GC_1(c_i, c_j) + GC_2(c_i, c_j)}{2}, & \text{otherwise}
\end{cases}
$$

Drug-Drug Similarity
The drug attribute feature representation is obtained by combining drug chemical structure similarity and GIP similarity. The chemical structure similarity is determined using RDKit [7], a software package designed for cheminformatics and bioinformatics applications, based on the canonical SMILES (Simplified Molecular Input Line Entry System) [8] strings of the drugs. First, the standard SMILES sequences of all drugs, which encode the chemical structure and spatial configuration of the drugs, are retrieved from DrugBank [9]. RDKit is then employed to transform the SMILES format into a numerical representation of drug fingerprints. Subsequently, the Dice coefficient [10] is used to measure the similarity between drug fingerprints, which is calculated by the following formula:

$$
CS(g_i, g_j) = \frac{2 \left| f_i \cap f_j \right|}{\left| f_i \right| + \left| f_j \right|}
$$

where $|f_i|$ and $|f_j|$ represent the number of fingerprint bits of drugs $g_i$ and $g_j$, respectively, and $|f_i \cap f_j|$ indicates the number of common fingerprint bits. Similar to circRNAs, we construct the GIP similarities $GG_1(g_i, g_j)$ and $GG_2(g_i, g_j)$ of the drugs based on the circRNA-drug and drug-cancer adjacency matrices. Thus, we construct the drug attribute feature representation as follows:

$$
S(g_i, g_j) =
\begin{cases}
CS(g_i, g_j), & \text{if } g_i \text{ and } g_j \text{ have chemical structure similarity} \\
\dfrac{GG_1(g_i, g_j) + GG_2(g_i, g_j)}{2}, & \text{otherwise}
\end{cases}
\tag{12}
$$
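The Dice coefficient used for the drug fingerprints can be illustrated with a small Python sketch. It assumes each fingerprint is represented as a set of on-bit indices; the function name `dice_similarity`, the example bit sets, and the convention of returning 0 for two empty fingerprints are choices made for this example only.

```python
def dice_similarity(bits_i, bits_j):
    """Dice coefficient between two fingerprints given as sets of on-bit indices."""
    total_bits = len(bits_i) + len(bits_j)
    if total_bits == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return 2 * len(bits_i & bits_j) / total_bits


# Hypothetical on-bit indices for three drugs (not real fingerprints).
fingerprint_a = {1, 5, 9, 42}
fingerprint_b = {5, 9, 17, 42, 99}
fingerprint_c = {2, 3}

similarity_ab = dice_similarity(fingerprint_a, fingerprint_b)  # 2 * 3 / (4 + 5)
similarity_ac = dice_similarity(fingerprint_a, fingerprint_c)  # no shared bits
```

In practice the on-bit sets would be derived from RDKit fingerprints generated from each drug's SMILES string; that step is omitted here to keep the sketch free of external dependencies.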