iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as in identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize the biometric information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, the Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and a dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms other competing methods. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.


Introduction
Circular RNAs (circRNAs) are a class of non-coding RNAs characterized by a covalently closed-loop structure generated through a special type of alternative splicing termed backsplicing. Because circRNAs lack free ends and are therefore relatively stable, they are abundant in eukaryotic transcriptomes. It has been shown that circRNAs are involved in various biological processes, including functioning as microRNA (miRNA) sponges [1], regulating alternative splicing [2], and modulating the expression of parental genes [3]. In addition, accumulating evidence suggests that circRNAs affect many diseases, such as glioma [4], breast cancer [5], and liver cancer [6]. Therefore, the study of circRNAs is crucial for disease diagnosis and treatment.
At present, identifying circRNA-disease associations is an attractive way to find potential biomarkers and to better understand the diagnosis and treatment of diseases. However, circRNA-disease associations are very complicated and remain obscure. With the development of sequencing and analysis technologies, various biological experiments have emerged to identify circRNA-disease associations [7][8][9]. However, biological experiments are generally costly and labor-intensive. Experimentally supported circRNA-disease association databases (circ2Disease [10], circRNADisease [11], circR2Disease [12], circ2Traits [13], circFunbase [14]) provide an opportunity to develop computational methods for circRNA-disease association identification.
Recently, researchers have proposed many deep learning-based methods to predict circRNA-disease associations. For example, GCNCDA [15], one of the most widely validated DL-based algorithms, applied a graph convolutional network to predict circRNA-disease associations. ASAECDA [16], another impressive DL-based algorithm, calculated weight values of the links between circRNAs and diseases based on graph embedding and a stacked autoencoder. GATCDA [17] used a graph attention network to predict scores for unknown circRNA-disease associations. IMS-CDA [18] identified potential circRNA-disease associations by incorporating multi-source similarity information into a deep stacked autoencoder model. iCDA-CGR [19] used chaos game representation technology to discover the associations between circRNAs and diseases. RNMFLP predicted circRNA-disease associations based on robust nonnegative matrix factorization and label propagation [20]. iGRLCDA identified circRNA-disease associations based on graph representation learning [21]. These methods achieved impressive prediction performance. However, we found that they suffer from two major drawbacks. First, they underutilize the biometric information in the data. Second, the features they use do not adequately represent the association characteristics between circRNAs and diseases.

Competing interests: The authors have declared that no competing interests exist.
In this study, we developed a novel deep learning model for identifying CircRNA-Disease Associations based on accelerated attribute Network Embedding and dynamic convolutional AutoEncoder (iCircDA-NEAE). iCircDA-NEAE can (i) make the most of the biometric information in the data, (ii) enhance its feature extraction capability by combining multiple feature extraction methods, and (iii) predict circRNA-disease associations accurately. Specifically, (i) circRNA-disease association data were collected from the circR2Disease database; (ii) disease semantic similarity, the Gaussian interaction profile (GIP) kernel, circRNA expression profile similarity, and Jaccard similarity were used to measure the biometric information in the data, from which a multisource information fusion descriptor was constructed; (iii) accelerated attribute network embedding (AANE) extracts features from the descriptor data; (iv) a dynamic convolutional autoencoder (DCAE) extracts hidden features from the data; and (v) a random forest classifier uses the hidden features to predict circRNA-disease associations. The schematic overview of the iCircDA-NEAE framework is shown in Fig 1. Model performance was validated with 5-fold and 10-fold cross-validation and with independent test datasets. Experimental results show that iCircDA-NEAE significantly outperforms other competing methods. Furthermore, relevant literature indicates that the novel circRNA-disease associations predicted by iCircDA-NEAE are potential associations.

Hyperparameter Selection of iCircDA-NEAE
In a random forest classifier, max_feature determines the number of features considered when splitting each node of a decision tree. A max_feature that is too small may give each split incomplete feature information, whereas one that is too large may lead to overfitting. In this section, the important hyperparameter max_feature was investigated experimentally, while the other hyperparameters were set to their default values.
The candidate values of max_feature ranged from 0.1 to 0.5 [22]. As shown in Fig 2, the AUC value of iCircDA-NEAE is highest when max_feature is set to 0.2. Therefore, we set max_feature to 0.2 in the experiments.
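A sweep of this kind can be outlined with scikit-learn, whose `RandomForestClassifier` interprets a float `max_features` as the fraction of features considered at each split. This is a minimal sketch under stated assumptions: scikit-learn is installed, synthetic data from `make_classification` stands in for the fused iCircDA-NEAE features, and the helper `sweep_max_features` is hypothetical. Only the 0.1 to 0.5 range and the 100 trees come from the text.

```python
# Sketch: choose the random forest max_features fraction by 5-fold AUC.
# Assumes scikit-learn; synthetic data stands in for the real fused features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def sweep_max_features(X, y, candidates=(0.1, 0.2, 0.3, 0.4, 0.5), seed=0):
    """Return {fraction: mean 5-fold ROC AUC} for each max_features candidate."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {}
    for frac in candidates:
        # A float max_features means "use this fraction of the features
        # as split candidates at every node".
        clf = RandomForestClassifier(n_estimators=100, max_features=frac,
                                     random_state=seed)
        scores[frac] = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean()
    return scores


if __name__ == "__main__":
    # Hypothetical balanced toy data standing in for circRNA-disease pairs.
    X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                               random_state=0)
    results = sweep_max_features(X, y)
    for frac, auc in results.items():
        print(f"max_features={frac:.1f}  mean AUC={auc:.4f}")
    print("best max_features:", max(results, key=results.get))
```

On real data the best fraction depends on the feature set, so the chosen value of 0.2 should not be expected to transfer to the toy data.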

Comparison with different classifiers
In this section, we compared iCircDA-NEAE with traditional machine learning algorithms and common deep learning algorithms, including SVM (Support Vector Machine) [23], Rotation Forest [24], DNN (Deep Neural Network) [25] and XGBoost [26]. To make the results comparable, we replaced only the classifier in the model with each classifier being compared. The detailed parameters of all classifiers are presented in Table 2.
We compared the performance of iCircDA-NEAE using the five classifiers (random forest and the four alternatives above) on the benchmark dataset and two independent datasets (the circRNADisease and circ2Disease datasets). The ROC curves on the three datasets are shown in Fig 3. As shown in Table 3, iCircDA-NEAE with the random forest classifier outperforms the other classifiers on all evaluation metrics.

Comparison of different datasets
In this section, model performance was evaluated on two independent datasets (the circRNADisease and circ2Disease datasets) with 5-fold and 10-fold cross-validation. As shown in Fig 4, the AUC values of iCircDA-NEAE on the circRNADisease and circ2Disease datasets are 0.8809 and 0.8505, respectively. The 5-fold cross-validation results on the circRNADisease and circ2Disease datasets are presented in Table 4. For the circRNADisease dataset, the ACC, Sen, F1 and MCC of iCircDA-NEAE are 0.8682, 0.8335, 0.8327 and 0.6613, respectively. For the circ2Disease dataset, the ACC, Sen, F1 and MCC of iCircDA-NEAE are 0.8487, 0.7325, 0.7170 and 0.4327, respectively. The 10-fold cross-validation results are presented in S2 and S3 Tables, respectively. For the circRNADisease dataset, the ACC, Sen, F1, MCC and AUC of iCircDA-NEAE are 0.8735, 0.8413, 0.8274, 0.6635 and 0.8962, respectively. For the circ2Disease dataset, the ACC, Sen, F1, MCC and AUC of iCircDA-NEAE are 0.8537, 0.7530, 0.7074, 0.4341 and 0.8575, respectively. These results suggest that iCircDA-NEAE achieves good prediction performance on several important datasets.
In terms of features, although these state-of-the-art methods use a variety of feature information, they could exploit more biometric information. Our proposed iCircDA-NEAE considers both circRNA expression profile similarity and Jaccard similarity. To the best of our knowledge, we are the first to use both circRNA expression profile similarity and Jaccard similarity to predict circRNA-disease associations. Furthermore, our method performs multisource feature fusion, which measures the correlation among multiple types of feature information and fuses them into a unified descriptor. At the same time, features without redundant information can effectively improve model performance.
In terms of models, these state-of-the-art methods use traditional deep learning or machine learning algorithms. iCDA-CGR used chaos game representation (CGR) technology to quantify the nonlinear relationships in circRNA sequences. However, it does not handle redundant information, which results in poor predictive performance. IMS-CDA and ASAECDA are two deep learning methods based on the stacked autoencoder (SAE), which use SAE to extract features from multi-source information. Compared with SAE, our proposed DCAE can capture high-level representations of the data. GCNCDA is a graph convolutional network (GCN)-based prediction method, and GATCDA is a graph attention network (GAT)-based prediction method. Compared with these two methods, iCircDA-NEAE incorporates the advantages of AANE and DCAE, which not only effectively integrates multisource information but also effectively captures hidden high-level information in the data.

Case studies
In this section, we applied iCircDA-NEAE to the benchmark dataset circR2Disease to predict novel potential circRNA-disease associations. We sorted all unconfirmed circRNA-disease associations in descending order of their prediction scores; the higher the score, the greater the likelihood of a circRNA-disease association. We selected the top 20 circRNA-disease associations (as shown in Table 5), 17 of which have been confirmed by different databases and literature. For example, hsa_circ_0004214 is highly upregulated in breast cancer and promotes tumorigenesis [27]; hsa_circ_0001785 acts as a diagnostic biomarker in breast cancer treatment [28]; and hsa_circ_0004277 is considered a potential diagnostic marker and therapeutic target for acute myeloid leukemia [29]. The three unconfirmed circRNA-disease associations are hsa_circ_0046701-lung cancer, hsa_circ_0037911-pancreatic cancer, and hsa_circ_0005836-colorectal cancer. hsa_circ_0046701 promotes carcinogenesis by increasing the expression of ITGB8 in glioma [30], and ITGB8 expression is significantly upregulated in lung cancer tissues compared with normal tissues [31]. This evidence suggests that hsa_circ_0046701 may serve as a potential biomarker in lung cancer. miR-637 suppresses tumorigenesis in pancreatic ductal adenocarcinoma cells [32], and in essential hypertension, hsa_circ_0037911 was found to suppress miR-637 activity by acting as a sponge [33]. These results suggest that hsa_circ_0037911 may promote pancreatic ductal adenocarcinoma by inhibiting miR-637 activity. In pulmonary tuberculosis, hsa_circ_0005836 is related to the regulation of the mTOR signaling pathway [34], and the mTOR signaling pathway is a target for colorectal cancer therapy [35]. These studies suggest that hsa_circ_0005836 may be related to colorectal cancer.

Discussion
Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as in identifying relevant biomarkers. Therefore, there is an urgent need to develop novel computational methods to accurately predict circRNA-disease associations.
In this paper, we proposed a novel deep learning-based method called iCircDA-NEAE to discover new potential circRNA-disease associations. Experimental results demonstrated that iCircDA-NEAE outperforms other state-of-the-art prediction methods and can accurately predict potential circRNA-disease associations. The performance of iCircDA-NEAE mainly depends on three factors: (i) iCircDA-NEAE incorporates multi-source biometric information to measure complex associations between circRNAs and diseases; (ii) iCircDA-NEAE uses disease semantic similarity, the Gaussian interaction profile (GIP) kernel, circRNA expression profile similarity, and Jaccard similarity to make the most of the biometric information in the data; and (iii) iCircDA-NEAE incorporates the advantages of AANE and DCAE, which not only effectively integrates multi-source information but also effectively captures hidden high-level information in the data.
Two limitations of this work should be discussed. (i) Since negative samples are difficult to obtain, we can only randomly select samples from the unconfirmed pairs as negative samples. The numbers of positive and negative samples are the same, which avoids the class imbalance problem, but this inevitably means that the negative samples may contain a small number of true positive samples. (ii) Since iCircDA-NEAE uses strongly supervised label information (true association labels) to predict circRNA-disease associations, it depends heavily on the quality of the ground-truth association labels. Therefore, more comprehensive methods should be developed in future work to address these two issues.
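The balanced negative sampling described in limitation (i) can be sketched as follows. This is an illustrative sketch, not the authors' code: the helper name `sample_balanced_pairs` and the fixed random seed are assumptions, and the input is taken to be a binary association matrix with circRNAs as rows and diseases as columns.

```python
# Sketch: balanced negative sampling from unconfirmed circRNA-disease pairs.
import random


def sample_balanced_pairs(am, seed=0):
    """Return (positives, negatives) as lists of (circRNA, disease) index pairs.

    am is a binary association matrix (rows = circRNAs, columns = diseases).
    Negatives are drawn without replacement from the unconfirmed (0) entries,
    as many as there are positives, so the two classes are the same size.
    """
    positives = [(i, j) for i, row in enumerate(am) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(am) for j, v in enumerate(row) if v == 0]
    if len(unknown) < len(positives):
        raise ValueError("not enough unconfirmed pairs to balance the classes")
    negatives = random.Random(seed).sample(unknown, len(positives))
    return positives, negatives


if __name__ == "__main__":
    toy_am = [[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 1]]
    pos, neg = sample_balanced_pairs(toy_am)
    print("positives:", pos)
    print("negatives:", neg)  # unconfirmed pairs; a few may be true associations
```

Because every sampled negative is merely unconfirmed, some may be undiscovered true associations, which is exactly the label noise the limitation describes.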
First, iCircDA-NEAE uses disease semantic similarity, the Gaussian interaction profile (GIP) kernel, circRNA expression profile similarity and Jaccard similarity to measure the biometric information in the data, and constructs a multisource information fusion descriptor. Second, AANE extracts features from the descriptor data. Third, DCAE extracts hidden features from the data. Finally, the random forest classifier uses the hidden features to predict circRNA-disease associations. The flow chart of iCircDA-NEAE is shown in Fig 1. The source code and data are available at: https://github.com/nathanyl/iCircDA-NEAE.

Similarity measures
Before introducing the method, we summarize the notation used in this paper: italic letters denote scalars, as in $A$ or $a$; lowercase boldface letters denote vectors, as in $\mathbf{a}$; and uppercase boldface letters denote matrices, as in $\mathbf{A}$.
Similarity measurement can convert the relationship between biological factors into feature information that can be used by the model, so it is a crucial step in building a prediction model.We constructed similarity matrices from four aspects: disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity.

Construction of disease semantic similarity
Disease semantic similarity measures the relationship between diseases [38][39][40]. The MeSH database uses a directed acyclic graph (DAG) to represent diseases and their associations: a node in the DAG represents a disease, and an edge represents an association between two diseases. In MeSH, DAG_d = (d, N_d, E_d) represents disease d, where N_d is the set of disease nodes related to d, including d itself, and E_d is the set of edges between these diseases. For a disease e in N_d, the semantic contribution of e to d is defined as D_d(e) = 1 if e = d. If e ≠ d, the contribution is calculated as follows:

$$D_d(e) = \max\left\{ \mu \cdot D_d(e') \mid e' \in \text{children of } e \right\}$$

where μ is the semantic contribution factor between diseases; we set μ to 0.5 following [41].
Then, the semantic value DV(d) of disease d is defined as follows:

$$DV(d) = \sum_{e \in N_d} D_d(e)$$

In the DAG, the more nodes two diseases share, the more similar they are. The semantic similarity DSS_1(d(i), d(j)) between diseases d(i) and d(j) is defined as follows:

$$DSS_1(d(i), d(j)) = \frac{\sum_{t \in N_{d(i)} \cap N_{d(j)}} \left( D_{d(i)}(t) + D_{d(j)}(t) \right)}{DV(d(i)) + DV(d(j))}$$

where DSS_1 is the disease semantic similarity matrix. Besides DSS_1, the influence of the number of diseases on the disease contribution should also be considered. Inspired by Wang's method [42], the contribution of disease e under the influence of the disease number can be defined as follows:

$$D^{(2)}_d(e) = -\log\left( \frac{num(DAG_d(e))}{num(diseases)} \right)$$

where num(DAG_d(e)) is the number of disease DAGs that contain disease e and num(diseases) is the number of all diseases. With DV_2(d) = \sum_{e \in N_d} D^{(2)}_d(e), the disease semantic similarity DSS_2(d(i), d(j)) of diseases d(i) and d(j) can be defined as follows:

$$DSS_2(d(i), d(j)) = \frac{\sum_{t \in N_{d(i)} \cap N_{d(j)}} \left( D^{(2)}_{d(i)}(t) + D^{(2)}_{d(j)}(t) \right)}{DV_2(d(i)) + DV_2(d(j))}$$
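The contribution rule for DSS_1 (D_d(d) = 1, and each ancestor receives μ times the best contribution among its children) can be illustrated on a toy DAG. This sketch assumes the DAG is given as a hypothetical `parents` mapping from each disease to its parent diseases; the hierarchy below is a heavily simplified stand-in, not the real MeSH tree, and μ = 0.5 as stated in the text.

```python
# Sketch of Wang-style disease semantic similarity (DSS_1) on a toy DAG.
from collections import deque


def contributions(d, parents, mu=0.5):
    """D_d(e) for every e in N_d: 1 for d itself, mu * (best child value) otherwise.

    parents maps each disease to its parent diseases in the DAG. Propagating
    upward and keeping the maximum implements D_d(e) = max(mu * D_d(e')).
    """
    D = {d: 1.0}
    queue = deque([d])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            cand = mu * D[node]
            if cand > D.get(p, 0.0):  # keep the largest contribution seen so far
                D[p] = cand
                queue.append(p)
    return D


def dss1(a, b, parents, mu=0.5):
    """Shared-node contributions of a and b divided by DV(a) + DV(b)."""
    Da, Db = contributions(a, parents, mu), contributions(b, parents, mu)
    shared = Da.keys() & Db.keys()
    numerator = sum(Da[t] + Db[t] for t in shared)
    return numerator / (sum(Da.values()) + sum(Db.values()))


if __name__ == "__main__":
    # Hypothetical, heavily simplified hierarchy (not the real MeSH tree).
    toy_parents = {
        "breast neoplasms": ["neoplasms by site"],
        "lung neoplasms": ["neoplasms by site"],
        "neoplasms by site": ["neoplasms"],
        "neoplasms": [],
    }
    print(dss1("breast neoplasms", "lung neoplasms", toy_parents))  # 3/7 ≈ 0.4286
```

Each leaf has DV = 1 + 0.5 + 0.25 = 1.75, and the two shared ancestors contribute (0.5 + 0.5) + (0.25 + 0.25) = 1.5, giving 1.5 / 3.5 = 3/7.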

Construction of the Gaussian interaction profile kernel
To obtain comprehensive disease similarity information, we used the Gaussian interaction profile (GIP) kernel [43][44][45] to calculate disease similarity. The rationale is that if circRNA c_1 is associated with disease d_1 and disease d_2 is highly similar to d_1, then the circRNAs associated with d_2 tend to have functions similar to that of c_1 [46]. Therefore, we used the circRNA-disease association adjacency matrix to calculate the GIP kernel similarity between diseases d(i) and d(j) as follows:

$$GD(d(i), d(j)) = \exp\left( -\mu_d \left\lVert \mathbf{d}(i) - \mathbf{d}(j) \right\rVert^2 \right)$$

where GD is the GIP kernel similarity matrix between diseases, d(i) is the interaction profile vector of the i-th disease, and μ_d is the bandwidth parameter of the GIP kernel, which is calculated as follows:

$$\mu_d = \left( \frac{1}{n} \sum_{i=1}^{n} \left\lVert \mathbf{d}(i) \right\rVert^2 \right)^{-1}$$

where n is the number of diseases.
Similarly, the GIP kernel similarity between circRNAs is defined as follows:

$$GC(c(i), c(j)) = \exp\left( -\mu_c \left\lVert \mathbf{c}(i) - \mathbf{c}(j) \right\rVert^2 \right)$$

where GC is the GIP kernel similarity matrix between circRNAs, c(i) is the interaction profile vector of the i-th circRNA, and μ_c is the bandwidth parameter of the GIP kernel, which is calculated as follows:

$$\mu_c = \left( \frac{1}{m} \sum_{i=1}^{m} \left\lVert \mathbf{c}(i) \right\rVert^2 \right)^{-1}$$

where m is the number of circRNAs.
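Both GIP kernels follow the same computation, applied once to circRNA profiles and once to disease profiles. The sketch below assumes the common GIP form exp(-μ‖p_i − p_j‖²), with μ the reciprocal of the mean squared profile norm, and a toy association matrix with circRNAs as rows; the function name `gip_kernel` is hypothetical.

```python
# Sketch of the Gaussian interaction profile (GIP) kernel.
import numpy as np


def gip_kernel(profiles):
    """GIP kernel between the rows of `profiles` (one interaction profile per row).

    K[i, j] = exp(-mu * ||p_i - p_j||^2), with mu = 1 / mean_i ||p_i||^2.
    """
    P = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(P ** 2, axis=1)
    mean_sq = sq_norms.mean()
    mu = 1.0 / mean_sq if mean_sq > 0 else 0.0
    # ||p_i - p_j||^2 = ||p_i||^2 + ||p_j||^2 - 2 <p_i, p_j>
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (P @ P.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negatives from round-off
    return np.exp(-mu * sq_dist)


if __name__ == "__main__":
    # Toy association matrix: rows = circRNAs, columns = diseases.
    AM = np.array([[1, 0, 1],
                   [1, 1, 0],
                   [0, 1, 0]])
    GC = gip_kernel(AM)    # circRNA profiles are the rows of AM
    GD = gip_kernel(AM.T)  # disease profiles are the columns of AM
    print(np.round(GC, 3))
    print(np.round(GD, 3))
```

Computing all pairwise distances through the Gram matrix avoids an explicit double loop, which matters for a 661 × 661 circRNA kernel.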

Construction of the CircRNA expression profile similarity
The circRNA expression profile (EP) similarity from the exoRBase database is another important source of information for constructing circRNA-disease association prediction models. We used 32-dimensional feature vectors to represent circRNAs and sorted the circRNAs in descending order according to the feature vectors [16,47,48]. The Spearman correlation coefficient [49] was used to calculate the EP similarity ρ(c_i, c_j) between circRNAs, where d_p is the feature vector difference between circRNA i and circRNA j, l_i represents the 32-dimensional vector of the i-th circRNA after sorting, and k is the number of circRNAs. Let SE be a k×k circRNA adjacency matrix consisting of ρ(c_i, c_j).
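As an illustration of the Spearman-based EP similarity, the sketch below uses the textbook form ρ = 1 − 6Σd_p² / (n(n² − 1)), in which ranks are taken within each vector (largest value first) and n is the vector length (32 for the exoRBase profiles). It assumes no tied values; the toy 4-dimensional profiles and all function names are hypothetical.

```python
# Sketch: Spearman-based expression-profile (EP) similarity between circRNAs.


def ranks_desc(values):
    """Rank positions 1..n with the largest value ranked 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda p: values[p], reverse=True)
    ranks = [0] * len(values)
    for rank, p in enumerate(order, start=1):
        ranks[p] = rank
    return ranks


def spearman(x, y):
    """rho = 1 - 6 * sum(d_p^2) / (n (n^2 - 1)), d_p = rank difference at position p."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length vectors with at least 2 entries")
    rx, ry = ranks_desc(x), ranks_desc(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def ep_similarity(profiles):
    """k x k matrix SE of pairwise Spearman coefficients between circRNA profiles."""
    k = len(profiles)
    return [[spearman(profiles[i], profiles[j]) for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical 4-dimensional profiles (the text uses 32 dimensions).
    toy = [[5.0, 1.0, 3.0, 0.5],
           [4.0, 0.8, 3.5, 0.1],
           [0.2, 6.0, 1.0, 2.0]]
    for row in ep_similarity(toy):
        print([round(v, 3) for v in row])
```

With tied expression values, average ranks (or a Pearson correlation on ranks) would be needed instead of the closed-form expression.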

Construction of the Jaccard similarity
Jaccard similarity represents the similarity between sets [50][51][52]. J(A, B) is the ratio of the size of the intersection of sets A and B to the size of their union; the larger the Jaccard value, the more similar A and B are. We used Jaccard similarity to calculate the similarities between diseases and between circRNAs. The Jaccard similarity of diseases d(i) and d(j) is calculated as follows:

$$JD(d(i), d(j)) = \frac{\left\lvert ca(d(i)) \cap ca(d(j)) \right\rvert}{\left\lvert ca(d(i)) \cup ca(d(j)) \right\rvert} \tag{11}$$

where JD is the Jaccard similarity matrix between diseases and ca(d(i)) represents the set of circRNAs associated with disease d(i).
The Jaccard similarity of circRNAs is defined as follows:

$$JC(c(i), c(j)) = \frac{\left\lvert da(c(i)) \cap da(c(j)) \right\rvert}{\left\lvert da(c(i)) \cup da(c(j)) \right\rvert} \tag{12}$$

where JC is the Jaccard similarity matrix between circRNAs and da(c(i)) represents the set of diseases associated with circRNA c(i).
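Both Jaccard matrices can be derived directly from the binary association matrix. In this minimal sketch the association sets da(c(i)) and ca(d(j)) are built from a toy matrix with circRNAs as rows. Giving two empty sets a similarity of 0 is an assumption made here for definiteness, and the helper names are hypothetical.

```python
# Sketch: Jaccard similarity between diseases (JD) and between circRNAs (JC).


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are given similarity 0 by convention here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def associated_sets(am):
    """From binary AM (rows = circRNAs, cols = diseases) build da(c(i)) and ca(d(j))."""
    da = [{j for j, v in enumerate(row) if v} for row in am]
    ca = [{i for i, row in enumerate(am) if row[j]} for j in range(len(am[0]))]
    return da, ca


def jaccard_matrix(sets):
    """Pairwise Jaccard similarities between all sets in the list."""
    return [[jaccard(s, t) for t in sets] for s in sets]


if __name__ == "__main__":
    toy_am = [[1, 0, 1],
              [1, 1, 0]]
    da, ca = associated_sets(toy_am)
    print("JC:", jaccard_matrix(da))  # circRNA-circRNA
    print("JD:", jaccard_matrix(ca))  # disease-disease
```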

Multisource feature fusion
The multisource feature fusion method can fuse a variety of biological feature information, eliminate redundant information, and improve the accuracy of feature extraction. Feature fusion was used to integrate multiple types of similarity information into a unified identifier that carries rich circRNA and disease feature information as well as multiple types of association information. The multisource disease similarity information is fused into DS(d(i), d(j)). The multisource circRNA similarity information is fused as follows:

$$CS(c(i), c(j)) = \begin{cases} \dfrac{\rho(c(i), c(j)) + GC(c(i), c(j))}{2}, & \text{if } \rho(c(i), c(j)) \text{ exists} \\ GC(c(i), c(j)), & \text{otherwise} \end{cases}$$

Finally, we used principal component analysis (PCA) [53] to reduce the dimensionality of CM and DM. The fusion information of circRNA c(i) and disease d(j) is then obtained from CM(c(i)), the i-th row vector of CM, and DM(d(j)), the j-th column vector of DM.
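The circRNA fusion rule (average ρ and GC where an expression-based ρ exists, otherwise fall back to GC) and the PCA reduction can be sketched with NumPy. Several parts are assumptions: missing ρ values are encoded as NaN, the contents of CM and DM are replaced by random placeholders because their construction is not spelled out here, disease vectors are stored as rows rather than columns, and the pair descriptor simply concatenates the two reduced vectors (the hypothetical `pair_descriptor`). The disease fusion rule is not reproduced.

```python
# Sketch: fuse circRNA similarities, reduce with PCA, build a pair descriptor.
import numpy as np


def fuse_circ_similarity(rho, gc):
    """CS = (rho + GC) / 2 where rho exists, otherwise GC (missing rho = NaN)."""
    rho = np.asarray(rho, dtype=float)
    gc = np.asarray(gc, dtype=float)
    return np.where(np.isnan(rho), gc, (rho + gc) / 2.0)


def pca_reduce(X, n_components):
    """Project the rows of X onto its top principal components (computed via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pair_descriptor(cm_red, dm_red, i, j):
    """Hypothetical pair feature: circRNA i's vector followed by disease j's vector."""
    return np.concatenate([cm_red[i], dm_red[j]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho = np.array([[1.0, 0.4, np.nan],
                    [0.4, 1.0, np.nan],
                    [np.nan, np.nan, 1.0]])
    gc = np.array([[1.0, 0.2, 0.1],
                   [0.2, 1.0, 0.3],
                   [0.1, 0.3, 1.0]])
    print(fuse_circ_similarity(rho, gc))
    # Placeholder feature matrices; rows hold circRNA / disease vectors.
    CM, DM = rng.random((3, 6)), rng.random((4, 5))
    cm_red, dm_red = pca_reduce(CM, 2), pca_reduce(DM, 2)
    print(pair_descriptor(cm_red, dm_red, 0, 3).shape)  # (4,)
```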
Let AM be an m×n adjacency matrix corresponding to the circRNA-disease association dataset from circR2Disease database, where m (m = 661) is the number of circRNAs and n (n = 100) is the number of diseases.If AM(i, j) = 1, it means that circRNA c(i) is associated with disease d(j), otherwise AM(i, j) = 0.

Feature extraction methods
AANE to extract features. Compared with widely used feature extraction methods such as PCA, LINE (Large-scale Information Network Embedding) [54], node2vec [55] and DeepWalk [56], AANE incorporates the correlations between node attributes into the network embedding to better learn feature representations. AANE is used to extract low-dimensional features. The flowchart of the AANE algorithm is shown in Fig 6.
For a network N = (V, E, W), V is the node set, E is the edge set, and W contains the edge weights, where e_ij is the weight of the edge connecting node i and node j. The value of e_ij is closely related to the similarity between nodes: the larger e_ij, the more similar node i is to node j. Because a real symmetric matrix can be diagonalized by an orthogonal matrix, a positive semi-definite symmetric matrix A can be decomposed as follows:

$$\mathbf{A} = \mathbf{H}\boldsymbol{\Lambda}\mathbf{H}^{T} = (\mathbf{H}\mathbf{B})(\mathbf{H}\mathbf{B})^{T}$$

where H is an orthogonal matrix, Λ is a diagonal matrix, and B is the diagonal matrix whose elements are the square roots of the elements of Λ.
When applying this algorithm, the similarity matrix S is calculated by applying cosine similarity to the attribute matrix AM. Based on Eq 18, S is decomposed into two matrices Q and Q^T, that is, $\mathbf{S} = \mathbf{Q}\mathbf{Q}^{T}$.
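The factorization step can be sketched with NumPy: build the cosine-similarity matrix S from node attributes, eigendecompose it as HΛH^T, and take Q = HΛ^(1/2) so that S = QQ^T. This covers only that step; the λ-weighted edge term and the ADMM updates of AANE are not implemented, the random attribute matrix is a placeholder, and the function names are hypothetical.

```python
# Sketch: cosine-similarity matrix S and its factorization S = Q Q^T.
import numpy as np


def cosine_similarity_matrix(X):
    """S[i, j] = cos(x_i, x_j) for the rows of X; all-zero rows stay zero."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    return Xn @ Xn.T


def symmetric_factor(S, dim=None):
    """Return Q with S = Q Q^T, using S = H Lambda H^T and Q = H Lambda^(1/2).

    If dim is given, keep only the columns for the `dim` largest eigenvalues,
    which gives the best rank-`dim` approximation Q Q^T of S.
    """
    eigvals, H = np.linalg.eigh(S)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder: largest first
    eigvals, H = eigvals[order], H[:, order]
    eigvals = np.clip(eigvals, 0.0, None)  # S is PSD; drop round-off negatives
    Q = H * np.sqrt(eigvals)              # scales column k by sqrt(lambda_k)
    return Q if dim is None else Q[:, :dim]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attributes = rng.random((6, 4))       # placeholder node attribute matrix
    S = cosine_similarity_matrix(attributes)
    Q = symmetric_factor(S)
    print(np.allclose(Q @ Q.T, S))        # True
    print(symmetric_factor(S, dim=2).shape)  # (6, 2)
```

A cosine-similarity matrix is a Gram matrix of unit vectors, so it is positive semi-definite and the square roots of its eigenvalues are real, matching the decomposition assumed in the text.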
Node vectors are highly similar in two situations: the nodes are highly similar in topological structure, or the weight between the nodes is large. The objective function is defined as follows:

$$\min_{\mathbf{Q}} \; \left\lVert \mathbf{S} - \mathbf{Q}\mathbf{Q}^{T} \right\rVert_F^2 + \lambda \sum_{(i,j) \in E} e_{ij} \left\lVert \mathbf{q}_i - \mathbf{q}_j \right\rVert_2$$

where q_i is the i-th row of Q and λ is the balance parameter. By introducing the auxiliary variable Z = Q, the objective function is rewritten in an augmented form with a penalty parameter and scaled dual variables u_i, and the alternating direction method of multipliers (ADMM) is used to solve it.

Dynamic convolutional autoencoder to extract features. A convolutional autoencoder (CAE) can efficiently extract hidden features from data [57,58]. Inspired by dynamic convolution [59,60], we proposed a dynamic convolutional autoencoder (DCAE) by replacing the standard convolution with dynamic convolution. DCAE extracts features more efficiently than CAE (see Table 1). The flowchart of the DCAE algorithm is shown in Fig 7. The details of DCAE are as follows. First, the input vector x passes through the dynamic convolution layer, the pooling layer and the hidden layer to obtain an output vector y. This process is called encoding. The dynamic convolution in the encoder is computed as follows:

$$\tilde{\mathbf{W}} = \sum_{k=1}^{K} P_k \mathbf{W}_k, \qquad \tilde{\mathbf{b}} = \sum_{k=1}^{K} P_k \mathbf{b}_k, \qquad \mathbf{y} = g\left( \tilde{\mathbf{W}} * \mathbf{x} + \tilde{\mathbf{b}} \right)$$

where P_k denotes the attention weight of the k-th linear function, * denotes the convolution operation, W_k and b_k are the weight matrix and bias vector of the k-th linear function, g is the sigmoid activation function, W̃ is the aggregation weight, and b̃ is the aggregation bias.
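A single dynamic-convolution encoding step can be sketched in NumPy: attention weights P_k are computed from the input, the K candidate kernels and biases are aggregated with those weights, and the aggregated kernel is applied to x followed by a sigmoid. The attention branch here (global average pooling, one linear map, softmax) is a hypothetical simplification, the pooling and hidden layers are omitted, and the sizes are arbitrary.

```python
# Sketch of a dynamic 1-D convolution encoder step (single input channel).
import numpy as np


def softmax(z):
    z = z - np.max(z)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DynamicConv1D:
    """K candidate kernels aggregated per input: W~ = sum_k P_k W_k, b~ = sum_k P_k b_k."""

    def __init__(self, num_kernels=4, out_channels=3, kernel_size=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(num_kernels, out_channels, kernel_size))
        self.b = rng.normal(0.0, 0.1, size=(num_kernels, out_channels))
        # Hypothetical attention branch: one linear map of the pooled input.
        self.att_w = rng.normal(0.0, 1.0, size=num_kernels)
        self.att_b = np.zeros(num_kernels)

    def attention(self, x):
        """Input-dependent weights P_k (non-negative, summing to 1)."""
        pooled = np.mean(x)  # global average pooling
        return softmax(self.att_w * pooled + self.att_b)

    def encode(self, x):
        x = np.asarray(x, dtype=float)
        p = self.attention(x)
        W_agg = np.tensordot(p, self.W, axes=1)  # (out_channels, kernel_size)
        b_agg = p @ self.b                       # (out_channels,)
        # 'valid' cross-correlation, as deep-learning convolution layers compute
        conv = np.stack([np.correlate(x, w, mode="valid") for w in W_agg])
        return sigmoid(conv + b_agg[:, None])    # y = g(W~ * x + b~)


if __name__ == "__main__":
    layer = DynamicConv1D()
    x = np.linspace(0.0, 1.0, 10)  # hypothetical input feature vector
    print("attention weights:", np.round(layer.attention(x), 3))
    print("encoded shape:", layer.encode(x).shape)  # (3, 8)
```

Aggregating the kernels before convolving means only one convolution is run per input, whatever the number K of candidate kernels.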
Then, y passes through the deconvolution layer and the output layer to obtain the reconstructed vector x'. This process is called decoding. During the training of each layer, we computed the loss between the reconstructed vector x' and the input vector x and minimized it until it fell below a threshold. This optimization was performed layer by layer.
The attention weights vary with the input x, so the kernels are aggregated into an input-dependent optimal model. Therefore, the dynamic convolutional autoencoder can achieve better high-level representations than the ordinary autoencoder. The dynamic convolution involves three parts: the attention weights, and the weights W and biases b that are aggregated into the optimal weights. In DCAE, the computational cost for an input feature of size H×W×C_in is much smaller than that of ordinary convolution. In the cost analysis, O(•) denotes computational cost, D_k denotes the kernel size, and C_out denotes the number of output channels. Computing the attention weights is much cheaper than directly computing the optimal parameters. DCAE has better flexibility and lower computational cost than ordinary autoencoders.
In the experiment, we set the DCAE as a two-layer network with a learning rate of 0.001, using the mean squared error (MSE) as the loss function and gradient descent as the optimization method.

Random forest classifier predicts associations
In the experiment, a random forest classifier used the extracted features to complete a classification task to discover potential circRNA-disease associations. The execution steps of the random forest classifier can be summarized as follows:
1. The classifier selects N samples using the bootstrap method. The selected N samples are used to train a decision tree.
2. The classifier randomly selects m features from the M features of the sample (m << M), and selects one feature from the m features as the split feature of the node using the information gain ratio.In the process of forming a decision tree, each node is split until it can no longer be split.
3. By repeating steps 1-2, a large number of decision trees are constructed to form a random forest.
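Steps 1 and 2 above can be sketched in plain Python for discrete features: draw a bootstrap sample of size N, then sample m of the M features and pick the split feature with the highest information gain ratio, as the text describes. This is an illustration of the node-level procedure only (no full tree growth or voting). The helper names, the toy data, and the rule m = max(1, int(max_features × M)) with max_features = 0.2 are assumptions.

```python
# Sketch of random forest steps 1-2: bootstrap sampling, then choosing a node's
# split feature by information gain ratio among m randomly drawn features.
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, f):
    """Information gain of splitting on discrete feature f, divided by split info."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[f], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - conditional
    return gain / split_info if split_info > 0 else 0.0


def bootstrap(rows, labels, rng):
    """Step 1: draw N samples with replacement from the N training samples."""
    idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def choose_split(rows, labels, max_features=0.2, rng=None):
    """Step 2: sample m of the M features, return the one with the best gain ratio."""
    rng = rng or random.Random()
    M = len(rows[0])
    m = max(1, int(max_features * M))
    candidates = rng.sample(range(M), m)
    return max(candidates, key=lambda f: gain_ratio(rows, labels, f))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical discrete features; feature 0 matches the label exactly.
    rows = [[0, "x", 1], [0, "y", 0], [1, "x", 1], [1, "y", 0], [1, "x", 0], [0, "y", 1]]
    labels = [0, 0, 1, 1, 1, 0]
    boot_rows, boot_labels = bootstrap(rows, labels, rng)
    f = choose_split(boot_rows, boot_labels, max_features=0.7, rng=rng)
    print("split feature chosen for this node:", f)
```

Dividing the gain by the split information penalizes features with many distinct values, which plain information gain would otherwise favor.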
The random forest classifier predicts scores for circRNA-disease associations.An association is considered a potential association if the prediction score is greater than a set threshold.The grid search algorithm was used to determine parameters in the classifier, and the number of decision trees was set to 100.

Evaluation methods
Two commonly used methods, k-fold cross-validation and independent dataset testing, were used to evaluate model performance. In the experiments, we recorded the true positive (TP), false negative (FN), true negative (TN) and false positive (FP) counts. Five evaluation metrics were used to assess the model: area under the ROC curve (AUC), accuracy (ACC), sensitivity (Sen), F1-score and Matthews correlation coefficient (MCC). ACC, Sen, F1 and MCC are defined as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Sen = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times Precision \times Sen}{Precision + Sen}, \qquad Precision = \frac{TP}{TP + FP}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
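The metrics can be computed from the four confusion counts, and AUC from the prediction scores. This is a small standard-library sketch with hypothetical helper names. Defining a metric as 0 when its denominator is 0 is a convention chosen here, and AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), which equals the area under the ROC curve.

```python
# Sketch: ACC, Sen, F1, MCC from confusion counts, and rank-based AUC.
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return ACC, Sen, F1 and MCC; a zero denominator yields 0 by convention."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sen / (precision + sen) if precision + sen else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sen": sen, "F1": f1, "MCC": mcc}


def auc(scores, labels):
    """P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(confusion_metrics(tp=40, fn=10, tn=45, fp=5))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of samples; for large test sets a sort-based rank computation is preferable.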

Foundation (Grant nos. 2022JJD170019 & 2021JJA170204 & 2021JJA170199) and Guangxi Science and Technology Base and Talents Special Project (Grant nos. 2021AC19354 & 2021AC19394), CHZ is supported by the National Natural Science Foundation of China (No. U19A2064), LY is supported by the National Natural Science Foundation of China (No. 62002189), the Natural Science Foundation of Shandong Province, China (No. ZR2020QF038) and Technology Small and Medium Enterprises Innovation Capability Improvement Project of Shandong Province (No. 2023TSGC0279), ZS is supported by the National Natural Science Foundation of China (No. 62102200), YSG is supported by the 20 Planned Projects in Jinan (No. 2021GXRC046) and the Excellent Teaching Team Training Plan Project of QILU UNIVERSITY OF TECHNOLOGY. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
We compared the run time of iCircDA-NEAE with that of iCircDA-NEAE' (in which DCAE is replaced by CAE) on an NVIDIA RTX 3080 GPU with 10 GB of VRAM. The computation time of iCircDA-NEAE (63 min 27 s) is shorter than that of iCircDA-NEAE' (80 min 23 s), indicating that the CAE model is computationally more expensive than the DCAE model. The detailed results are recorded in S1 Table.

Fig 1. Schematic overview of iCircDA-NEAE framework. Experimental data come from the exoRBase, circR2Disease and MeSH datasets. Disease semantic similarity, the Gaussian interaction profile (GIP) kernel, circRNA expression profile similarity, and Jaccard similarity are used to measure the biometric information in the data, from which a multisource information fusion descriptor is constructed. AANE and DCAE are used to learn the features in the data. A random forest classifier is used to predict circRNA-disease associations. https://doi.org/10.1371/journal.pcbi.1011344.g001

As shown in Fig 3A-3C, iCircDA-NEAE with the random forest classifier outperforms the other classifiers on all datasets. The ACC, Sen, F1, MCC and AUC values are presented in Table 3.

Table 5. The top 20 circRNA-disease associations.
https://doi.org/10.1371/journal.pcbi.1011344.t005