Deep graph contrastive learning model for drug-drug interaction prediction


A drug-drug interaction (DDI) is the combined effect of multiple drugs taken together, which can either enhance or reduce each other’s efficacy. Drug interaction analysis therefore plays an important role in improving treatment effectiveness and patient safety. Using computational methods to accelerate drug interaction analysis and reduce its cost has become a new challenge. Existing methods often do not fully explore the relationship between the structural information and the functional information of drug molecules, which leads to low prediction accuracy and poor generalization. In this paper, we propose a deep graph contrastive learning model for drug-drug interaction prediction (DeepGCL for brevity). DeepGCL incorporates a contrastive learning component to enhance the consistency of information between different views (molecular structure and interaction network); that is, DeepGCL predicts drug interactions by integrating molecular structure features and interaction network topology features. Experimental results show that DeepGCL achieves better performance than other methods on all datasets. Moreover, we conducted further experiments to analyze the necessity of each component and the robustness of the model, which also showed promising results. The source code of DeepGCL is freely available at


Drug–drug interaction (DDI) refers to the phenomenon that occurs when two or more drugs are taken together, resulting in adverse effects on an organism [1, 2]. Accurately identifying drug-drug interactions has therefore become an important research topic. Traditional methods for identifying drug-drug interactions are mainly based on experimental assays and clinical reports [3]. However, this process is costly and time-consuming, especially when identifying drug-drug interactions in a large drug space. Computational (in silico) methods [4–6] can serve as an effective and fast alternative to alleviate this problem. However, these methods usually focus on learning the properties of single drugs and lack effective integration of multiple sources of drug-related information, which ultimately limits the predictive capabilities of the model. Therefore, proposing an effective and fast computational method for drug-drug interaction prediction has become an important research direction in the field of drug discovery.

In recent years, accumulated research findings have demonstrated promising results in the computational prediction of drug-drug interactions (DDIs). These achievements are primarily attributed to rapid advances in drug molecular property prediction [7–10]. Methods for predicting DDIs can be broadly categorized into two groups: structure-based methods and network-based methods. Structure-based methods treat the entire drug as a graph or sequence. For example, some researchers consider atoms as nodes and bonds between atoms as edges, then use a graph neural network (GNN) to learn the representation of each drug [11–17]. Other models use SMILES (Simplified Molecular Input Line Entry System) [18] strings as the input for sequence models (including GRU [19], LSTM [20], and Transformer [21]) and then predict the DDIs. In these methods, drugs are treated as independent individuals: the representation is learned from the drug molecular structure and then passed to the classifier through some aggregation operations. Network-based methods, in contrast, consider each drug as a node and the interaction or similarity between drugs as an edge to form a large network, and then use traditional network science methods or graph neural networks to predict unknown interactions between drug molecules [22–24].

Although these methods have achieved good performance, they still have some limitations. First, structure-based methods assume that drugs with similar features will behave similarly in DDIs; however, interacting drugs may have low similarity. Meanwhile, the performance of network-based methods relies on the quality of the interaction network, and building large-scale, high-quality networks is time-consuming and difficult. Second, the drug molecular graph and the drug interaction network contain mutually irreplaceable pharmacological properties, which are very important for predicting DDIs. The drug molecular graph contains information about the drug functional groups that determine the chemical and physical properties of the drug. The interaction network contains the topological information between drugs, which reflects specific functions of some drugs. Although these methods perform well in some specific tasks, they focus only on single-view learning and ignore the complementarity of information across multiple views.

Existing research has demonstrated the effectiveness of building models to predict DDIs from multiple perspectives, primarily by aggregating multi-source information, including drug structure information, network topological information, and more [25–27]. For example, MUFFIN [28] aggregates molecular structure information and drug topology information to predict DDIs. DSN-DDI [29] uses both local and global representation learning modules, which learn drug substructures from individual drugs (intra-view) and drug pairs (inter-view) simultaneously. m2vec [22] combines drug target networks with SMILES information and then uses graph autoencoders to learn the final representation of drugs. The success of these methods confirms the advantages of predicting DDIs from multiple views. However, these methods prioritize leveraging multi-view data to improve drug representation without considering the balance and consistency of multi-source information, and so cannot effectively utilize the structure-level and network-level information. Contrastive learning is often used to maximize the mutual information between multiple views. Thus, researchers have used a contrastive learning component to balance and integrate molecular structure information and interaction network information [30]. Moving forward, if the drug pair is regarded directly as a whole, a representation vector can be learned at the level of the drug pair and used for model training and DDI prediction, which may provide a new perspective for DDI prediction.

In this paper, we introduce a novel Deep Graph Contrastive Learning model (DeepGCL) for drug-drug interaction prediction. DeepGCL leverages graph contrastive learning to combine molecular structure features and network topological features. First, DeepGCL constructs the molecular structure graph for each drug and employs a graph convolutional network (GCN) to learn the structural features of the drugs. Then, DeepGCL constructs a subgraph for each drug pair and uses a GCN to learn the topological features of each drug pair. Instead of selecting a pooling operation, we use a virtual node to aggregate the node features of the entire subgraph. Next, the graph contrastive learning model is used to combine the features of drug structure and network topology. Finally, the learned structural and topological features of the drug pairs are integrated for drug-drug interaction prediction. Experimental results demonstrate that DeepGCL achieves the best performance across three real-world datasets. An ablation analysis demonstrates the essential role of graph contrastive learning in integrating information from the different views. We also conducted experiments to assess the robustness of the DeepGCL model.


DeepGCL framework

An overview of DeepGCL is shown in Fig 1. DeepGCL is decomposed into three parts: (1) Topology information learning module (Fig 1B). This module mainly uses a graph convolutional network to learn the representation of each node in a local subgraph. (2) Structural information learning module (Fig 1C). This module uses a graph convolutional network to learn the representation of each drug, and the two drugs of a drug pair share parameters during the learning process. (3) Graph contrastive learning module and prediction module (Fig 1D). This module mainly uses graph contrastive learning and cross-entropy loss to constrain the model iteration and predict the probability of interaction between input drug pairs. First, all drugs form an interconnected network in which nodes represent drugs and edges represent interactions between the drugs (Fig 1A). Then, for any input drug pair, we sample the common H-Hop neighbor nodes from the drug interaction network to construct a subgraph. We also introduce a virtual node, connected to all nodes within the subgraph, to learn the global features of the subgraph. Additionally, we use the internal structural information of drug molecules to construct molecular graphs. To balance the information between the molecular graphs and the subgraph of the drug pair, we incorporate a graph contrastive learning module that optimizes the model by ensuring consistency between the molecular graph and subgraph representations. Finally, the prediction of drug-drug interactions combines the molecular structure and topology information of drug pairs.

Fig 1. Overview of DeepGCL.

(A) Drug-drug interaction networks: This component illustrates the network of interactions between drugs. (B) Topology information learning module: This module extracts common H-Hop neighbor nodes for drug pairs to form a subgraph. Subsequently, the subgraph is passed through GCN to generate a global representation for the drug pair. (C) Structural information learning module: This module employs GCN with shared parameters to acquire representations for drugs within drug pairs. (D) Graph contrastive learning module and prediction module: In this module, graph contrastive learning and cross-entropy loss are employed to regulate the model’s iteration and predict the interaction probability between input drug pairs.

Subgraph construction and representation learning

In recent years, the use of graph neural networks for analyzing networked graph data has garnered considerable attention. For example, Scorpius constructs a knowledge graph to evaluate the correlation between drugs and diseases [31]. However, dealing with large-scale networks can pose significant computational challenges. Therefore, there is increasing interest in extracting topological information from subgraphs. For instance, DisenCite uses L-hop neighbors from the paper relationship network to learn topological information [32]. Inspired by this, DeepGCL learns topological relationships among drugs within a specific neighborhood by selecting H-Hop neighboring nodes and edges to construct subgraphs. This approach enables DeepGCL to concentrate on learning local drug pair features from subgraphs without the need to train on the entire drug interaction network. If two drugs lack shared neighbors, the subgraph consists of only those two drugs; otherwise, the shared neighbors are used to form the subgraph.

Given a drug interaction network graph $G_I = (V, E)$, $V$ denotes the set of nodes in the graph, where $v_i \in V$ denotes the $i$-th drug in the drug interaction network. $E$ denotes the set of edges in the graph, where each edge represents an interaction between drugs. $N_v(h)$ denotes the set of H-Hop neighbors of node $v$. For any sample $(v_i, v_j)$, we sample the first-order neighbors of each node, obtain the second-order neighbors from the first-order neighbors, and so on. This yields $N_{v_i}(h)$ and $N_{v_j}(h)$. We choose the common neighbors of the two drugs, $N_{v_i}(h) \cap N_{v_j}(h)$, together with the two drugs themselves, as the nodes that form the drug interaction subgraph $G_{sub}$.
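
The sampling step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is assumed to be an undirected adjacency dictionary, the function names `h_hop_neighbors` and `pair_subgraph` are hypothetical, and the subgraph is taken to be the graph induced on the two drugs plus their common H-Hop neighbors.

```python
from collections import deque


def h_hop_neighbors(adj, source, h):
    """Return every node reachable from `source` within h hops (source excluded)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == h:
            continue  # do not expand beyond h hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(source)
    return seen


def pair_subgraph(adj, vi, vj, h):
    """Induced subgraph on {vi, vj} plus the common h-hop neighbors of vi and vj."""
    common = (h_hop_neighbors(adj, vi, h) & h_hop_neighbors(adj, vj, h)) - {vi, vj}
    nodes = {vi, vj} | common
    # Keep each undirected edge once, as a sorted tuple (assumed convention).
    edges = {tuple(sorted((u, w))) for u in nodes for w in adj.get(u, ()) if w in nodes}
    return nodes, edges
```

When the two drugs share no neighbors, `common` is empty and the returned node set contains only the pair itself, which matches the fallback described in the text.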

We obtain the subgraph $G_{sub} = (A, X)$ using the aforementioned sampling method. Here, $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix, $X \in \mathbb{R}^{n \times d}$ is the node feature matrix, and $d$ is the dimension of node features. We initialize the node features using 166-dimensional Morgan fingerprints [33]. To learn drug interaction network subgraph features, we employ a variant of Graph Convolutional Networks [34]. Unlike traditional graph convolutional networks, which aggregate information only from neighboring nodes, our approach extracts global information from the entire graph using a virtual node. The virtual node is connected to all other nodes in the graph, which lets higher-order neighbors communicate directly and captures global graph features. The virtual node feature vector is initialized as the average of the other node feature vectors. After adding the virtual node, the subgraph is described by the node feature matrix $\tilde{X} \in \mathbb{R}^{(n+1) \times d}$ and the adjacency matrix $\tilde{A} \in \mathbb{R}^{(n+1) \times (n+1)}$, where $n + 1$ is the number of nodes in the subgraph. The node features are then updated by a multilayer graph convolutional neural network, defined as follows:

$$H^{(l+1)} = \mathrm{ReLU}\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}\right) \tag{1}$$

where $W^{(l)}$ is the $l$-th layer weight matrix, ReLU is the activation function, $\hat{A} = \tilde{A} + I$, $\hat{D}$ is the diagonal degree matrix of $\hat{A}$, and $H^{(0)} = \tilde{X}$. Finally, we extract the features of the virtual node as the global features of the subgraph, as follows:

$$Z_T = H^{(l)}_{vn} \tag{2}$$

where $H^{(l)}_{vn}$ denotes the feature representation of the virtual node at layer $l$. Next, an MLP performs binary classification, as follows:

$$p_{sub} = \mathrm{MLP}(Z_T) \tag{3}$$

The cross-entropy loss $l_{sub}$ is then calculated from the predicted probability distribution $p_{sub}$ obtained from the embedding $Z_T$, as follows:

$$l_{sub} = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \tag{4}$$

where $y_i$ is the true label (0 or 1) and $\hat{y}_i$ is the predicted probability of sample $i$.
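
To make the virtual-node construction concrete, the following pure-Python sketch adds a virtual node to a small subgraph and applies one graph convolution. It assumes the standard GCN propagation rule (self-loops added, symmetric degree normalization, ReLU); the helper names `add_virtual_node`, `matmul`, and `gcn_layer` are hypothetical, and a real model would use a tensor library with learned weights.

```python
import math


def add_virtual_node(A, X):
    """Append a virtual node linked to all nodes; its features are the node mean."""
    n, d = len(A), len(X[0])
    mean_feat = [sum(X[i][k] for i in range(n)) / n for k in range(d)]
    A_aug = [list(row) + [1.0] for row in A] + [[1.0] * n + [0.0]]
    X_aug = [list(row) for row in X] + [mean_feat]
    return A_aug, X_aug


def matmul(P, Q):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    A_norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
              for i in range(n)]
    Z = matmul(matmul(A_norm, H), W)
    return [[max(0.0, z) for z in row] for row in Z]


# Two connected drugs with 2-dimensional toy features.
A_aug, X_aug = add_virtual_node([[0, 1], [1, 0]], [[1.0, 0.0], [0.0, 1.0]])
H1 = gcn_layer(A_aug, X_aug, [[1.0, 0.0], [0.0, 1.0]])  # identity weights for clarity
subgraph_embedding = H1[-1]  # the last row belongs to the virtual node
```

Reading the last row of the output as the subgraph embedding replaces an explicit pooling step, since the virtual node receives messages from every other node in one hop.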

Molecular structure representation

This section employs GCNs with shared parameters to learn drug molecular structural representations. Specifically, we use the open-source software RDKit [35] to transform the drug SMILES into molecular graphs. In these molecular graphs, the nodes and edges correspond to atoms and bonds within the molecule, respectively. A molecular graph is defined as $G_m = (A, X)$, where $A \in \mathbb{R}^{m \times m}$ denotes the symmetric adjacency matrix with $m$ nodes, $X \in \mathbb{R}^{m \times h}$ is the node feature matrix, and $h$ is the dimension of the node features. Each node represents an atom, and $A_{i,j} = 1$ indicates the existence of a covalent bond between nodes $i$ and $j$, and 0 otherwise. For molecular structure feature initialization, we follow the approach used in DeepDDS [12], which uses DeepChem [36] to calculate five pieces of information for each atom, including the atomic symbol and the number of adjacent atoms.

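In each layer of the GCN, every node aggregates information from itself and its neighboring nodes to acquire higher-level feature representations. For the input drug molecular graph $(A_f, X)$, the output is defined as follows:

$$H_f^{(k+1)} = \mathrm{ReLU}\left(\hat{D}_f^{-1/2} \hat{A}_f \hat{D}_f^{-1/2} H_f^{(k)} W_c^{(k)}\right) \tag{5}$$

where $W_c^{(k)}$ represents the shared weight parameter, $h'$ is the dimensionality of the output features and is not affected by the number of nodes, $\hat{A}_f = A_f + I$, $\hat{D}_f$ is the diagonal degree matrix of $\hat{A}_f$, $H_f^{(0)} = X$ is the initial feature matrix, and $H_f^{(k+1)}$ denotes the feature representation of the nodes in layer $k+1$. To learn similar information between the drugs of a pair, the weight parameter $W_c$ is shared at each layer in both GCNs, so that for the molecular graph $(A_t, X)$:

$$H_t^{(k+1)} = \mathrm{ReLU}\left(\hat{D}_t^{-1/2} \hat{A}_t \hat{D}_t^{-1/2} H_t^{(k)} W_c^{(k)}\right) \tag{6}$$

where $H_t^{(0)} = X$ is the initial feature matrix and $H_t^{(k+1)}$ denotes the feature representation of the nodes in layer $k+1$. The graph-level representations extracted from the two molecular graphs are then concatenated to obtain the embedding $Z_C$ of the drug pair, as follows:

$$Z_C = \mathrm{READOUT}(H_f) \,\|\, \mathrm{READOUT}(H_t) \tag{7}$$

where $\|$ is the concatenation operation. In DeepGCL, the READOUT function performs max pooling. Next, an MLP is used for classification, as follows:

$$p_{const} = \mathrm{MLP}(Z_C) \tag{8}$$

The cross-entropy loss $l_{const}$ is defined as follows:

$$l_{const} = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \tag{9}$$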
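
The readout and concatenation step can be illustrated with a short sketch. It only assumes what the text states (max pooling as READOUT, followed by concatenation); the function names `max_readout` and `pair_embedding` are hypothetical, and the node feature matrices stand in for the outputs of the shared-weight GCNs.

```python
def max_readout(H):
    """Column-wise max over nodes: an (m x h') node matrix becomes an h'-vector."""
    return [max(col) for col in zip(*H)]


def pair_embedding(H_f, H_t):
    """Concatenate the graph-level vectors of the two drugs in a pair."""
    return max_readout(H_f) + max_readout(H_t)


# Molecules with different atom counts (3 and 2) still give a fixed-size embedding.
H_f = [[1.0, 5.0], [3.0, 2.0], [0.0, 4.0]]
H_t = [[2.0, 2.0], [7.0, 1.0]]
Z_C = pair_embedding(H_f, H_t)  # length 2 * h' = 4
```

Because the pooling runs over the node dimension, the embedding size depends only on the feature dimension, not on the number of atoms in either molecule.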

Graph contrastive learning

Due to the scarcity of labeled data, unsupervised learning has been widely applied in the fields of few-shot learning [37], recommendation systems [38, 39], and natural language processing [40]. For example, KGNN combines graph neural networks and kernel-based networks to effectively use both labeled and unlabeled graphs [41]. Additionally, graph contrastive learning is one of the most advanced unsupervised learning methods, with successful applications in various tasks, including node classification, graph classification, and drug discovery [42–44]. Graph contrastive learning models commonly use a framework based on graph augmentation, such as edge perturbation [45], node deletion [46], and attribute augmentation [47], to form a contrastive view. However, methods that perturb the structure of the input graph to obtain different contrasting views may introduce noise, potentially affecting the model’s performance. Therefore, DSGC [48] constructs contrasting views in different spaces and combines the advantages of each through graph contrastive learning. Inspired by this, and because both drug molecule graphs and subgraphs contain rich information about drug interactions, we leverage contrastive learning to combine the two and obtain better embeddings.

For drug molecule graphs and subgraphs, after the nonlinear mapping introduced above, we obtain their low-dimensional representations. The representation of the molecular graph contains information on multiple functional groups, while the subgraph contains local topological information within the interaction network. The consistency of the representation vectors can be maximized by adding a contrastive learning component.

Specifically, we feed a batch of $N$ samples into the model and obtain a total of $2N$ embeddings. Positive pairs are formed between the two views produced by the same sample, and negative pairs are formed between different samples from the same batch. The loss can then be expressed in terms of the distance between a positive sample and the remaining $K$ negative samples. Thus, for sample $i$, we have the following self-supervised loss:

$$l_i = -\log \frac{\exp\left(\mathrm{sim}(Z_T^i, Z_C^i) / \tau\right)}{\sum_{k=1}^{K} \exp\left(\mathrm{sim}(Z_T^i, Z_C^k) / \tau\right)} \tag{10}$$

where $\tau$ is the temperature coefficient, and $Z_C^i$ and $Z_T^i$ are the molecular graph embedding and subgraph embedding of sample $i$, respectively. $\mathrm{sim}(Z_T, Z_C)$ denotes the cosine similarity between two vectors. The total contrastive loss of all samples is defined as follows:

$$l_{contr} = \sum_{i=1}^{N} l_i \tag{11}$$
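
A contrastive objective of this kind can be sketched as follows. This is an InfoNCE-style illustration under assumptions that the text does not fix: the two views are assumed to have the same dimension, the denominator includes the positive pair together with the in-batch negatives, the loss is averaged over the batch, and the default `tau=0.5` is a placeholder rather than a reported setting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(Z_T, Z_C, tau=0.5):
    """Mean InfoNCE loss; row i of Z_T and row i of Z_C form the positive pair."""
    n = len(Z_T)
    total = 0.0
    for i in range(n):
        logits = [cosine(Z_T[i], Z_C[k]) / tau for k in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each subgraph embedding is closest to the molecular embedding of the same drug pair (`aligned`) and large when the views disagree (`swapped`), which is the consistency pressure the contrastive component is meant to provide.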

Drug-drug interaction prediction

For each sample, the framework generates a molecular graph representation and a subgraph representation of the drug pair through its two channels. Subsequently, the relationship between the two embeddings is modeled through various operations to obtain the final representation of the drug pair. Finally, we use this representation as input to an MLP and evaluate the predicted interaction score, as follows: (12) where ⊙ is the inner product between vectors. The binary vector $p_{ct}$ represents the associated probabilities of drug pairs. The label is 1 for interacting drug pairs and 0 otherwise. Subsequently, we calculate the cross-entropy loss, denoted as $l_{joint}$, to measure the dissimilarity between the predicted probabilities $\hat{y}_i$ and the true labels $y_i$, as follows:

$$l_{joint} = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \tag{13}$$

From this, DeepGCL’s overall loss function can be described as follows: (14) where $l_{const}$ measures the discrepancy between the model’s predictions from molecular structural information and the ground truth, while $l_{sub}$ quantifies the difference between the model’s predictions from the topological information of drug-pair interaction subgraphs and the ground truth. $l_{joint}$ represents the disparity between the model predictions and the ground truth after aggregating molecular structure information and subgraph topology information. $l_{contr}$ represents the graph contrastive learning loss for the unsupervised task. $\alpha$, $\beta$, $\lambda$, and $\eta$ are the coefficients of the different losses.
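
The way the four losses combine can be sketched as a weighted sum. Note the assumptions: the text names the four coefficients but the pairing of each coefficient with a specific loss is assumed here to follow the order in which the losses are described, the cross-entropy is averaged over samples, and the function names are hypothetical. The default weights are the values reported in the experimental setting.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def total_loss(l_const, l_sub, l_joint, l_contr,
               alpha=0.1, beta=1.0, lam=1.0, eta=1.0):
    """Weighted sum of the four losses (coefficient-to-loss pairing is assumed)."""
    return alpha * l_const + beta * l_sub + lam * l_joint + eta * l_contr


labels = [1, 0]
l_joint = binary_cross_entropy(labels, [0.9, 0.2])
loss = total_loss(l_const=0.7, l_sub=0.6, l_joint=l_joint, l_contr=0.3)
```

Keeping the three supervised losses separate from the contrastive term makes it straightforward to run the ablation variants by setting the corresponding coefficient to zero.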


In this section, we evaluate the model on three real-world datasets to test its effectiveness in the drug-drug interaction classification task and to answer the following three questions:

  • Q1: How does DeepGCL perform in real-world datasets compared to other models?
  • Q2: Does integrating information from drug molecular graphs and subgraphs improve the performance of the model?
  • Q3: Does adding the contrastive learning component further improve the learning ability of the model?

Dataset and baseline

DeepGCL is a binary classification model that focuses on detecting drug interactions, which utilizes three real-world datasets (BioSNAP [49], AdverseDDI [50], and DrugBank [51]) to verify performance. After preprocessing drug SMILES strings with RDKit, we excluded drugs lacking SMILES representations and corresponding molecular data. In this model, the positive-to-negative sample ratio is 1:1. Details are shown in Table 1.

To verify the validity of DeepGCL and answer Q1, we compare it with two types of models: graph neural network models and network embedding models. The graph neural network models include CSGNN [52], DeepDDI [53], DeepDDS [12], and CASTER [8]. Among these, DeepDDI and DeepDDS use drug structure information to learn drug representations. CSGNN incorporates a hybrid multi-hop neighborhood aggregator to capture the interrelationships of indirect neighbors in molecular interaction networks. CASTER considers the functional substructures of the drug, uses an autoencoder to learn from the chemical structure data, and increases the interpretability of the model by adding a dictionary learning module. The network embedding models include Deepwalk [54], Line [55], node2vec [56], SDNE [57], and struc2vec [58]. Deepwalk preserves the similarity between neighboring nodes, and Line further preserves the similarity of nodes that have common neighbors. node2vec improves the random walk strategy and enriches the contextual information of the nodes. SDNE is a semi-supervised learning method that uses autoencoders to simultaneously optimize the similarity of the nodes’ higher-order neighbors and learn the local and global features of the nodes. struc2vec focuses on the spatial structure features of nodes in the network, considering the similarity of nodes in the local topology. We also compare with NNPS [59], which constructs initial features by combining information about drug side effects and drug-protein interactions, and then employs neural networks to compute the probabilities of adverse drug reactions for given drug combinations.

Experimental setting

To evaluate our model more comprehensively, we randomly split the dataset into a training set, a validation set, and a test set using an 8:1:1 ratio. For each experiment, we randomly split the dataset 5 times. All comparison models were configured according to the parameters in their original papers. Following DeepDDS [12], we train two parameter-sharing 3-layer GCNs with dimensions {78, 156, 128} to learn the drug molecular graphs, and a 3-layer GCN encoder with dimensions {166, 332, 128} to learn the drug interaction network subgraphs. The optimizer is Adam, and the dropout rate is chosen from {0.2, 0.5, 0.8}. For joint training, we set α = 0.1, β = 1, λ = 1, η = 1. We use the area under the ROC curve (AUC), F1 score (F1), and area under the precision-recall curve (AUPR) as metrics to evaluate the model. Training ran for a total of 100 epochs.
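
The evaluation protocol (five random 8:1:1 splits, reported as mean and standard deviation) can be sketched as below. The seeds, the function names `split_811` and `mean_std`, and the use of the population standard deviation are assumptions for illustration; the text does not specify them.

```python
import random


def split_811(samples, seed):
    """Shuffle with a fixed seed and cut into train/validation/test at 8:1:1."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def mean_std(values):
    """Mean and population standard deviation of a list of scores."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


pairs = list(range(100))  # stand-in for labeled drug pairs
splits = [split_811(pairs, seed) for seed in range(5)]  # five independent splits
```

In a full pipeline, each of the five splits would train a fresh model, and `mean_std` would be applied to the five test-set AUC, AUPR, and F1 values.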

Experimental results

In Table 2, we provide the mean and standard deviation of performance metrics for DeepGCL and various baseline models on three real-world datasets. Superior results are highlighted in bold text. DeepGCL consistently exhibits strong performance across all datasets. This observation not only supports the effectiveness of our approach but also provides validation for the research question Q1.

Table 2. Comparative evaluation (mean ± std).

Best performance in each metric is shown in bold font.

As shown in Table 2, network embedding models such as Line and SDNE perform at a level comparable to that of CASTER. This observation underscores the significance of drug topological information within the drug interaction network, which is as important for DDI prediction as molecular structure information. While DeepDDS outperforms DeepDDI on the BioSNAP and AdverseDDI datasets, DeepDDI achieves superior performance on the DrugBank dataset. Both models rely solely on molecular structure information. DeepDDS is based on a neural network architecture and highlights the potential of neural network models in DDI prediction. DeepDDI’s strong performance can be attributed to its integration of additional drug databases for drug structure similarity calculations, which allows it to incorporate a more comprehensive range of drug-related information than the other models. Among the baselines, CSGNN shows stable performance across all three datasets. This suggests that the model’s approach of enhancing communication among higher-order neighbor nodes contributes to its predictive ability. In comparison, DeepGCL uses a virtual node to connect all nodes in the subgraph, so that higher-order neighbors communicate through the virtual node as an intermediary during message passing. DeepGCL outperforms all other compared models. It aggregates drug molecular structure and drug topology information to make up for the limitations of learning from single-molecule graphs alone.

We have incorporated an array of evaluation metrics, including Mean Average Precision (MAP), Mean Reciprocal Ranking (MRR), and HIT@K metrics. DeepGCL consistently demonstrates competitive performance across these diverse evaluation criteria, as evident in the experimental results presented in S1 Table. These metrics are particularly relevant in the context of drug discovery, where the emphasis is often on identifying the most promising drug candidates for further experimentation. DeepGCL showed reliable performance, indicating its ability to identify potentially interacting drug pairs. In practical drug recommendation scenarios, the top-ranked drug pairs are of paramount importance, and our model’s proficiency in this regard further underscores its utility in drug discovery.

Furthermore, we analyzed the training time of the model on various datasets to evaluate its computational efficiency. As shown in S2 Table, DeepGCL exhibits advantages in computational efficiency compared to several models, notably NNPS and DeepDDI. This superiority can be attributed to the effectiveness of graph neural networks in learning features from graph-structured data. DeepGCL focuses on molecular structure and network topology to enhance drug interaction prediction accuracy, which consequently affects computational efficiency. However, it remains competitive in both model prediction accuracy and computational efficiency.

Ablation study

To further investigate the necessity and effectiveness of each component of the DeepGCL model and address questions Q2 and Q3, we designed the following variants of DeepGCL for experiments on the three datasets. Each variant was trained five times independently, and the mean and standard deviation were calculated over the five runs.

DeepGCL without molecular structure learning (DeepGCL w/o molecular) learns drug interaction information only from drug interaction subgraphs to make predictions about drug pairs.

DeepGCL without subgraph learning (DeepGCL w/o subgraph) learns only the embedding representation of the drug from the molecular graph as the representation vector of the drug pair.

DeepGCL without contrastive learning (DeepGCL w/o contrastive) trains the target based on supervised signals, and the acquired drug pair embeddings are used for downstream binary classification.

Fig 2 shows the results of DeepGCL and its variants for AUC, AUPR, and F1 scores. Removing any component from DeepGCL results in weaker performance than the full model, which demonstrates the necessity of each component. On the BioSNAP and DrugBank datasets, adding contrastive learning improves performance less than it does on the AdverseDDI dataset. This disparity can be attributed to the fact that BioSNAP and DrugBank already contain rich drug interaction information. Consequently, even without contrastive learning, DeepGCL achieves excellent performance on these datasets by effectively integrating drug structure and interaction information. However, the addition of contrastive learning still improves performance, albeit to a lesser extent. This indicates that contrastive learning can improve the predictive power of the model by integrating information from multiple drug perspectives. With the contrastive learning component, the model’s two encoders can learn more about the interplay between drug molecules, which further enhances the model’s learning capacity. In conclusion, each component of the DeepGCL model is necessary and effective.

Fig 2. Comparative performance of DeepGCL and its variants on multiple evaluation metrics.

Robustness analysis

Existing deep learning models are susceptible to interference from noise. To verify the robustness of the model, we randomly remove 10%, 20%, 30%, 40%, and 50% of the known associations. The results are shown in Fig 3. As a larger proportion of known associations is removed, all models show a downward trend, but DeepGCL still performs best in all scenarios. Among the graph neural network models, DeepDDS and DeepDDI perform poorly as edges are removed. This may be due to data sparsity. These methods start from drug similarity, learn drug embeddings from drug molecular structure information, and assume that similar drugs will behave similarly in DDIs. In comparison, CSGNN shows reliable robustness after some edges are removed. A possible reason is that CSGNN uses deep mix-hop graph neural networks to capture higher-order neighbors and thereby alleviates the problem of data sparsity. On the AdverseDDI dataset, the performance of Deepwalk and node2vec degrades rapidly as edges are removed. These models assume that nodes with common neighbors will be more similar, and this assumption is easily influenced by noise. In conclusion, DeepGCL can combine the respective advantages of molecular graphs and interaction networks to improve the robustness of the model.
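
The edge-removal perturbation used in this analysis can be sketched as follows. This is a minimal illustration, assuming the known associations are stored as a list of unique edges; the function name `remove_edges` and the fixed seed are hypothetical.

```python
import random


def remove_edges(edges, fraction, seed=0):
    """Return the edges that remain after randomly dropping `fraction` of them."""
    rng = random.Random(seed)
    ordered = sorted(edges)  # fixed order so the seed gives reproducible samples
    n_remove = int(round(fraction * len(ordered)))
    removed = set(rng.sample(ordered, n_remove))
    return [e for e in ordered if e not in removed]


known = [(i, i + 1) for i in range(10)]  # toy interaction list with 10 edges
perturbed = {f: remove_edges(known, f) for f in (0.1, 0.2, 0.3, 0.4, 0.5)}
```

Each perturbed edge list would then be used to rebuild the interaction network before retraining and evaluating every model, so that all methods see the same degraded input.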

Fig 3. Metrics for models in edge attack scenarios.

(a) Performance in BioSNAP dataset, (b) Performance in AdverseDDI dataset, (c) Performance in DrugBank dataset.

Cold start experiments

In drug-drug interaction (DDI) prediction tasks, traditional K-fold cross-validation (CV) can inadvertently introduce information overlap between training and testing sets, potentially inflating results. To address this challenge, we employ two distinct cold start scenarios [60]: drug-wise CV and pairwise CV. In drug-wise CV, the objective is to predict interactions between known drugs and unknown drugs. In pairwise CV, the goal is to predict interactions exclusively between unknown drugs. We categorize input drugs into two groups: drugs for training ($\mathrm{drugs}_{train}$) and cold-start drugs ($\mathrm{drugs}_{cold}$), which have no known interactions in the training set. This categorization results in three distinct DDI subsets: $\mathrm{DDI}_{train}$ (interactions between known drugs), $\mathrm{DDI}_{drugwise}$ (interactions between cold-start and known drugs), and $\mathrm{DDI}_{pairwise}$ (interactions between cold-start drugs). Subsequently, we use logistic regression classifiers trained on $\mathrm{DDI}_{train}$ to make predictions in both the drug-wise and pairwise scenarios. The results are presented in Table 3. All the evaluated models show a noticeable decline in performance in both scenarios compared to traditional CV. Because DeepGCL learns drug representations from both molecular graphs and interaction networks, it exhibits robust performance across the two scenarios.
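
The partition into the three DDI subsets can be sketched as below. The cold-start fraction, the seed, and the function name `cold_start_split` are assumptions for illustration; the text does not report how many drugs are held out.

```python
import random


def cold_start_split(pairs, cold_fraction=0.2, seed=0):
    """Split drug pairs by how many of their two drugs are held-out cold-start drugs."""
    rng = random.Random(seed)
    drugs = sorted({d for pair in pairs for d in pair})
    n_cold = max(1, int(cold_fraction * len(drugs)))
    cold = set(rng.sample(drugs, n_cold))
    train, drugwise, pairwise = [], [], []
    for a, b in pairs:
        n_cold_in_pair = (a in cold) + (b in cold)
        # 0 cold drugs: training; 1: drug-wise test; 2: pairwise test.
        (train, drugwise, pairwise)[n_cold_in_pair].append((a, b))
    return cold, train, drugwise, pairwise


all_pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]
cold, ddi_train, ddi_drugwise, ddi_pairwise = cold_start_split(all_pairs)
```

Because no training pair contains a cold-start drug, any test-time prediction for those drugs cannot rely on interaction edges seen during training, which is what makes the two scenarios harder than random K-fold CV.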

Table 3. Assessment of DeepGCL and competitive methods under the drug-wise and pairwise settings.

The best score is in bold.
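The partition into training, drug-wise, and pairwise subsets used in the cold start experiments can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `cold_start_split`, the `cold_fraction` and `seed` parameters, and the toy drug identifiers are hypothetical, and the share of drugs held out as cold-start drugs is not given in the text.

```python
import itertools
import random


def cold_start_split(ddi_pairs, cold_fraction=0.2, seed=0):
    """Split DDI pairs into train, drug-wise, and pairwise subsets.

    `cold_fraction` is a hypothetical parameter: the share of drugs that
    are treated as cold-start drugs and hidden from the training set.
    """
    drugs = sorted({drug for pair in ddi_pairs for drug in pair})
    rng = random.Random(seed)
    n_cold = max(1, round(len(drugs) * cold_fraction))
    drugs_cold = set(rng.sample(drugs, n_cold))

    ddi_train, ddi_drugwise, ddi_pairwise = [], [], []
    for a, b in ddi_pairs:
        n_cold_in_pair = (a in drugs_cold) + (b in drugs_cold)
        if n_cold_in_pair == 0:
            ddi_train.append((a, b))      # both drugs are known
        elif n_cold_in_pair == 1:
            ddi_drugwise.append((a, b))   # one known drug, one cold-start drug
        else:
            ddi_pairwise.append((a, b))   # both drugs are cold-start drugs
    return drugs_cold, ddi_train, ddi_drugwise, ddi_pairwise


# Toy example: every pair among 10 hypothetical drugs (not data from the paper).
toy_pairs = list(itertools.combinations([f"D{i}" for i in range(10)], 2))
drugs_cold, ddi_train, ddi_drugwise, ddi_pairwise = cold_start_split(toy_pairs)
print(len(ddi_train), len(ddi_drugwise), len(ddi_pairwise))
```

Because no cold-start drug appears in the training subset, a classifier fitted on it never sees those drugs, which is what separates this setting from ordinary K-fold CV.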


In this section, we analyze the drug pair features learned by DeepGCL. To observe the relationships between features intuitively, we use dimensionality reduction to visualize them as points in a two-dimensional space. Dimensionality reduction methods are mainly divided into linear and nonlinear approaches [61]. The linear method LRPER [62] and the nonlinear method t-SNE [63] are two popular choices. Given the advantage of t-SNE in preserving local structure, we adopt it as our dimensionality reduction tool. Since DeepGCL is a deep learning model, we compare it only with other deep learning models: DeepDDS, CSGNN, DeepDDI, and CASTER. As shown in Fig 4, DeepGCL clearly separates drug pairs with interactions (green) from those without (red). In the BioSNAP dataset, we use the silhouette coefficient as an indicator of the quality of the drug pair representations. The silhouette coefficients of DeepGCL, DeepDDS, CSGNN, DeepDDI, and CASTER were 0.3246, 0.2085, 0.1973, 0.0188, and 0.1944, respectively. This indicates that DeepGCL extracts better representations of drug pairs.

Fig 4. Visualization of DDI network.

Red points indicate drug pairs without interactions and green points indicate drug pairs with interactions.
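The quality score reported for the drug pair representations is read here as the standard silhouette coefficient, which is an assumption about the authors' metric. The sketch below computes the mean silhouette with Euclidean distance in plain Python; the function name `silhouette_score` and the toy points are introduced for illustration, and the text does not say whether the score was computed on the two-dimensional t-SNE coordinates or on the original embeddings.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point, a = mean distance to the other points with the same
    label, b = smallest mean distance to the points of any other label,
    and the silhouette is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two distinct labels are required")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = sum(math.dist(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy 2-D points (not embeddings from the paper): label 1 = interacting pair.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = silhouette_score(toy_points, [0, 0, 1, 1])
mixed = silhouette_score(toy_points, [0, 1, 0, 1])
print(round(well_separated, 4), round(mixed, 4))
```

Values close to 1 mean the two classes form compact, well-separated groups, while values near or below 0 mean they overlap, which is why a higher score is taken as a better representation.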

Parameter analysis

DeepGCL uses GCN to learn semantic information about the graph, and the network depth is crucial for the quality of the learned representations. To verify the effect of network depth on model performance, we built networks of different depths for experiments. The results are shown in Fig 5, where the horizontal axis corresponds to the depth of the drug molecular graph network and the vertical axis corresponds to the depth of the drug interaction graph network. A darker cell indicates better model performance.

Fig 5. Effect of graph convolutional networks of different depths.

(a) Performance in BioSNAP dataset, (b) Performance in AdverseDDI dataset, (c) Performance in DrugBank dataset.

In the BioSNAP dataset, the combination of l = 2 and k = 3 works best; in the AdverseDDI dataset, the combination of l = 3 and k = 1 works best; and in the DrugBank dataset, the combination of l = 2 and k = 2 performs best. After evaluating the performance of the various combinations, we ultimately chose l = 2 and k = 3.

DeepGCL constructs subgraphs from shared H-hop drug neighbors to learn drug topology within the DDI network. Increasing H includes more nodes in the subgraph and therefore provides richer topological information. In DeepGCL, we select H from the set {1, 2} because of computational limits. S1 Fig shows that including 2-hop neighbors improves performance on AdverseDDI and BioSNAP. In contrast, DrugBank performs best with only 1-hop neighbors because of its higher node degree: a larger subgraph may introduce noise that counteracts the benefits. In the final model, H is set to 2 for the BioSNAP and AdverseDDI datasets and to 1 for the DrugBank dataset.
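One way to picture the shared H-hop subgraph is the sketch below. It assumes that the subgraph for a drug pair is induced by the two drugs plus the nodes lying within H hops of both of them; this reading, the function names `h_hop_neighbors` and `shared_subgraph`, and the toy network are assumptions made for illustration, not the authors' code.

```python
from collections import deque


def h_hop_neighbors(adj, source, h):
    """Return the nodes within `h` hops of `source` (excluding it), via BFS."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == h:
            continue  # do not expand beyond h hops
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(source)
    return seen


def shared_subgraph(adj, u, v, h):
    """Subgraph induced by u, v, and the H-hop neighbors they share."""
    shared = h_hop_neighbors(adj, u, h) & h_hop_neighbors(adj, v, h)
    nodes = shared | {u, v}
    edges = {
        tuple(sorted((a, b)))
        for a in nodes
        for b in adj.get(a, ())
        if b in nodes
    }
    return nodes, edges


# Toy undirected DDI network with hypothetical drug IDs.
toy_adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
nodes_1, edges_1 = shared_subgraph(toy_adj, "A", "D", h=1)
nodes_2, edges_2 = shared_subgraph(toy_adj, "A", "D", h=2)
print(sorted(nodes_1), sorted(nodes_2))
```

In this toy network the subgraph grows from three nodes at H = 1 to four nodes at H = 2, which mirrors the trade-off described above: a larger H adds topological context but also enlarges the subgraph that must be processed. A real pipeline would likely also mask the edge between the two target drugs before extraction to avoid label leakage.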


We present DeepGCL, a novel deep graph contrastive learning framework that integrates drug interaction network topology and molecular structure information. DeepGCL constructs subgraphs from shared H-hop neighboring nodes in the drug-drug interaction (DDI) network and employs GCN to obtain representations of drug molecular graphs and subgraphs. A graph contrastive learning component enhances the consistency of the embeddings across the two views. DeepGCL consistently demonstrates competitive performance across various metrics. When the model is applied to larger datasets, it learns additional topological information from subgraphs, which improves performance but also increases computational complexity. In the future, we will focus on efficient methods [64] for learning subgraph features, balancing computational efficiency and model performance on large-scale data to improve scalability. In summary, DeepGCL advances drug-drug interaction prediction and remains competitive with existing methods while providing valuable insights.

Supporting information

S1 Fig. The experiment assesses the impact of varying H-Hop neighbors on DeepGCL’s performance across multiple datasets.


S1 Table. Performance comparison of DeepGCL and competitive methods based on evaluation metrics MRR, MAP, and HIT@K.

The best score is in bold.


S2 Table. Training time of models on different datasets (Seconds).



  1. Azam F, Vazquez A. Trends in phase II trials for cancer therapies. Cancers. 2021;13(2):178. pmid:33430223
  2. Wang M, Zeraatkar D, Obeda M, Lee M, Garcia C, Nguyen L, et al. Drug–drug interactions with warfarin: A systematic review and meta-analysis. British journal of clinical pharmacology. 2021;87(11):4051–4100. pmid:33769581
  3. Li P, Huang C, Fu Y, Wang J, Wu Z, Ru J, et al. Large-scale exploration and analysis of drug combinations. Bioinformatics. 2015;31(12):2007–2016. pmid:25667546
  4. Tang J, Karhinen L, Xu T, Szwajda A, Yadav B, Wennerberg K, et al. Target inhibition networks: predicting selective combinations of druggable targets to block cancer survival pathways. PLoS computational biology. 2013;9(9):e1003226. pmid:24068907
  5. Sun Y, Xiong Y, Xu Q, Wei D, et al. A hadoop-based method to predict potential effective drug combination. BioMed research international. 2014;2014. pmid:25147789
  6. Shen C, Ding P, Wee J, Bi J, Luo J, Xia K. Curvature-enhanced graph convolutional network for biomolecular interaction prediction. Computational and Structural Biotechnology Journal. 2024;.
  7. Wang H, Lian D, Zhang Y, Qin L, Lin X. GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity Interactions. arXiv e-prints. 2020; p. arXiv–2005.
  8. Huang K, Xiao C, Hoang T, Glass L, Sun J. Caster: Predicting drug interactions with chemical substructure representation. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34; 2020. p. 702–709.
  9. Nyamabo AK, Yu H, Liu Z, Shi JY. Drug–drug interaction prediction with learnable size-adaptive molecular substructures. Briefings in Bioinformatics. 2022;23(1):bbab441. pmid:34695842
  10. Chen X, Luo L, Shen C, Ding P, Luo J. An in silico method for predicting drug synergy based on multitask learning. Interdisciplinary Sciences: Computational Life Sciences. 2021;13:299–311. pmid:33611781
  11. Wang J, Liu X, Shen S, Deng L, Liu H. DeepDDS: deep graph neural network with attention mechanism to predict synergistic drug combinations. Briefings in Bioinformatics. 2022;23(1):bbab390. pmid:34571537
  12. Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, et al. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Briefings in Bioinformatics. 2021;22(6):bbab109. pmid:33940598
  13. Fang X, Liu L, Lei J, He D, Zhang S, Zhou J, et al. ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction. arXiv e-prints. 2021; p. arXiv–2106.
  14. Chen X, Liu X, Wu J. Drug-drug interaction prediction with graph representation learning. In: 2019 IEEE International conference on bioinformatics and biomedicine (BIBM). IEEE; 2019. p. 354–361.
  15. Nyamabo AK, Yu H, Shi JY. SSI–DDI: substructure–substructure interactions for drug–drug interaction prediction. Briefings in Bioinformatics. 2021;22(6):bbab133. pmid:33951725
  16. Yu H, Zhao S, Shi J. STNN-DDI: a substructure-aware tensor neural network to predict drug–drug interactions. Briefings in Bioinformatics. 2022;23(4):bbac209. pmid:35667078
  17. Yang Z, Zhong W, Lv Q, Chen CYC. Learning size-adaptive molecular substructures for explainable drug–drug interaction prediction by substructure-aware graph neural network. Chemical science. 2022;13(29):8693–8703. pmid:35974769
  18. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of chemical information and computer sciences. 1988;28(1):31–36.
  19. He C, Liu Y, Li H, Zhang H, Mao Y, Qin X, et al. Multi-type feature fusion based on graph neural network for drug-drug interaction prediction. BMC bioinformatics. 2022;23(1):224. pmid:35689200
  20. Hou Y, Wang S, Bai B, Chan HS, Yuan S. Accurate physical property predictions via deep learning. Molecules. 2022;27(5):1668. pmid:35268770
  21. Fabian B, Edlich T, Gaspar H, Segler M, Meyers J, Fiscato M, et al. Molecular representation learning with language models and domain-relevant auxiliary tasks. arXiv preprint arXiv:201113230. 2020;.
  22. Purkayastha S, Mondal I, Sarkar S, Goyal P, Pillai JK. Drug-drug interactions prediction based on drug embedding and graph auto-encoder. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE). IEEE; 2019. p. 547–552.
  23. Ma T, Shang J, Xiao C, Sun J. Genn: predicting correlated drug-drug interactions with graph energy neural networks. arXiv preprint arXiv:191002107. 2019;.
  24. Yu Y, Huang K, Zhang C, Glass LM, Sun J, Xiao C. SumGNN: multi-typed drug interaction prediction via efficient knowledge graph summarization. Bioinformatics. 2021;37(18):2988–2995. pmid:33769494
  25. Han K, Cao P, Wang Y, Xie F, Ma J, Yu M, et al. A review of approaches for predicting drug–drug interactions based on machine learning. Frontiers in pharmacology. 2022;12:814858. pmid:35153767
  26. Lin S, Wang Y, Zhang L, Chu Y, Liu Y, Fang Y, et al. MDF-SA-DDI: predicting drug–drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Briefings in Bioinformatics. 2022;23(1):bbab421. pmid:34671814
  27. Shan W, Shen C, Luo L, Ding P. Multi-task learning for predicting synergistic drug combinations based on auto-encoding multi-relational graphs. Iscience. 2023;26(10). pmid:37854693
  28. Chen Y, Ma T, Yang X, Wang J, Song B, Zeng X. MUFFIN: multi-scale feature fusion for drug–drug interaction prediction. Bioinformatics. 2021;37(17):2651–2658. pmid:33720331
  29. Li Z, Zhu S, Shao B, Zeng X, Wang T, Liu TY. DSN-DDI: an accurate and generalized framework for drug–drug interaction prediction by dual-view representation learning. Briefings in Bioinformatics. 2023;24(1):bbac597. pmid:36592061
  30. Wang Y, Min Y, Chen X, Wu J. Multi-view graph contrastive representation learning for drug-drug interaction prediction. In: Proceedings of the Web Conference 2021; 2021. p. 2921–2933.
  31. Yang J, Xu H, Mirzoyan S, Chen T, Liu Z, Ju W, et al. Poisoning scientific knowledge using large language models. bioRxiv. 2023; p. 2023–11.
  32. Wang Y, Song Y, Li S, Cheng C, Ju W, Zhang M, et al. Disencite: Graph-based disentangled representation learning for context-specific citation generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36; 2022. p. 11449–11458.
  33. Rogers D, Hahn M. Extended-connectivity fingerprints. Journal of chemical information and modeling. 2010;50(5):742–754. pmid:20426451
  34. Ishiguro K, Maeda Si, Koyama M. Graph warp module: an auxiliary module for boosting the power of graph neural networks in molecular graph analysis. arXiv preprint arXiv:190201020. 2019;.
  35. Landrum G. Rdkit documentation. Release. 2013;1(1-79):4.
  36. Ramsundar B, Eastman P, Walters P, Pande V, Leswing K, Wu Z. Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More. 2019;1.
  37. Song Y, Ju W, Tian Z, Liu L, Zhang M, Xie Z. Building Conversational Diagnosis Systems for Fine-Grained Diseases Using Few Annotated Data. In: International Conference on Neural Information Processing. Springer; 2022. p. 591–603.
  38. Qin Y, Wang Y, Sun F, Ju W, Hou X, Wang Z, et al. DisenPOI: Disentangling sequential and geographical influence for point-of-interest recommendation. In: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining; 2023. p. 508–516.
  39. Ju W, Yi S, Wang Y, Long Q, Luo J, Xiao Z, et al. A survey of data-efficient graph learning. arXiv preprint arXiv:240200447. 2024;.
  40. Liu C, Shen J, Xin H, Liu Z, Yuan Y, Wang H, et al. Fimo: A challenge formal dataset for automated theorem proving. arXiv preprint arXiv:230904295. 2023;.
  41. Ju W, Yang J, Qu M, Song W, Shen J, Zhang M. Kgnn: Harnessing kernel-based networks for semi-supervised graph classification. In: Proceedings of the fifteenth ACM international conference on web search and data mining; 2022. p. 421–429.
  42. Xu D, Cheng W, Luo D, Chen H, Zhang X. Infogcl: Information-aware graph contrastive learning. Advances in Neural Information Processing Systems. 2021;34:30414–30425.
  43. Shuai J, Zhang K, Wu L, Sun P, Hong R, Wang M, et al. A review-aware graph contrastive learning framework for recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2022. p. 1283–1293.
  44. Yang Y, Huang C, Xia L, Li C. Knowledge graph contrastive learning for recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2022. p. 1434–1443.
  45. You Y, Chen T, Sui Y, Chen T, Wang Z, Shen Y. Graph contrastive learning with augmentations. Advances in neural information processing systems. 2020;33:5812–5823.
  46. Wang Y, Wang J, Cao Z, Barati Farimani A. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence. 2022;4(3):279–287.
  47. Fang Y, Zhang Q, Yang H, Zhuang X, Deng S, Zhang W, et al. Molecular contrastive learning with chemical element knowledge graph. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36; 2022. p. 3968–3976.
  48. Yang H, Chen H, Pan S, Li L, Yu PS, Xu G. Dual space graph contrastive learning. In: Proceedings of the ACM Web Conference 2022; 2022. p. 1238–1247.
  49. Zitnik M, Sosič R, Maheshwari S, Leskovec J. BioSNAP Datasets: Stanford Biomedical Network Dataset Collection; 2018.
  50. Tatonetti NP, Ye PP, Daneshjou R, Altman RB. Data-driven prediction of drug effects and interactions. Science translational medicine. 2012;4(125):125ra31–125ra31. pmid:22422992
  51. Al-Rabeah MH, Lakizadeh A. Prediction of drug-drug interaction events using graph neural networks based feature extraction. Scientific Reports. 2022;12(1):15590. pmid:36114278
  52. Zhao C, Liu S, Huang F, Liu S, Zhang W. CSGNN: Contrastive Self-Supervised Graph Neural Network for Molecular Interaction Prediction. In: IJCAI; 2021. p. 3756–3763.
  53. Ryu JY, Kim HU, Lee SY. Deep learning improves prediction of drug–drug and drug–food interactions. Proceedings of the national academy of sciences. 2018;115(18):E4304–E4311. pmid:29666228
  54. Perozzi B, Al-Rfou R, Skiena S. Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining; 2014. p. 701–710.
  55. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. Line: Large-scale information network embedding. In: Proceedings of the 24th international conference on world wide web; 2015. p. 1067–1077.
  56. Grover A, Leskovec J. node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining; 2016. p. 855–864.
  57. Wang D, Cui P, Zhu W. Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining; 2016. p. 1225–1234.
  58. Ribeiro LF, Saverese PH, Figueiredo DR. struc2vec: Learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining; 2017. p. 385–394.
  59. Masumshah R, Aghdam R, Eslahchi C. A neural network-based method for polypharmacy side effects prediction. BMC bioinformatics. 2021;22(1):1–17. pmid:34303360
  60. Celebi R, Uyar H, Yasar E, Gumus O, Dikenelli O, Dumontier M. Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction in realistic settings. BMC bioinformatics. 2019;20(1):1–14. pmid:31852427
  61. Deng YJ, Yang ML, Li HC, Long CF, Fang K, Du Q. Feature Dimensionality Reduction with L 2, p-Norm-Based Robust Embedding Regression for Classification of Hyperspectral Images. IEEE Transactions on Geoscience and Remote Sensing. 2024;.
  62. Zhang T, Long CF, Deng YJ, Wang WY, Tan SQ, Li HC. Low-rank preserving embedding regression for robust image feature extraction. IET Computer Vision. 2024;18(1):124–140.
  63. Van der Maaten L, Hinton G. Visualizing data using t-SNE. Journal of machine learning research. 2008;9(11).
  64. Zou D, Hu Z, Wang Y, Jiang S, Sun Y, Gu Q. Layer-dependent importance sampling for training deep and large graph convolutional networks. Advances in neural information processing systems. 2019;32.