Detecting drug-drug interactions using artificial neural networks and classic graph similarity measures

  • Guy Shtar,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliation Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel

  • Lior Rokach,

    Roles Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    Affiliation Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel

  • Bracha Shapira

    Roles Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    Affiliation Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel


Abstract

Drug-drug interactions are preventable causes of medical injuries and often result in doctor and emergency room visits. Computational techniques can be used to predict potential drug-drug interactions. We approach the drug-drug interaction prediction problem as a link prediction problem and present two novel methods for drug-drug interaction prediction based on artificial neural networks and factor propagation over graph nodes: adjacency matrix factorization (AMF) and adjacency matrix factorization with propagation (AMFP). We conduct a retrospective analysis by training our models on a previous release of the DrugBank database with 1,141 drugs and 45,296 drug-drug interactions and evaluate the results on a later version of DrugBank with 1,440 drugs and 248,146 drug-drug interactions. Additionally, we perform a holdout analysis using DrugBank. We report an area under the receiver operating characteristic curve score of 0.807 and 0.990 for the retrospective and holdout analyses, respectively. Finally, we create an ensemble-based classifier using AMF, AMFP, and existing link prediction methods and obtain an area under the receiver operating characteristic curve of 0.814 and 0.991 for the retrospective and the holdout analyses, respectively. We demonstrate that AMF and AMFP provide state-of-the-art results compared to existing methods and that the ensemble-based classifier improves the performance by combining various predictors. Additionally, we compare our methods with multi-source data-based predictors using cross-validation. In the multi-source data comparison, our methods outperform various ensembles created using 29 different predictors based on several data sources. These results suggest that AMF, AMFP, and the proposed ensemble-based classifier can provide important information during drug development and regarding drug prescription given only partial or noisy data. Additionally, the results indicate that the interaction network (known DDIs) is the most useful data source for identifying potential DDIs and that our methods take advantage of it better than the other methods investigated. The methods we present can also be used to solve other link prediction problems. Drug embeddings (compressed representations) created when training our models using the interaction network have been made public.


Introduction

Adverse drug events are often preventable causes of medical injuries, and adverse drug reactions (ADRs) are estimated to be the fourth leading cause of death in the U.S., ahead of pulmonary disease, diabetes, AIDS, pneumonia, accidents, and automobile fatalities [1]. The cost attributed to ADRs is estimated to be over $1,000 per patient per year in the U.S. [2]. Drug interactions are estimated to account for 3-5% of all medication errors within hospitals. Additionally, drug interactions are the cause of many patient visits to physicians and emergency units [3, 4]. Thirty-six percent of older adults in the U.S. regularly use five or more medications or supplements, and 15% are potentially at risk for a major drug-drug interaction (DDI) [5]. The American Geriatrics Society has identified the consideration of drug-disease and drug-drug interactions as a key element of optimal care for older adults with multimorbidity [6]. DDI prediction during the clinical experiments conducted in order to approve a new drug is difficult [7]. Clinical trials for new drugs do not address the issue of DDIs directly, and potential DDIs are often not discovered until the third phase of a clinical trial or once the drug is already on the market. The most practical way to explore the large number of drug combinations for detecting interacting drugs is through in silico drug-drug interaction detection, and in this paper, we propose a computational method for DDI detection.

In recent years, the detection of potential DDIs using computational techniques has gained attention; previous research has used techniques based on drug-drug interaction similarities [8], side effect similarities [9], structural similarities [10], or a combination of various similarity measures [11–14]. TMFUF is a method for predicting drug-drug interactions for new drugs based on triple matrix factorization; it uses side effects for this task [15]. Other works use natural language processing (NLP) techniques to train word embeddings using document collections such as PubMed, PMC, MEDLINE, and Wikipedia; the embeddings are later used to predict DDIs [16]. Computational methods often require a large amount of data for optimization. For example, when evaluating a new drug using structural similarity-based methods, the method will require data showing a strong, well-established history for structurally similar drugs in order to accurately detect drug interactions. Likewise, side effect similarity-based methods require data for drugs with similar side effects. We compare our proposed methods with other methods which were created using various data types, and the results indicate that the interaction network (known DDIs) is the most useful data source for identifying potential DDIs. Like other data sources, the interaction network has limitations. For example, when evaluating a new drug with no known interactions, the DDI network will not be helpful. Therefore, the drug-drug interaction prediction problem should be investigated using various data types.

DDI detection can be seen as a special case of link prediction in a graph. In a link prediction problem, we seek to accurately predict the edges (interactions) between nodes (drugs) that will be added to the network. We approach the DDI prediction problem as a link prediction problem. Perhaps the most basic approach is to rank edges based on the idea that two nodes x and y are more likely to form a link if their sets of neighbors have a large overlap; this follows the natural intuition that such nodes x and y represent drugs with many interacting drugs in common, and hence are more likely to interact. Matrix factorization is another approach for resolving link prediction problems. Matrix factorization (MF) is the factorization of a matrix into a product of matrices; this technique is widely used for dimensionality reduction, specifically in the field of recommender systems. In recent years, successful attempts have been made to factorize a matrix using deep neural networks [17–19]. Figs 1 and 2 demonstrate how the DDI prediction problem can be solved by factorizing and reducing the dimensionality of the adjacency matrix representing the drug-drug interaction graph. For clarity, in Fig 2 the prediction matrix is made symmetric by averaging opposite cells. The figures also demonstrate two drawbacks of matrix factorization that this research addresses. First, the decomposition is not symmetric: since the row vectors and column vectors of the adjacency matrix are identical, the transpose of the columns matrix should be equal to the rows matrix. Second, the score is unbounded, although it could be limited to the range [0, 1].

Fig 1. Tackling the DDI prediction problem as a link prediction problem.

A) A DDI graph is created: nodes represent drugs, and edges represent interactions. B) The DDI graph is represented by an adjacency matrix: rows and columns represent drugs, and a value of one in the matrix indicates an existing interaction; for example, the cell in the first row and the last column represents the interaction between D1 and D5. In a link prediction problem, a score is calculated for every non-existing interaction.

Fig 2. Link prediction using matrix factorization for DDI prediction.

The dimension of the adjacency matrix is reduced by factorizing it into two lower-rank matrices. By multiplying the matrices, a score is calculated for every existing and non-existing interaction. In this case, an interaction between D5 and D3 is very likely to exist. A score is also given to existing links: the interaction between D1 and D4 is stronger than the interaction between D1 and D5.

In this paper, we introduce AMF and AMFP, two novel methods for predicting DDIs based on artificial neural networks and the implementation of factor propagation over the interaction network. Unlike some of the methods presented in previous research, AMF and AMFP use only known drug interactions as input and predict currently unknown drug interactions. Additionally, most of the previous studies based their work on similarity measures, but AMF and AMFP are based on machine learning techniques, specifically on neural networks. We compare AMF and AMFP to existing methods which are based on multiple data sources and create an ensemble-based classifier using AMF, AMFP, and other well-known link prediction methods. Compared with existing methods, our methods achieve better performance using only known DDIs as the data source, and the statistical analysis demonstrates that the performance improvements achieved by our methods are statistically significant. In this paper, we make three key contributions: (1) we formulate a new artificial neural network-based method for link prediction, (2) we demonstrate its effectiveness for the drug-drug interaction prediction challenge, conducting extensive evaluations with real data to show the superiority of the interaction network (known DDIs) as a data source; to show the superiority of our method, we create drug embeddings for all available drugs, and (3) we create an ensemble-based classifier to demonstrate the benefit of combining existing high-performing classifiers. The preprocessing, methods, and drug embeddings developed, calculated, and used in this research were implemented and have been made public.

Materials and methods

Problem formulation

We approach the drug-drug interaction prediction problem as a link prediction problem. Suppose we have an undirected drug interaction network G = (V, E) in which each edge e = (u, i) ∈ E represents an interaction between drugs u and i. Note that throughout the paper we use the terms graph node and drug interchangeably. We use two versions of the drug interactions graph: G and G′. For two time snapshots t < t′, let G denote the graph constructed using the known interactions at time t, and G′ denote the graph constructed using the known interactions at time t′. This is a concrete formulation of the drug-drug interaction prediction problem: we give an algorithm access to network G, and it must then output a list of edges not present in G, which are predicted to appear in network G′. We refer to t as the training release date and t′ as the test release date. Of course, drug interaction networks grow through the addition of nodes (drugs) as well as edges. The training process uses only existing interactions to predict unknown ones. It is not sensible to seek predictions for edges whose endpoints are not present in the training interval, as such a prediction will be based on partial information in terms of link prediction. We use the adjacency matrix to represent G and G′, and the matrices are used to train AMF and AMFP. Let M denote the number of drugs. We define the M × M drug-drug interaction matrix as follows:

$$y_{i,j} = \begin{cases} 1, & \text{if an interaction between drugs } i \text{ and } j \text{ is known} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

Here, a value of one for $y_{i,j}$ indicates an existing interaction between drugs i and j; however, a value of zero does not mean that an interaction does not exist, since the interaction may simply not have been discovered yet.
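The formulation above can be sketched in a few lines of Python. This is only an illustration: the helper name `build_interaction_matrix`, the drug labels, and the toy interaction list are assumptions made for the example and are not taken from the published implementation.

```python
def build_interaction_matrix(drugs, interactions):
    """Return (index, y) where y[i][j] = 1 if drugs i and j are known to interact."""
    index = {drug: pos for pos, drug in enumerate(drugs)}
    m = len(drugs)
    y = [[0] * m for _ in range(m)]
    for u, v in interactions:
        i, j = index[u], index[v]
        # The graph is undirected, so each interaction fills two mirrored cells.
        y[i][j] = 1
        y[j][i] = 1
    return index, y


# Hypothetical toy network with five drugs (labels are placeholders).
drugs = ["D1", "D2", "D3", "D4", "D5"]
known = [("D1", "D4"), ("D1", "D5"), ("D2", "D3"), ("D3", "D4")]
index, Y = build_interaction_matrix(drugs, known)

# Candidate pairs for prediction: zero cells above the diagonal.
candidates = [(drugs[i], drugs[j])
              for i in range(len(drugs))
              for j in range(i + 1, len(drugs))
              if Y[i][j] == 0]
```

A link predictor then assigns a score to each pair in `candidates`; a zero cell is treated as "not yet known" rather than as a confirmed absence of interaction.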

Adjacency matrix factorization

Traditionally, matrix factorization associates each row element i and column element j with corresponding latent vectors $p_i$ and $q_j$. The estimate of the corresponding cell $y_{i,j}$ of the matrix is given by the inner product of the vectors:

$$\hat{y}_{i,j} = p_i^{T} q_j = \sum_{f=1}^{k} p_{i,f}\, q_{j,f} \tag{2}$$

where the vectors $p_i$ and $q_j$ are the latent factors, sometimes also referred to as representations, because they can be used as an alternative representation of the original row and column objects. The space size k is a parameter, usually set to a much lower value than the original space size. Using an extremely low k value might lead to underfitting; on the other hand, extremely high values might lead to overfitting. MF is sometimes improved by using a bias value corresponding to the row and column elements:

$$\hat{y}_{i,j} = \mu + b_i + b_j + p_i^{T} q_j \tag{3}$$

where μ is the average value over the whole matrix, and $b_i$ and $b_j$ are the bias values for the row and column elements, respectively. The parameters are typically learned using optimization techniques such as stochastic gradient descent. Regularization techniques are often used during the optimization process.

AMF (adjacency matrix factorization) performs matrix factorization on the adjacency matrix of G. Because the graph G is undirected, the adjacency matrix is symmetric. Therefore, it is sufficient to use a single vector and bias value shared between the rows and columns to estimate its cells. To allow precise DDI prediction, we use an artificial neural network-based model that encompasses the linear structure of the interaction network. The method is based on optimizing the latent factors of each drug in the network; the latent factor is a k-dimensional vector. Fig 3 provides an overview of AMF’s architecture. AMF takes a one-hot encoding representation of the two nodes under consideration as input. The output of AMF is binary; one indicates an existing interaction, and zero indicates no interaction between the two input nodes. Note that only an existing drug interaction network is required to train the proposed neural network. No other domain-specific information is required.

Fig 3. Overview of AMF’s architecture.

Drugs are represented as nodes; embedding layers (which act as latent factors) and biases are shared between input nodes. Dropout is used as a regularization mechanism for preventing overfitting.
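To make the idea of shared latent factors concrete, the following forward-pass sketch scores a drug pair from one embedding and one bias per drug. It is a simplified illustration rather than the authors' Keras model: the class name `SharedFactorScorer`, the sigmoid output, the initialization, and the exact way the weights are applied are all assumptions, and training and dropout are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SharedFactorScorer:
    """Forward pass only: one embedding and one bias per drug, shared by both inputs."""

    def __init__(self, num_drugs, k, seed=0):
        rng = random.Random(seed)
        # Assumed initialization: small Gaussian noise for embeddings, zero biases.
        self.p = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_drugs)]
        self.b = [0.0] * num_drugs
        # Weights over the element-wise product (all ones reduces to a plain dot product).
        self.w = [1.0] * k
        self.w_bias = 1.0

    def score(self, i, j):
        interaction = sum(w * a * c for w, a, c in zip(self.w, self.p[i], self.p[j]))
        logit = interaction + self.w_bias * (self.b[i] + self.b[j])
        # The sigmoid keeps the score inside (0, 1).
        return sigmoid(logit)


model = SharedFactorScorer(num_drugs=5, k=8)
s_01 = model.score(0, 1)
s_10 = model.score(1, 0)
```

Because both inputs look up the same embedding table, `score(i, j)` and `score(j, i)` coincide, which matches the symmetry of an undirected interaction graph.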

The use of a simple inner product to estimate complex drug-drug interactions in the low-dimensional latent space might be oversimplistic. The complexity of this model can be increased by increasing the embedding layer size; however, that might cause overfitting. The space size k should be carefully tuned during the training phase. Matrix factorization and AMF are closely related to singular value decomposition (SVD). AMF’s main improvements in the optimization stage compared to matrix factorization and SVD are: (1) sharing the latent vectors of the rows and columns, and (2) optimizing the weights of the element-wise multiplication and the bias, rather than just using a dot product and adding the biases. AMF is optimized using the adaptive moment estimation (Adam) optimization algorithm [20], which adjusts the learning rate for each parameter using estimates of the first and second moments of the gradients. MF and SVD are aimed at minimizing the mean square error; for a classification task such as a link prediction problem, binary cross-entropy is more appropriate. Hence, the loss function used in AMF is binary cross-entropy, defined as follows:

$$L = -\sum_{(i,j) \in Y} \left[ y_{i,j} \log \hat{y}_{i,j} + (1 - y_{i,j}) \log\left(1 - \hat{y}_{i,j}\right) \right] \tag{4}$$

where Y is the set of instances (drug pairs), $y_{i,j}$ is the true label which represents the existence or absence of an interaction, and $\hat{y}_{i,j}$ is the predicted value. Y is created using negative sampling, where all positive samples are used (existing drug interactions), and a relative number of negative samples are drawn randomly in each epoch; the number of negative samples to be sampled is a hyperparameter of the model that should be tuned. In this research we sample one negative sample for each given positive sample.
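The negative sampling step and the loss can be sketched as follows. The function names are hypothetical, negatives are drawn without repetition (an assumption; the text does not say whether repeats are allowed), and the loss is summed rather than averaged.

```python
import math
import random


def sample_training_pairs(num_drugs, positives, negatives_per_positive=1, rng=None):
    """Return [(i, j, label)] with every positive pair plus random unobserved pairs."""
    rng = rng or random.Random()
    known = {frozenset(pair) for pair in positives}
    samples = [(i, j, 1) for i, j in positives]
    # Never ask for more negatives than there are unobserved pairs.
    max_negatives = num_drugs * (num_drugs - 1) // 2 - len(known)
    wanted = min(negatives_per_positive * len(positives), max_negatives)
    drawn = set()
    while len(drawn) < wanted:
        i, j = rng.sample(range(num_drugs), 2)  # two distinct drugs
        pair = frozenset((i, j))
        if pair not in known and pair not in drawn:
            drawn.add(pair)
    samples.extend((min(p), max(p), 0) for p in drawn)
    return samples


def binary_cross_entropy(labels, predictions, eps=1e-12):
    """Summed binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, y_hat in zip(labels, predictions):
        y_hat = min(max(y_hat, eps), 1.0 - eps)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total


# Toy data: four known interactions among five drugs, one negative per positive.
positives = [(0, 3), (0, 4), (1, 2), (2, 3)]
batch = sample_training_pairs(5, positives, rng=random.Random(42))
loss = binary_cross_entropy([label for _, _, label in batch], [0.5] * len(batch))
```

Calling `sample_training_pairs` again at the start of each epoch yields a fresh set of negatives, mirroring the per-epoch resampling described above.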

Adjacency matrix factorization with propagation

AMF excels in the holdout analysis where data is randomly sampled for the testing set (see Results section for more details). Nevertheless, despite all of the regularization techniques applied to it, AMF’s generalization ability and performance are poor in the retrospective analysis where a new unseen version of the dataset is used. The underlying mechanism that causes an interaction to be discovered and added to the database is not random—it depends on mediators such as new interactions discovered between substances contained in the drugs, the drug’s prevalence, and more. To perform well in the retrospective analysis, higher generalization ability is required from the model. Adjacency matrix factorization with propagation (AMFP) is an extension of AMF. In AMFP, the same model used in AMF is used, but an additional step is performed: propagating the latent factors of each drug to its interacting drugs; latent factor propagation is controlled by a propagation factor, which controls the weight of the original latent factor (which was optimized in the previous step) compared to the weight of the latent factor of the interacting drugs. Algorithm 1 describes the propagation procedure. Each node’s latent factor is shared with the node’s neighborhood. The parameter α is the propagation factor, which controls how much information will be passed from the neighboring nodes. The value of α should be optimized during the training process.

Algorithm 1 Latent factor propagation

procedure propagate_factors(Graph G = (V, E), Latent factors P, Propagation factor α)
 P′ ← empty list // holds the new latent factors
 for vertex v1 in G do
  Q ← empty latent factor
  for vertex v2 in Γ(v1) do
   Q ← Q + P[v2] / |Γ(v1)|
  end for
  P′[v1] ← (1 − α) · P[v1] + α · Q
 end for
 return P′
end procedure

Given a node v ∈ V, the neighborhood of v is denoted by Γ(v) and consists of the set of v’s interacting drugs. The lists P and P′ contain the original latent factors (embeddings) and the latent factors resulting from the propagation process, respectively. Each list contains vectors (each vector has k elements) representing the latent factors of the nodes. α is the propagation factor; when α reaches a value of one, the original latent factor of each node is discarded, and a new latent factor is created based on the neighborhood of the node. At the other extreme, when α reaches a value of zero, the latent factors created in the previous step are used, and the propagation step does not change the model. When α’s value is equal to zero, the results of AMFP and AMF are equivalent. Propagating the factors is expected to improve the generalization ability of the model by combining the factors of interacting drugs. This logic follows the assumption that interacting drugs share some common characteristics.
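A minimal Python sketch of the propagation step is given below. It assumes that neighbor factors are combined by a simple mean and that a drug without neighbors keeps its original factor; both choices, along with the dictionary-based data layout, are assumptions made for illustration and may differ from the published implementation.

```python
def propagate_factors(neighbors, factors, alpha):
    """Blend each node's latent factor with the mean factor of its neighbors.

    neighbors: dict mapping node -> set of interacting nodes
    factors:   dict mapping node -> list of k floats
    alpha:     propagation factor in [0, 1]
    """
    new_factors = {}
    for v1, own in factors.items():
        adjacent = neighbors.get(v1, ())
        if not adjacent:
            # Assumption: a node without neighbors keeps its original factor.
            new_factors[v1] = list(own)
            continue
        k = len(own)
        q = [0.0] * k
        for v2 in adjacent:
            for d in range(k):
                q[d] += factors[v2][d] / len(adjacent)
        new_factors[v1] = [(1.0 - alpha) * own[d] + alpha * q[d] for d in range(k)]
    return new_factors


# Toy graph: A interacts with B and C.
neighbors = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
factors = {"A": [1.0, 0.0], "B": [0.0, 2.0], "C": [0.0, 4.0]}

unchanged = propagate_factors(neighbors, factors, alpha=0.0)
replaced = propagate_factors(neighbors, factors, alpha=1.0)
blended = propagate_factors(neighbors, factors, alpha=0.5)
```

With `alpha=0.0` the factors are returned unchanged, and with `alpha=1.0` the factor of A becomes the mean of the factors of B and C, which reproduces the two extremes described in the text.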

Link prediction similarity measures

Link prediction similarity measures can be viewed as pre-engineered features which leverage domain knowledge for link prediction in graphs. This subsection is devoted to formulating and explaining the motivation behind the similarity measures used in this research for creating an ensemble-based classifier and evaluation.

The common neighbors measure for two given nodes u, v ∈ V refers to the size of the set of neighbors that u and v share. The formal definition is:

$$\mathrm{CN}(u, v) = |\Gamma(u) \cap \Gamma(v)| \tag{5}$$

The relevance of the common neighbors feature is very intuitive. It is expected that the larger the size of the common neighborhood, the higher the chances are that both vertices will be connected. The common neighbors feature has been widely used in past work on link prediction on several datasets and was found to be very helpful [21]. Using the common neighbors measure, we formulate the average common neighbors for two nodes:

$$\mathrm{ACN}(u, v) = \frac{\sum_{w \in \Gamma(v)} \mathrm{CN}(u, w)}{|\Gamma(v)|} \tag{6}$$

The average common neighbors measure provided above is not symmetric, due to the normalizing factor |Γ(v)|; we formulate it as a symmetric measure by averaging its two possible values for a pair of nodes:

$$\mathrm{ACN}_{\mathrm{sym}}(u, v) = \frac{\mathrm{ACN}(u, v) + \mathrm{ACN}(v, u)}{2} \tag{7}$$

The Jaccard coefficient is a well-known similarity measure, widely used for link prediction [21]. For two nodes u, v ∈ V the Jaccard coefficient is defined as follows:

$$\mathrm{J}(u, v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|} \tag{8}$$

As with common neighbors, we formulate the average Jaccard coefficient for two vertices as follows:

$$\mathrm{AJ}(u, v) = \frac{\sum_{w \in \Gamma(v)} \mathrm{J}(u, w)}{|\Gamma(v)|} \tag{9}$$

and its symmetric version is given as follows:

$$\mathrm{AJ}_{\mathrm{sym}}(u, v) = \frac{\mathrm{AJ}(u, v) + \mathrm{AJ}(v, u)}{2} \tag{10}$$

The Adamic/Adar index [22] is a similarity measure used to predict links in social networks:

$$\mathrm{AA}(u, v) = \sum_{w \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(w)|} \tag{11}$$

Lastly, we present $\mathrm{Katz}_b$, which sums the number of paths of different lengths between two nodes, with weights that decay exponentially with path length:

$$\mathrm{Katz}(u, v) = \sum_{\ell=1}^{\infty} \beta^{\ell} \left|\mathrm{paths}^{\langle \ell \rangle}_{u,v}\right| \tag{12}$$

where $\left|\mathrm{paths}^{\langle \ell \rangle}_{u,v}\right|$ is the number of paths between u and v of length $\ell$, and β is a parameter controlling the weight given to shorter paths compared to the weight given to longer ones. In practice, a truncated Katz measure is usually used:

$$\mathrm{Katz}_b(u, v) = \sum_{\ell=1}^{b} \beta^{\ell} \left|\mathrm{paths}^{\langle \ell \rangle}_{u,v}\right| \tag{13}$$

In this research, we use b = 3 due to resource limitations. $\mathrm{Katz}_b$ was found to be very helpful for link prediction in previous works [23]. The final list of link prediction similarity measures used in this research consists of: average common neighbors, average Jaccard coefficient, Adamic/Adar, and $\mathrm{Katz}_b$. The similarity measures between two nodes used by Fire et al. [24], such as the shortest path length between nodes, cosine distance between nodes, and dividing the graph into communities and then comparing two nodes’ communities, were tested and discarded due to poor performance.
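The standard measures in this subsection (common neighbors, Jaccard, Adamic/Adar, and truncated Katz) can be sketched in plain Python over an adjacency dictionary. The function names, the toy graph, and the value of β are illustrative assumptions; the Katz sketch counts walks (paths that may revisit nodes), which is what powers of the adjacency matrix yield, and the averaged variants are not covered here.

```python
import math


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    # Shared neighbors with few connections contribute more than hubs.
    # For u != v, a shared neighbor has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def truncated_katz(adj, u, v, beta=0.05, b=3):
    """Sum beta**l * (number of length-l walks from u to v) for l = 1..b."""
    counts = {u: 1}  # counts[x] = number of walks of the current length from u to x
    score = 0.0
    for length in range(1, b + 1):
        nxt = {}
        for node, n in counts.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0) + n
        counts = nxt
        score += beta ** length * counts.get(v, 0)
    return score


# Hypothetical interaction graph: node -> set of interacting drugs.
adj = {
    "D1": {"D4", "D5"},
    "D2": {"D3"},
    "D3": {"D2", "D4"},
    "D4": {"D1", "D3"},
    "D5": {"D1"},
}

cn = common_neighbors(adj, "D1", "D3")
jc = jaccard(adj, "D1", "D3")
aa = adamic_adar(adj, "D1", "D3")
kz = truncated_katz(adj, "D1", "D3", beta=0.5, b=3)
```

In this toy graph D1 and D3 share only D4, so the pair gets one common neighbor, a Jaccard coefficient of 1/3, and a single length-2 walk contributing to the Katz score.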

Creating an ensemble-based classifier

Ensembles are meta-algorithms used to combine various classifiers. They can reduce the variance and bias of the base models and improve the predictions in general. One such ensemble method is XGBoost [25], which has achieved state-of-the-art results in multiple tasks and competitions. XGBoost employs gradient boosting, where models are created stage-wise using weak predictors, usually prediction trees. In each stage, the model seeks to improve the performance of the model created in the previous stage. We train an ensemble classifier using XGBoost, based on the link prediction similarity measures presented above: average common neighbors, average Jaccard coefficient, Adamic/Adar, and $\mathrm{Katz}_b$. Additionally, AMF or AMFP (the method that performs better) and the method proposed by Vilar et al. [8] are fed to the ensemble classifier. This meta-algorithm can be easily extended to include additional features.


Evaluation

In this section, we present experiments with the aim of evaluating AMF, AMFP, and the ensemble-based classifier. Our evaluation is based on two evaluation schemes: a retrospective analysis using approved drugs from three versions of the DrugBank database [26] and a holdout analysis using a current version of the database. We use state-of-the-art benchmarks.

Fig 4 illustrates the validation and testing scheme for the retrospective analysis using three versions of the DrugBank database. Major changes were made between the versions; specifically, a large number of interactions were added to the more recent version. For the validation process, we aligned versions 4.1.0 and 5.0.0 by only using drugs which appear in both versions. The same was done for versions 5.0.0 and 5.1.1 when training and testing the final model. Version 4.1.0 from December 2014 contains 11,284 interactions, and versions 5.0.0 from June 2016 and 5.1.1 from July 2018 contain 45,296 and 248,146 interactions, respectively. Versions 4.1.0, 5.0.0, and 5.1.1 contain 1,141, 1,440, and 2,149 drugs, respectively. To test whether our model could predict pharmacodynamic as well as pharmacokinetic interactions, we adopt an evaluation scheme similar to the one used by Vilar et al. [8]. We use DrugBank annotations to identify any interactions between drugs with shared metabolism by a cytochrome P450 (CYP) metabolizing enzyme (1A2, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, 3A4, 3A5 and 3A7). Such interactions are removed from the test set (release 5.1.1), and the rest of the retrospective analysis is executed normally.

Fig 4. Retrospective evaluation scheme.

Parameter tuning is performed using DrugBank release 4.1.0 and 5.0.0. The previous release is used to train the model, and the latter is used to validate the results. The final model is trained using the parameters obtained in the validation stage with the data from release 5.0.0 (which contains the data from release 4.1.0 with some additions and changes) and tested using release 5.1.1.
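The alignment of two releases can be sketched as below. The function name and the toy edge lists are hypothetical, and the set of drugs in each release is derived from its interaction list, which is a simplifying assumption (a real release also lists drugs without any known interaction).

```python
def align_releases(train_edges, test_edges):
    """Restrict two releases to their common drugs and isolate newly added interactions."""

    def drugs_of(edges):
        return {drug for edge in edges for drug in edge}

    def normalize(edges, keep):
        # Undirected pairs are stored as frozensets; self-loops are dropped.
        return {frozenset(e) for e in edges
                if e[0] in keep and e[1] in keep and e[0] != e[1]}

    common = drugs_of(train_edges) & drugs_of(test_edges)
    train = normalize(train_edges, common)
    test = normalize(test_edges, common)
    new_interactions = test - train  # positives the model should rank highly
    return common, train, new_interactions


# Toy releases: the later one adds an interaction and a new drug "D".
older = [("A", "B"), ("B", "C")]
later = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
common, train, new_pos = align_releases(older, later)
```

Here the pair involving the new drug "D" is excluded, because predictions are only sought for drugs present in both releases, while the added interaction between "A" and "C" becomes a positive test example.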

In addition to the retrospective analysis, we perform a holdout evaluation using release 5.1.1, the latest release available during this research. The setup of the holdout evaluation is as follows: 30% of randomly selected existing and non-existing interactions are used as a test set, and the rest of the data is used as a training set; 10% of the data is used for validation (parameter tuning) during training. For both evaluation techniques, the model is retrained after the validation process, with the combined training and validation data, using the tuned parameters. In a holdout evaluation, the interactions are randomly selected, while in reality some interactions are more likely to be found earlier depending on the popularity of the drug, the prevalence of the interaction, etc. For these reasons, the retrospective analysis is a stronger evaluation scheme; however, we perform a holdout evaluation for comprehensiveness and to comply with previous research.
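A holdout split of this kind can be sketched as follows. The helper name is hypothetical, and taking the validation share as 10% of all drug pairs (rather than 10% of the training portion) is an assumption, since the text leaves this open.

```python
import random
from itertools import combinations


def holdout_split(num_drugs, test_frac=0.3, val_frac=0.1, seed=0):
    """Randomly split all unordered drug pairs into train, validation, and test sets."""
    pairs = list(combinations(range(num_drugs), 2))
    random.Random(seed).shuffle(pairs)
    n_test = round(test_frac * len(pairs))
    n_val = round(val_frac * len(pairs))
    test = pairs[:n_test]
    val = pairs[n_test:n_test + n_val]
    train = pairs[n_test + n_val:]
    return train, val, test


# Five drugs give ten unordered pairs (existing and non-existing interactions alike).
train, val, test = holdout_split(5)
```

Each pair's label is then read from the interaction matrix, so both existing and non-existing interactions end up in every subset.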


The primary evaluation metric we use is the area under the ROC curve (AUROC). We also assess the area under the precision-recall curve (AUPR), because it has been argued to be relevant for link prediction problems [27]. We plot the ROC curve and the average precision @ n, where the precision of each drug’s predictions is averaged at different values of n. Lastly, we plot precision @ n, which evaluates the top n most confident predictions of the model. Because precision is more important than recall in the DDI problem, we plot the two precision graphs in addition to the other metrics.
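For intuition, AUROC and precision @ n can be computed with a few lines of Python. This is an illustrative sketch with hypothetical function names and toy scores; the pairwise AUROC below is quadratic in the number of samples and is not meant for datasets of realistic size.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_n(labels, scores, n):
    """Fraction of true interactions among the n highest-scoring pairs."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top = ranked[:n]
    return sum(y for _, y in top) / len(top)


# Toy predictions for five candidate pairs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]

area = auroc(labels, scores)
p_at_1 = precision_at_n(labels, scores, 1)
p_at_2 = precision_at_n(labels, scores, 2)
```

The per-drug average precision @ n follows the same idea: `precision_at_n` is applied to each drug's own candidate list and the results are averaged over drugs.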


We compare our proposed method with the following methods:

  • The method suggested by Vilar et al. [8]. This method is based on drug interaction profile fingerprints (IPFs). The model uses IPFs to measure the similarity of pairs of drugs and generates new putative drug-drug interactions from the non-intersecting interactions of a pair. Their method uses the same input data used by the method proposed in this paper.
  • The link prediction similarity measures presented earlier: average common neighbors, average Jaccard coefficient, Adamic/Adar, and $\mathrm{Katz}_b$.
  • An XGBoost model trained using all of the models described in the previous bullets and AMF or AMFP (we use the model which performs better). This model is used to demonstrate the power of combining several strong methods, rather than being used in comparison with the other methods mentioned above, as a comparison between a regular model and an ensemble is inappropriate.
  • The methods used by Zhang et al. [12]. The following multi-source data is used: substructure data, drug target data, drug enzyme data, drug transporter data, drug pathway data, drug indication data, drug side effect data, and known drug-drug interactions. The neighbor recommender method and random walk method are used to create DDI prediction models. Using 29 prediction models, including 28 similarity-based models and one perturbation matrix model, three ensembles are created based on logistic regression with L1 and L2 regularization and a genetic algorithm. The ensembles are compared to two existing methods presented by Vilar et al. [8, 28] and three methods presented by Zhang et al. [9]; our methods are compared indirectly with these methods by using this baseline and the same dataset. The authors include data from different sources, creating a diverse dataset. Unfortunately, the dataset is available for just a single point in time, which does not allow a retrospective analysis. To compare the methods presented by Zhang et al. [12] to ours, we adopt the cross-validation scheme used in the original research. We use three- and five-fold cross-validation, repeat each experiment five times, and use a pairwise t-test on the results.

We implemented AMF and AMFP using Keras [29]. The method suggested by Vilar et al. was implemented in Python; we used the original implementation and data for the methods suggested by Zhang et al. For Adamic/Adar and several other methods we used the NetworkX implementation [30]. Other methods, such as $\mathrm{Katz}_b$, were implemented in Python. For XGBoost, the implementation proposed by the authors was used. We assess the AUROC score of AMF, AMFP, and each of the baselines for significance with a paired test using the algorithm described by Sun and Xu [31].

Parameter tuning

To determine the hyperparameters of AMF and AMFP, we used the procedure described above. All weights are randomly initialized using the Glorot normal initializer [32]. The following batch sizes were used: 128, 256, 512, and 1024, and learning rates in the range of 0.0001 to 0.1 were tested. We evaluate the following numbers of factors (embedding sizes): 32, 64, 128, 256, 512, and 1024, dropout levels in the range of 0 to 0.9, the number of epochs in the range of 1 to 50, and propagation factors in the range of 0.0 to 1.0. For XGBoost, the parameters were optimized using randomized grid search, where combinations of parameters were drawn randomly from a given list and evaluated.
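A randomized search of the kind described for XGBoost can be sketched generically. Purely for illustration, the grid below is built from the AMF and AMFP ranges listed above: the specific grid points inside each range, the function names, and the placeholder objective are all assumptions, and a real objective would train a model and return its validation AUROC.

```python
import random

# Hypothetical grid: only the range endpoints and the listed sizes come from the text.
search_space = {
    "batch_size": [128, 256, 512, 1024],
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],
    "embedding_size": [32, 64, 128, 256, 512, 1024],
    "dropout": [0.0, 0.3, 0.5, 0.7, 0.9],
    "epochs": [1, 5, 10, 25, 50],
    "propagation_factor": [0.0, 0.25, 0.5, 0.75, 1.0],
}


def random_search(space, evaluate, n_trials, seed=0):
    """Draw random configurations from the grid and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_validation_score(config):
    # Placeholder objective standing in for "train the model, score it on validation data".
    return 1.0 - abs(config["dropout"] - 0.5) - abs(config["propagation_factor"] - 0.25)


best, best_score = random_search(search_space, dummy_validation_score, n_trials=50)
```

Randomized search trades exhaustiveness for speed: with six hyperparameters the full grid above has several thousand combinations, while only `n_trials` of them are evaluated.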


Results

In this section, we report the results of AMF, AMFP, and the baselines using the evaluation techniques described in the previous section.

Holdout analysis

Holdout analysis is performed by using 70% of the data in DrugBank release 5.1.1 to train the models; the rest of the existing and non-existing interactions are used for evaluation. Fig 5A shows the ROC curve for AMF and each of the baselines. Table 1 presents the AUROC and AUPR values for each model. The AUROC of each pair of models was tested for significance; we report a p-value < $10^{-4}$ for all tests. Fig 5B shows the average precision @ n (per drug), where n ranges from one to five, and Fig 5C shows the precision @ n, where n ranges from one to 100. The optimal value for AMFP’s α is zero, hence its performance is equivalent to that of AMF. Therefore, we do not present its results in the holdout analysis, and it is not used in the XGBoost model trained using the holdout data.

Fig 5. Holdout analysis results.

A) Receiver operating characteristic curves; B) Per drug average precision @ n; C) Precision @ n.

Table 1. Area under the ROC and precision-recall curves for the holdout analysis.

Retrospective analysis

Retrospective analysis is performed by training the models on an older version of DrugBank and evaluating the models using a more recent version of DrugBank. Fig 6A shows the ROC curve for AMF, AMFP, and each of the baselines. Table 2 presents the AUROC and AUPR values for each model. The AUROC of each pair of models was tested for significance; we report a pvalue < 10−4 for all tests. Fig 6B shows the average precision @ n (per drug), where n ranges from one to five, for the first interaction (n = 1), the accuracy is about 56% for both AMFP and the XGBoost ensemble. Fig 6C shows the precision @ n, where n ranges from one to 100. XGBoost was trained using AMFP’s predictions and without AMF’s predictions because of their superiority. XGBoost has the best performance in terms of the AUROC and AUPR curves, followed by AMFP. The average precision @ n and precision @ n graphs demonstrate the XGBoost model superiority, it performs best for almost all values of n. To test whether our model could predict pharmacodynamic as well as pharmacokinetic interactions, we removed any interactions between drugs with shared metabolism by a cytochrome p450 (see Evaluation section). A total of 56,874 interactions were removed (37.7% of the interactions). We report an AUROC of 0.775 for AMFP and 0.705 for AMP, a performance reduction of 0.032 and 0.043 respectively. For reference, the performance reduction is 0.044 for the method developed by Vilar et al. [8]. These results suggests that AMF and AMFP take different pharmacological effects caused by pharmacokinetic and pharmacodynamic characteristics of the drugs into account. While the relative performance between the retrospective analysis and the holdout analysis is somewhat similar for all models, the absolute differences are obvious. 
This phenomenon can be explained by the fact that each DrugBank release is a closed system of interactions known at a given time, sometimes derived from interactions between substances contained in different drugs. The absolute differences between the results demonstrate the weakness of holdout evaluation, which has difficulty simulating real-world scenarios compared to retrospective evaluation.
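The two ranking metrics reported in Fig 6B and Fig 6C can be illustrated with a minimal sketch. It assumes that precision @ n is the fraction of true interactions among the n highest-scored candidate pairs, and that the per-drug variant computes this value on each drug's own candidate list and then averages over drugs. The function names and the toy scores below are hypothetical and are not taken from the published implementation.

```python
def precision_at_n(scored_pairs, n):
    """Precision among the n highest-scored candidates.

    scored_pairs: list of (score, is_true_interaction) tuples.
    If fewer than n candidates exist, all of them are used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Sort by score only (highest first) and keep the top n candidates.
    top = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)[:n]
    if not top:
        raise ValueError("no candidates to rank")
    return sum(1 for _, is_true in top if is_true) / len(top)


def per_drug_average_precision_at_n(candidates_by_drug, n):
    """Average of precision @ n over drugs (assumed per-drug definition)."""
    values = [precision_at_n(pairs, n)
              for pairs in candidates_by_drug.values() if pairs]
    return sum(values) / len(values)


# Made-up example: each drug maps to (predicted score, label in the newer release).
toy = {
    "drug_a": [(0.9, True), (0.4, False), (0.2, False)],
    "drug_b": [(0.9, True), (0.8, False), (0.1, False)],
    "drug_c": [(0.8, False), (0.3, True), (0.2, False)],
    "drug_d": [(0.4, False), (0.3, True), (0.1, False)],
}
all_pairs = [pair for pairs in toy.values() for pair in pairs]

print(per_drug_average_precision_at_n(toy, 1))  # per-drug metric at n = 1
print(precision_at_n(all_pairs, 4))             # global metric at n = 4
```

The per-drug variant rewards a model that ranks well for every drug, whereas the global variant can be dominated by a few drugs with many high-scoring candidates.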

Fig 6. Retrospective analysis results.

A) Receiver operating characteristic curves; B) Per-drug average precision @ n; C) Precision @ n.

Table 2. Area under the ROC and precision-recall curves for retrospective analysis.

Comparison to multi-source data-based predictors

In this subsection, we report the results of AMF, AMFP, the XGBoost classifier, and the methods presented by Zhang et al. [12], adopting the cross-validation evaluation used by Zhang et al. Tables 3 and 4 present the results for three- and five-fold cross-validation. The optimal value for AMFP’s α is zero; hence, its performance is equivalent to that of AMF, and we do not present its results in the tables. As can be seen, AMF outperforms all of the methods, including the ensembles proposed by Zhang et al. and the XGBoost ensemble. Furthermore, when AMF is added to the ensembles proposed by Zhang et al., the performance of the ensembles is still lower than that of AMF on its own. The differences between AMF and the other methods presented in Tables 3 and 4 are statistically significant. The differences are also statistically significant when comparing the XGBoost ensemble to the other methods presented in the tables. We report a p-value < 10⁻⁴ for all tests. These results, which indicate that methods based on interaction networks (known DDIs) perform better than methods based on other data types, align with the results presented by Zhang et al. Unfortunately, we are unable to compare the methods using retrospective analysis due to data unavailability. Cross-validation is very similar to holdout analysis; in both cases, interactions are selected randomly and used as a test set. The differences between our holdout analysis and our retrospective analysis suggest that if the multi-source data-based predictors and the methods we propose were compared using retrospective analysis, the differences would be even greater.
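The random selection of interactions for cross-validation can be sketched as follows. This is only an illustration, assuming that the known interactions are treated as unordered drug pairs and divided into k folds of nearly equal size; the function name, the seed, and the toy pairs are hypothetical, and the exact protocol of Zhang et al. may differ in its details.

```python
import random


def kfold_interaction_splits(interactions, k, seed=0):
    """Split known interactions into k (train_pairs, test_pairs) folds."""
    # Normalize each pair so that (a, b) and (b, a) count as one interaction.
    pairs = sorted({tuple(sorted(pair)) for pair in interactions})
    if k < 2 or k > len(pairs):
        raise ValueError("k must be between 2 and the number of interactions")
    random.Random(seed).shuffle(pairs)
    folds = [pairs[i::k] for i in range(k)]
    splits = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        splits.append((train, held_out))
    return splits


# Made-up interaction list over four placeholder drugs.
toy_interactions = [("a", "b"), ("a", "c"), ("b", "c"),
                    ("c", "d"), ("a", "d"), ("d", "b")]
splits = kfold_interaction_splits(toy_interactions, k=3)
for train, held_out in splits:
    print(len(train), "training pairs;", len(held_out), "held-out pairs")
```

Because the held-out pairs are drawn at random from one snapshot of the network, every fold follows the same distribution as the training data, which is the property that distinguishes this scheme from retrospective evaluation.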

Table 3. Area under the ROC and precision-recall curves for multi-source data comparison, three-fold cross-validation.

Table 4. Area under the ROC and precision-recall curves for multi-source data comparison, five-fold cross-validation.

Discussion


Drug interactions are the cause of many patient visits to physicians and emergency units. Estimates of the harm attributable to drug interactions range from 3% to 5% of all medication errors within hospitals [3, 4]. Potential DDIs are often not discovered until the third phase of a clinical trial or, in many cases, only after the drug has already been on the market for some time. In silico drug-drug interaction prediction methods, such as the methods proposed in the current research, are the most practical way of detecting DDIs. We introduced AMF and AMFP, two new methods for in silico drug-drug interaction prediction, and used DrugBank to demonstrate the superiority of the proposed methods over existing methods on the following metrics: AUROC, AUPR, precision @ n, and average precision @ n per drug. The improvement was demonstrated both by predicting the interactions in a newer version of DrugBank and by using a holdout evaluation scheme. We demonstrated that our methods are capable of handling both pharmacokinetic and pharmacodynamic DDIs. In addition, our results indicate that the interaction network (known DDIs) is the most useful data source for identifying potential DDIs. An ensemble method trained using XGBoost obtained better results than AMF and AMFP in most metrics and evaluation schemes. The XGBoost ensemble could potentially be further improved by adding more models or including domain-specific information (e.g., structural data); however, this may come at the cost of a much longer training time. Additionally, as the comparison to multi-source data-based predictors demonstrates, more data sources do not necessarily improve performance.

In this section, we present a more in-depth analysis of AMFP’s propagation factor, testing different values on the validation and test sets to investigate whether there is a strong correlation between the two. Fig 7 presents AMFP’s propagation factor analysis for the holdout and retrospective analyses. A large difference can be seen between the two evaluation schemes. For the retrospective scheme, the optimal values are 0.5 and 0.8 for the validation and test sets, respectively, and the effect of different propagation factor values on the AUROC of the validation and test sets is similar. For the holdout evaluation scheme, the effect of different propagation factor values on the AUROC of the validation and test sets is very similar; smaller values are preferred, and zero is the optimal value. This means that no propagation is required for the validation or test sets. This difference could be the result of the difference in the test set distributions. As stated in the Evaluation section, retrospective analysis is preferred, as it is more true to life and requires the model to generalize better than the holdout analysis does. Both evaluation techniques show better results on the test set than on the validation set, which is relatively unusual. For the holdout analysis, the difference is very small (about 0.02); this difference might simply be explained by the amount of data, in that the model evaluated on the validation set is trained using less data than the model evaluated on the test set. For the retrospective analysis, where the difference is larger, the reason is probably similar: the previous version of DrugBank used for training during validation contains fewer interactions (11,284 interactions) than the version used to train the final model, which was used for testing (45,296 interactions).
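The selection of the propagation factor on the validation set can be sketched as a simple grid search over candidate values, keeping the one with the highest validation AUROC. In the sketch below, `validation_scores_for` is a hypothetical callable that stands in for training AMFP with a given factor and scoring the validation pairs; the AUROC is computed by direct pairwise comparison for clarity rather than speed, and the toy scores are made up.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def select_propagation_factor(candidates, validation_scores_for, labels):
    """Return the candidate with the highest validation AUROC and all results."""
    results = {alpha: auroc(validation_scores_for(alpha), labels)
               for alpha in candidates}
    best = max(results, key=results.get)
    return best, results


# Made-up validation scores for two candidate propagation factors.
toy_labels = [1, 1, 0, 0]
toy_scores = {0.0: [0.9, 0.8, 0.2, 0.1], 0.5: [0.9, 0.2, 0.8, 0.1]}
best_alpha, per_alpha = select_propagation_factor(
    [0.0, 0.5], lambda alpha: toy_scores[alpha], toy_labels)
print(best_alpha, per_alpha)
```

Running the same sweep on the test set afterwards, as in Fig 7, shows whether the value chosen on the validation set is also close to optimal on unseen data.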

Fig 7. AMFP’s propagation factor analysis.

A) Retrospective propagation factor analysis. The optimal value selected during validation and used for model training is 0.5. The optimal value for the test set is 0.8. B) Holdout propagation factor analysis. For both validation and training the optimal value is zero—weights are not propagated at all.

We report the optimal values for the parameters used. The embedding sizes are 256 and 512 for the retrospective and holdout analyses, respectively. For both evaluation schemes, the dropout is 0.3 and the learning rate is 0.01. It is important to note that the dropout is applied separately to the embedding layer of each of the drugs. On average, the number of entries in the embedding vector that are not affected at all by dropout during training is given by:

$$k \cdot (1 - p)^{2} \tag{14}$$

where k is the embedding size and p is the dropout ratio. Hence, for the retrospective analysis where the embedding size was 512, on average only 184.32 embedding entries are unaffected by the dropout during training. The optimal numbers of epochs are five and six for the retrospective and holdout analyses, respectively, and the batch sizes are 1024 and 256, respectively. For the comparison to multi-source data-based predictors, the optimal values for the parameters used in AMF were an embedding size of 64, a dropout of 0.5, forty epochs, a batch size of 256, and a learning rate of 0.01.

Some interesting observations can be made by comparing AMFP’s predictions regarding each drug pair with the pair’s structural similarity. One can expect that if two drugs share similar interactions, they are likely to have some structural similarity; however, the predictions and the structural similarity might be different or complementary. We computed the correlation coefficient between AMFP’s predictions and the structural similarity. For the structural similarity, we used the method used by Ryu et al. [10]. This comparison showed a low correlation coefficient of 0.151 with a p-value < 10⁻¹⁵; for comparison, Vilar et al. [8] report a correlation coefficient of 0.167.
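The comparison between predictions and structural similarity can be illustrated with a short sketch. It assumes that the reported value is a Pearson correlation coefficient computed over drug pairs, which the text does not state explicitly; the function name and the toy values are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Made-up values: one predicted score and one structural similarity per drug pair.
toy_predictions = [0.10, 0.80, 0.35, 0.60]
toy_similarity = [0.30, 0.40, 0.20, 0.50]
print(pearson_correlation(toy_predictions, toy_similarity))
```

A coefficient close to zero, as reported above, indicates that the two signals carry largely different information about a drug pair.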

Practical contribution

The proposed models can be utilized to improve drug-drug interaction discovery and can be combined with additional structural information to improve drug-drug interaction detection performance.

In this paper, we present the top 100 predictions made by AMFP (see Supporting information section) after training it on release 5.1.1 of DrugBank using the parameters optimized in the retrospective analysis; note that release 5.1.1 was the latest release available when our research was conducted. We manually validated the first 10 predictions made by AMFP; as of now, eight of them have been added to DrugBank. For the following five drug-drug interactions, the metabolism of one drug can change due to the drug interaction: Curcumin and Primidone, Rifapentine and Fluvoxamine, Curcumin and Rifapentine, Lumacaftor and Fluvoxamine, and Curcumin and Lovastatin. For the following three drug-drug interactions, the serum concentration of one drug can change due to the drug interaction: Ceritinib and Fluvoxamine, Curcumin and Clotrimazole, and Curcumin and Lumacaftor. No evidence was found for the existence of the following two drug-drug interactions: Pentobarbital and Sulfisoxazole, and Curcumin and Pentobarbital. These might be two previously unknown interactions predicted by our methods.
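Producing a ranked list of new candidate interactions can be sketched as follows: score every drug pair that is not already a known interaction and keep the highest-scoring ones. The `score` callable is a hypothetical stand-in for a trained model such as AMFP, and the drug identifiers and scores are placeholders rather than results from this study.

```python
import heapq
from itertools import combinations


def top_novel_predictions(drugs, known_interactions, score, n):
    """Return the n highest-scoring drug pairs that are not already known."""
    known = {frozenset(pair) for pair in known_interactions}
    # Enumerate each unordered pair once and skip known interactions.
    candidates = (pair for pair in combinations(drugs, 2)
                  if frozenset(pair) not in known)
    return heapq.nlargest(n, candidates, key=lambda pair: score(*pair))


# Placeholder drugs and made-up model scores.
toy_drugs = ["d1", "d2", "d3", "d4"]
toy_known = [("d2", "d1")]
toy_scores = {
    frozenset(("d1", "d2")): 0.99,
    frozenset(("d1", "d3")): 0.20,
    frozenset(("d1", "d4")): 0.90,
    frozenset(("d2", "d3")): 0.70,
    frozenset(("d2", "d4")): 0.10,
    frozenset(("d3", "d4")): 0.50,
}
top_two = top_novel_predictions(
    toy_drugs, toy_known, lambda a, b: toy_scores[frozenset((a, b))], n=2)
print(top_two)
```

Excluding known pairs before ranking matters because a well-trained model assigns its highest scores to the interactions it was trained on.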

In addition, we created a list of the latent factors (embeddings) learned for each drug and made it publicly available. These factors can be used as compressed representations of the drugs. The factors encode the structure of the interaction network and can provide a head start on downstream tasks in the form of transfer learning. For example, the drug embeddings created using the interaction network can be used to detect side effects.
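One simple way to reuse such embeddings, shown here only as an illustration, is to look up the drugs whose vectors are closest to a given drug. Cosine similarity is an assumption of this sketch, not a technique described in the paper, and the dictionary below is a made-up stand-in for the published factors, whose file format is not assumed here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def most_similar(embeddings, query, top_k=1):
    """Names of the top_k drugs whose embeddings are closest to the query drug."""
    ranked = sorted(
        ((cosine_similarity(embeddings[query], vector), name)
         for name, vector in embeddings.items() if name != query),
        reverse=True)
    return [name for _, name in ranked[:top_k]]


# Made-up two-dimensional embeddings for three placeholder drugs.
toy_embeddings = {
    "drug_a": [1.0, 0.0],
    "drug_b": [0.9, 0.1],
    "drug_c": [0.0, 1.0],
}
print(most_similar(toy_embeddings, "drug_a"))
```

In a transfer learning setting, the same vectors could instead be passed as input features to a separate classifier for the downstream task.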

AMF and AMFP can be scaled to support a large number of drugs and interactions; the models do not require training using all of the positive examples (existing interactions), and positive sampling can also be used, allowing the method to operate on very large datasets. Each of the proposed methods presented here required no more than a few minutes to train using a standard laptop. Hyperparameter optimization required more time, and this process can usually be executed efficiently by an expert; AutoML methods for parameter optimization are currently gaining interest, and such methods can dramatically reduce the time required for optimizing hyperparameters [33].

Conclusion


In this paper, we designed two methods for drug-drug interaction prediction based on a novel matrix factorization technique designed for adjacency matrices and developed useful in silico models to predict new drug interactions. Additionally, we trained an XGBoost ensemble using various predictors. The methods were implemented and made public, along with additional resources used in this research. Our methods were systematically validated through retrospective and holdout evaluations using DrugBank (release 5.1.1, which contains 1,440 drugs and 248,146 drug-drug interactions), showing state-of-the-art results with an area under the receiver operating characteristic curve of 0.814 overall and an accuracy of 56% when predicting the first interaction for each drug. Additionally, using cross-validation, we compared our methods with existing state-of-the-art methods that were trained using various data sources and demonstrated the superiority of our methods. Our methods can be used on a large scale and applied to link prediction problems in domains other than drug-drug interaction prediction. A database containing the most promising drug-drug interaction candidates, generated using the proposed DDI predictor, is provided in the Supporting information section.

References


  1. Preventable Adverse Drug Reactions: A Focus on Drug Interactions. Available from:
  2. Bates DW, Spell N, Cullen DJ, et al. The costs of adverse drug events in hospitalized patients. JAMA. 1997;277(4):307–311.
  3. Raschetti R, Morgutti M, Menniti-Ippolito F, Belisari A, Rossignoli A, Longhini P, et al. Suspected adverse drug events requiring emergency department visits or hospital admissions. Eur J Clin Pharmacol. 1999;54(12):959–963. pmid:10192758
  4. Budnitz DS, Pollock DA, Weidenbach KN, Mendelsohn AB, Schroeder TJ, Annest JL. National surveillance of emergency department visits for outpatient adverse drug events. JAMA. 2006;296(15):1858–1866. pmid:17047216
  5. Qato DM, Wilder J, Schumm LP, Gillet V, Alexander GC. Changes in Prescription and Over-the-Counter Medication and Dietary Supplement Use Among Older Adults in the United States, 2005 vs 2011. JAMA Intern Med. 2016;176(4):473–482. pmid:26998708
  6. Guiding Principles for the Care of Older Adults with Multimorbidity: An Approach for Clinicians. Journal of the American Geriatrics Society;60(10):E1–E25. pmid:22994865
  7. Corrigan OP. A risky business: the detection of adverse drug reactions in clinical trials and post-marketing exercises. Social Science & Medicine. 2002;55(3):497–507.
  8. Vilar S, Uriarte E, Santana L, Tatonetti NP, Friedman C. Detection of Drug-Drug Interactions by Modeling Interaction Profile Fingerprints. PLOS ONE. 2013;8(3):1–11.
  9. Zhang P, Wang F, Hu J, Sorrentino R. Label Propagation Prediction of Drug-Drug Interactions Based on Clinical Side Effects. Scientific Reports. 2015;5(1).
  10. Ryu JY, Kim HU, Lee SY. Deep learning improves prediction of drug–drug and drug–food interactions. Proceedings of the National Academy of Sciences. 2018;115(18):E4304–E4311.
  11. Gottlieb A, Stein GY, Oron Y, Ruppin E, Sharan R. INDI: a computational framework for inferring drug interactions and their associated recommendations. Molecular Systems Biology. 2012;8(1). pmid:22806140
  12. Zhang W, Chen Y, Liu F, Luo F, Tian G, Li X. Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data. BMC Bioinformatics. 2017;18(1):18. pmid:28056782
  13. Park K, Kim D, Ha S, Lee D. Predicting Pharmacodynamic Drug-Drug Interactions through Signaling Propagation Interference on Protein-Protein Interaction Networks. PLOS ONE. 2015;10(10):1–13.
  14. Zhang W, Jing K, Huang F, Chen Y, Li B, Li J, et al. SFLLN: A sparse feature learning ensemble method with linear neighborhood regularization for predicting drug–drug interactions. Information Sciences. 2019;497:189–201.
  15. Shi JY, Huang H, Li JX, Lei P, Zhang YN, Dong K, et al. TMFUF: a triple matrix factorization-based unified framework for predicting comprehensive drug-drug interactions of new drugs. BMC Bioinformatics. 2018;19(14):411. pmid:30453924
  16. Lim S, Lee K, Kang J. Drug drug interaction extraction from the literature using a recursive neural network. PLOS ONE. 2018;13(1):e0190926. pmid:29373599
  17. He X, Liao L, Zhang H, Nie L, Hu X, Chua TS. Neural Collaborative Filtering. In: Proceedings of the 26th International Conference on World Wide Web. WWW’17. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee; 2017. p. 173–182. Available from:
  18. Wu H, Zhang Z, Yue K, Zhang B, He J, Sun L. Dual-regularized matrix factorization with deep neural networks for recommender systems. Knowledge-Based Systems. 2018;145:46–58.
  19. Fan J, Cheng J. Matrix completion by deep matrix factorization. Neural Networks. 2018;98:34–41. pmid:29154225
  20. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (ICLR); 2015.
  21. Cukierski W, Hamner B, Yang B. Graph-based features for supervised link prediction. In: The 2011 International Joint Conference on Neural Networks; 2011. p. 1237–1244.
  22. Adamic LA, Adar E. Friends and neighbors on the Web. Social Networks. 2003;25(3):211–230.
  23. Chen H, Li X, Huang Z. Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’05); 2005. p. 141–142.
  24. Fire M, Tenenboim-Chekina L, Puzis R, Lesser O, Rokach L, Elovici Y. Computationally Efficient Link Prediction in a Variety of Social Networks. ACM Trans Intell Syst Technol. 2014;5(1):10:1–10:25.
  25. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD’16. New York, NY, USA: ACM; 2016. p. 785–794. Available from:
  26. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–D1082. pmid:29126136
  27. Yang Y, Lichtenwalter RN, Chawla NV. Evaluating link prediction methods. Knowledge and Information Systems. 2015;45(3):751–782.
  28. Vilar S, Harpaz R, Uriarte E, Santana L, Rabadan R, Friedman C. Drug–drug interaction through molecular structure similarity analysis. Journal of the American Medical Informatics Association. 2012;19(6):1066–1074. pmid:22647690
  29. Chollet F, et al. Keras; 2015.
  30. Hagberg A, Swart P, Schult D. Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy 2008); 2008.
  31. Sun X, Xu W. Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves. IEEE Signal Processing Letters. 2014;21(11):1389–1393.
  32. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Teh YW, Titterington M, editors. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. vol. 9 of Proceedings of Machine Learning Research. Chia Laguna Resort, Sardinia, Italy: PMLR; 2010. p. 249–256. Available from:
  33. Sun Y, Xue B, Zhang M, Yen GG. An Experimental Study on Hyper-parameter Optimization for Stacked Auto-Encoders. In: 2018 IEEE Congress on Evolutionary Computation (CEC); 2018. p. 1–8.