Skip to main content
Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Improving Detection of Arrhythmia Drug-Drug Interactions in Pharmacovigilance Data through the Implementation of Similarity-Based Modeling

  • Santiago Vilar ,

    Affiliations Department of Biomedical Informatics, Columbia University, New York, NY, United States of America, Department of Systems Biology, Columbia University, New York, NY, United States of America, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States of America

  • Tal Lorberbaum,

    Affiliations Department of Biomedical Informatics, Columbia University, New York, NY, United States of America, Department of Systems Biology, Columbia University, New York, NY, United States of America

  • George Hripcsak,

    Affiliations Department of Biomedical Informatics, Columbia University, New York, NY, United States of America, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States of America

  • Nicholas P. Tatonetti

    Affiliations Department of Biomedical Informatics, Columbia University, New York, NY, United States of America, Department of Systems Biology, Columbia University, New York, NY, United States of America, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States of America, Department of Medicine, Columbia University, New York, NY, United States of America


Identification of Drug-Drug Interactions (DDIs) is a significant challenge during drug development and clinical practice. DDIs are responsible for many adverse drug effects (ADEs), decreasing patient quality of life and causing higher care expenses. DDIs are not systematically evaluated in pre-clinical or clinical trials and so the FDA U. S. Food and Drug Administration relies on post-marketing surveillance to monitor patient safety. However, existing pharmacovigilance algorithms show poor performance for detecting DDIs exhibiting prohibitively high false positive rates. Alternatively, methods based on chemical structure and pharmacological similarity have shown promise in adverse drug event detection. We hypothesize that the use of chemical biology data in a post hoc analysis of pharmacovigilance results will significantly improve the detection of dangerous interactions. Our model integrates a reference standard of DDIs known to cause arrhythmias with drug similarity data. To compare similarity between drugs we used chemical structure (both 2D and 3D molecular structure), adverse drug side effects, chemogenomic targets, drug indication classes, and known drug-drug interactions. We evaluated the method on external reference standards. Our results showed an enhancement of sensitivity, specificity and precision in different top positions with the use of similarity measures to rank the candidates extracted from pharmacovigilance data. For the top 100 DDI candidates, similarity-based modeling yielded close to twofold precision enhancement compared to the proportional reporting ratio (PRR). Moreover, the method helps in the DDI decision making through the identification of the DDI in the reference standard that generated the candidate.


Medication co-administration can alter the pharmacokinetic or pharmacodynamic profiles of the drugs being prescribed. Drug-Drug Interactions (DDIs) occur when the effect of one drug is altered by the co-administration of another drug. This change in the effect can lead to the development of clinically important adverse events. In fact, a significant amount of the adverse effects caused by drugs in the patients are due to the administration of multiple medications [13]. As an example of DDIs, some macrolides, such as erythromycin, inhibit the metabolism and the elimination of warfarin [4]. This fact could cause an increased effect of warfarin with the consequent risk due to its anticoagulant properties. Another example is the combination of simvastatin and posaconazole, associated with a risk of myopathy and rhabdomyolysis due to increased statin plasma concentrations [5].

Pharmacovigilance focuses on the collection, monitoring and evaluation of adverse events caused by drugs and other biological products in the pharmaceutical market. Pharmacovigilance agencies, such as the FDA U. S. Food and Drug Administration, are interested in the use of post-marketing data to analyze possible adverse drug effects (ADEs) and possible DDIs that cause higher impact in ADE development. However, improvements in the current approaches are still needed to help in the early detection of DDIs.

Recently, a number of computational methods have been successfully applied to predict DDIs. Among them, cheminformatic methodologies, such as protein-structure-based and ligand-based methods, have been used in the detection of DDIs. Cheminformatics provides a useful approach through the use of 2D/3D QSAR (quantitative structure-activity relationships) [68], homology modeling [9] and molecular docking [10]. These methods can infer similarity between sets of drugs [1113] and study possible interactions with pharmacodynamics or pharmacokinetic targets. In previous work, we have leveraged cheminformatics to construct general models of DDIs [11, 12].

On the other hand, scientific literature and pharmacovigilance databases are additional sources with important implications in DDI discovery [3, 14]. Percha et al. [15] mined the scientific literature to detect DDIs through the extraction of gene-drug relationships. Mining electronic health records (EHRs) or the FDA’s Adverse Event Reporting System (FAERS) [16] is an alternative for the discovery of DDIs [1, 17]. In fact, Tatonetti et al. recently provided an important source of DDI candidates, the TWOSIDES database [18], through mining FAERS. However, analysis of pharmacovigilance data is still very challenging and rampant confounding leads to high false positive rates. Alternatively, cheminformatic methods can be applied to rank the DDI candidates extracted from a pharmacovigilance study. These methods offer the possibility to study the final candidates from the point of view of the molecular structure, pharmacological action or adverse effects comparison. Similarity-based methods were useful to rank drug candidates extracted from pharmacovigilance data mining that produce some adverse events, such as rhabdomyolysis and pancreatitis [19, 20].

In this paper, we systematically apply six different similarity-based techniques to evaluate drug interaction hypotheses mined from pharmacovigilance data. The objective of the current study is to improve the detection of DDIs in the TWOSIDES database using methodologies we recently developed based on the application of similarity-based modeling (see Fig 1). When applied to the TWOSIDES database a reference standard of DDIs that produce arrhythmia, we measured: 1) enrichment factor provided by TWOSIDES, and 2) performance when we rank the set of DDI candidates using proportional reporting ratio (PRR), p-values, and different similarity-based models. As is demonstrated by our results, the implementation of cheminformatic models in pharmacovigilance data is useful in DDI signal detection and decision making process.

Fig 1. Flowchart with the different steps implicated in the study.


DDI reference standard

We collected a reference standard with 149 DDIs present in the intersection of both DrugBank [21] and Veterans Association Hospital database [22]. The collected DDIs produced the effect of arrhythmias and related terms, such as QT prolongation or increased heart rhythm. In our reference standard there are DDIs with different levels of documentation, from “well established through controlled studies” to “theoretical interactions but pharmacological reasons lead clinicians to recognize the possible interaction”. The 149 DDI pairs comprised 162 drugs and were included in a 162×162 drug-drug matrix called M1 (13,041 total number of possible interactions). We codified the 149 reference standard DDIs in M1 with value 1 in each respective cell, and the non-DDIs with value 0 (see S1 and S2 Tables).

Drug similarity-based calculation

To calculate drug similarity we used different measures. Fig 2 shows the workflow used to calculate some similarity measures. A detailed explanation about the construction of drug similarity-based models can be found in previous publications [11, 12]. Different drug similarity matrices (M2) were generated at this step (the data is provided in S3 Table).

Fig 2. Flowchart including the steps implicated in the calculation of different similarity measures.

Drugs were represented as fingerprints, i.e. bit vector codifying the presence or absence (1, 0) of structural keys, adverse effects, targets, drug-drug interactions or ATC codes. The Tanimoto coefficient (Tc) between all the fingerprint pairs is calculated and placed in a drug-drug similarity matrix (M2). Different M2 matrices are calculated weighted with the different similarity measures.

2D molecular structure drug similarity (matrix M2a).

We calculated MACCS fingerprints for all the 162 drugs in our reference standard. MACCS represents the 2D molecular structure as a vector that codifies the presence or absence (1 and 0 codes) of different structural keys or sub-fragments. A detailed description of the fingerprint calculation can be found in previous publications [11, 23]. We compared pairs of MACCS fingerprints using the Tanimoto coefficient (Tc). The Tc is the ratio between the number of features (structural keys in our case) in the intersection and the union of two fingerprints. The Tc ranges from 0 to 1, which means minimum and maximum similarity respectively. Once we calculated the Tc for all the drug pairs, we constructed a 162×162 drug similarity matrix (M2a). In each cell of the matrix we placed the Tc for the drug pair (see Fig 2).

3D molecular structure drug similarity (matrix M2b).

We downloaded from DrugBank [21] the isomeric SMILES codes of the 162 drugs in our reference standard. Isomeric SMILES codes provide information about the chemical structure but also allow the specification of the configuration of chiral centers. We pre-processed the database using the LigPrep module in the Schrödinger 2011 package [24]. Through this process, when there are non-specified chiral centers in some drugs, a maximum of three enantiomers was generated. We performed Monte Carlo Multiple Minimum (MCMM) conformational analysis calculations using Macromodel [24] to determine the most stable 3D molecular structure for each drug. We retained the structure with the minimum potential energy OPLS_2005 as a drug-template for the next shape screening step. Using these 3D drug structure templates generated through MCMM, we performed shape screening calculations with Phase module [24] to identify similar molecules to the templates. The calculation performed a flexible alignment between the 3D conformations of drug i with the rigid 3D structure template of drug j and identified similarities between pair of drugs based on similar 3D distribution of pharmacophoric features. We calculated a 3D similarity score (Phase Sim property) that ranges from 0 to 1 indicating minimum and maximum similarity respectively. 3D scores between all the pairs (162×162) were integrated in the 3D similarity matrix M2b. A more detailed explanation about 3D calculation parameters can be found in previous references [24, 25].

Adverse drug effect profile fingerprint (ADEPF) similarity (matrix M2c).

Adverse effects were collected from SIDER database [26], an open resource of drugs and related side effects extracted from public documentation and package inserts. The adverse effects for each drug were represented as fingerprints, i.e. bit vector codifying the presence or absence (1, 0) of adverse effects. As explained previously in the study (see 2D molecular structure drug similarity section), we calculated the Tc between all the fingerprint pairs and constructed the matrix M2c with ADE similarity information between all the drugs (see Fig 2).

Target profile fingerprint (TPF) similarity (matrix M2d).

We collected the targets for each drug using DrugBank [21]. We integrated the datasets with information about targets, enzymes, transporters and carriers. The same target protein but from different organisms was considered as a unique case. As we explained previously, we represented targets in each position of a fingerprint and then we calculated the Tc between all the fingerprint pairs. In the final step, we constructed the matrix M2d weighted with target information including in each cell the Tc between the corresponding drug pair.

Drug-drug interaction profile fingerprint (DDIPF) similarity (matrix M2e).

The concept of drug-drug interaction profile fingerprints was introduced in a previous study [12]. Each drug was represented as a vector that codifies the presence (code 1) or the absence (code 0) of the different drug-drug interactions, i.e., in our case we constructed DDIPFs with drug interaction information from DrugBank [21]. Tc comparing the DDIPFs was included in the matrix M2e (see Fig 2).

ATC-codes fingerprint similarity (matrix M2f).

We used the Anatomical Therapeutic Chemical (ATC) Classification System [27] to calculate similarities between drugs. We considered four levels in the ATC codes, involving information in different categories: location (organ or system), therapeutic, pharmacological, and chemical properties. The different groups in each level were represented as vector positions and Tc was calculated between all the ATC-code fingerprint pairs. As previously, we constructed the matrix M2f with ATC-codes similarity.

Calculation of DDI candidates

The method to generate the new set of DDI candidates has been recently described by our research group [11, 12, 25, 28]. Through this step a new DDI matrix (M3) is calculated with the DDI score for each pair of drugs in each respective cell. It is worth noting that diagonal values in the initial matrices M2 are set 0 not representing similarity of a drug with itself. The final DDI score provided by M3 is based on a leave-one-out process. To generate the final matrix M3 with all the drug pairs DDI candidates we multiplied M1 by M2 retaining only in each cell the highest value in the addition-array. Although in each cell all the scores against the set of reference standard DDIs are generated (the matrix product generated in each cell the addition of the different scores), only the highest score is retained to represent the maximum similarity against the well-known DDIs. The resultant matrix is not symmetric (similarity is implemented in both branches of the drug-drug pair), for which a symmetric transformation is carried out retaining the maximum value in each symmetric cell. That way, each cell in the final M3 matrix represents the drug pair DDI candidate with the maximum similarity score regarding to a DDI drug pair deemed as true positive in our reference standard. DDIs from the M3 matrix are listed with their corresponding similarity scores (these data is provided in S4 Table). DDIs belonging to the matrix diagonal and representing drugs interacting with themselves are eliminated. Although our models are based on the maximum similarity score, the method allows the implementation of alternative algorithms.

Pharmacovigilance data: TWOSIDES database

We downloaded the TWOSIDES database [18], a data source of DDIs extracted from mining FAERS [16]. We collected 13,105 DDIs related to the terms arrhythmia and bradyarrhythmia with proportional reporting ratio (PRR)>1 and p-value <.05. These data were mapped to our initial DDI reference standard to find the DDIs in common. We retained 386 DDIs present in both databases: 14 positives and 372 negatives (see S5 Table). The subset of final DDIs was sorted by PRR and p-value (provided by TWOSIDES) and by the different similarity-based models.

Combination of similarity-based modeling

We constructed different complex models combining the M3 similarity-based scorings for the 386 cases analyzed in TWOSIDES. We used Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Through PCA the six M3 scorings (2D, 3D, ADEPF, TPF, DDIPF and ATC) were transformed into a unique component explaining the 66.4% of the variance (Factor loadings were: 2D MACCS = -0.87, 3D = -0.85, ADEPF = -0.85, TPF = -0.74, DDIPF = -0.84, ATC = -0.73). The percentage of the variance explained by each additional factor is provided in S1 Fig On the other hand, we trained a LDA model with 14 positives and 372 negative cases. Five variables were introduced in the model using the forward-stepwise method: 2D, 3D, TPF, DDIPF and ATC scores. Statistical quality of the model was assessed through parameters such as Wilks’ statistic (U = 0.84), Fisher ratio (F (5, 380) = 14.8) and the significance level (p<.0001). S1 Fig also provides the AUROC results of LDA including from 1 to 5 variables in the model.

Assessment of the performances

We measured the enrichment factor (EF) detected in TWOSIDES as the ratio between the prevalence detected in TWOSIDES and the prevalence in the initial reference standard. Prevalence is defined as the proportion of known well-established DDIs between all the DDI candidate cases. We also ranked the DDIs according to proportional reporting ratio (PRR) and p-value, provided by the TWOSIDES, and according to our different similarity DDI models and assessed the precision in different top positions. Precision or positive predictive value was calculated as the ratio between true positives and all positive cases, true positives plus false positives (Precision = TP/TP+FP). For the comparison of the performances we also used areas under the receiver operating characteristic curves (in the manuscript this partial AUROC for the subset of TWOSIDES DDI candidates is defined as pAUROC). If the area under the curve is 0.5 the classifier is random whereas a perfect classifier will yield an area of 1. ROC curves were also plotted showing the true positive fraction (sensitivity) against the false positive fraction (1-specificity). We performed an external evaluation using reference standard data sources, such as Drugdex (Micromedex) [29] and [30], to deem the rest of 372 candidates as positive and negative cases.


Performance in TWOSIDES using the initial reference standard

We mapped our initial DDI reference standard (149 positive and 12,892 negative cases) to the arrhythmia DDI candidates extracted from TWOSIDES database to find DDIs in common. From TWOSIDES we identified 14 positive cases and 372 negatives, a 3-fold enrichment factor (p = .0003) (see Fig 1). We ranked the subset of DDI candidates obtaining an area under the ROC curve of 0.62 and 0.67 using proportional reporting ratio (PRR) and p-values as scorings (we defined this partial AUROC as pAUROC; see Fig 3).

Fig 3. ROC results using different methods to rank the 386 TWOSIDES candidates: PRR (Proportional Reporting Ratio), p-values, 2D structural similarity (MACCS), 3D structural similarity, ADEPF (Adverse Drug Effect Profile Fingerprint), TPF (Target Profile Fingerprint), DDIPF (Drug-Drug Interaction Profile Fingerprint), ATC-code fingerprint, PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis).

Panel (a) shows pAUROCs with 95% confidence intervals. Panels (b) and (c) show the ROC curves for the different methods.

Application of similarity-based modeling in DDI signals ranking

As an alternative system to PRR and p-values, we ranked the subset of 386 DDI candidates extracted from TWOSIDES using the different similarity-based models. Fig 3 shows the pAUROC results and ROC curves for the different ranking methods used to sort the candidates, including PRR, p-values, (data shown previously) and all the similarity-based models using 2D and 3D similarity, ADEPF, TPF, DDIPF and ADE-codes similarities (pAUROC values, 95% confidence intervals and significance statistics are shown in S6 Table). The score provided by the models for each DDI is based on a leave-one-out procedure. Application of similarity models offered better results in the ranking process.

We constructed more complex models combining the different individual similarity scores through unsupervised methods, such as Principal Component Analysis (PCA), and supervised methods, such as Linear Discriminant Analysis (LDA) (see Fig 3). In PCA we combined the six similarity scorings (2D, 3D, ADEPF, TPF, DDIPF and ATC) in a simple component-scoring. Using LDA, five individual scores (2D, 3D, TPF, DDIPF and ATC) were introduced in the final model. Detailed description of the parameters for PCA and LDA are provided in the Methods section. The PCA and LDA models showed pAUROCs of 0.87 an 0.90 respectively.

Assessing performance using alternative reference standards

In the set of 386 TWOSIDES DDI candidates, 372 were considered negative cases or non-DDIs according to our initial reference standard. However, as our reference standard may be incomplete, we assessed the detection of DDIs against two additional reference sources: Drugdex (Micromedex) [29] and [30].

Assessment of DDIs using Drugdex.

We labeled the remaining 372 DDIs as true positives (TP) or false positives (FP) whether the interactions causing arrhythmias are described in Drugdex (Micromedex) or not. We considered different levels of knowledge to deem the interactions as positives: (Set 1) well-established interactions, probable, and theoretical are included (we found 164 TP and 208 FP); (Set 2) well-established interactions and probable are included (49 TP and 323 FP); (Set 3) only considered well-established interactions (10 TP and 362 FP). The pAUROC results are shown in Table 1. Our individual six similarity models, along with PCA and LDA models, performed well in set 1 whereas in set 2 and 3 the performance was poor. Only ATC-codes and LDA model showed some predictive power in set 3.

Table 1. pAUROCs using different methods to rank the DDIs.

Drugdex (sets 1–3) and (sets 4–5) were used as reference standards.

Assessment of DDIs using

The same 372 DDIs were analyzed using as a reference standard. Two sets of DDIs were considered depending on the clinical significance: set 4 including interactions with high and moderate clinical significance and set 5 with only highly clinically significant interactions as TP. The pAUROC results have shown that our similarity-based models performed better than PRR and p-values in the mentioned sets (Table 1). We also calculated precision in different top positions for all the methods including all the previous test sets (see S7 Table). Figs 4 and 5 show the precision versus the ranking for test sets 1 and 4, using all the levels of DDI knowledge in Drugdex and Precision is improved when we used the similarity-based models to rank the DDIs.

Fig 4. Precision of the different methods in test set 1 with all the interactions described in the reference standard Drugdex (interactions well_established+probable+theoretical).

Fig 5. Precision of the different methods in test set 4 with all the interactions described in the reference standard (high and moderate clinically significant interactions).


The objective of our current study is to show the ability of cheminformatics to improve the analysis of DDI data extracted from a pharmacovigilance study. We applied different similarity-based models to improve DDI detection in the TWOSIDES database [18], an important source of DDI candidates extracted from FAERS [16]. Similarity models can be applied to other types of pharmacovigilance data, such as Electronic Medical Records, or claim databases [31]. These methods not only offer a possibility to improve the precision and hence, the detection of DDIs, but also provide additional information very useful in decision making. As an example, Table 2 shows some interactions detected by the different similarity models. For each DDI candidate, the method isolates the most similar drug pair in the reference standard, for which interaction information is available in the literature. This fact could be valuable for researchers to make decisions about the importance of the candidate, novelty or possible mechanism of action by which the drugs interact and cause the adverse effect. The DDI models can point out new DDI candidates based on the comparison of drugs that belong to the same pharmacological class. As an example in Table 2, the model generated the DDI candidate clarithromycin-verapamil because there is a similar interaction in our reference standard, the combination erythromycin-verapamil. In this case, both clarithromycin and erythromycin are macrolide antibiotics that belong to the same pharmacological class and can inhibit the CYP3A-mediated verapamil metabolism and increase verapamil exposure. However, the models are also able to detect some new candidates through the comparison of drugs in different classes. This is the case of some examples in Table 2 and Fig 6, such as the interaction between methadone and fluconazole that it is generated from the interaction amitriptyline-fluconazole. The model detected that the 3D structure of methadone, used in the treatment of opioid dependency and chronic pain, was similar to the tricyclic antidepressant amitriptyline (3D_score = 0.82). In both cases, fluconazole can decrease the CYP3A4 metabolism of amitriptyline and methadone and increase the serum concentration with a higher risk of causing drugs-related adverse effects, such as arrhythmias or QT interval prolongation [29]. Amitriptyline was also predicted by the 3D model to interact with gatifloxacin, an antibiotic of the fluoroquinolone family. The interaction was confirmed in Drugdex [29]. The model generated the candidate because amitriptyline was similar to the antiarrhythmic drug disopyramide (3D score = 0.80) (see Fig 6) and the interaction disopyramide and gatifloxacin was present in our reference standard. The probable mechanism of the interaction in both cases is due to additive effects on QT interval. A likely molecular mechanism of the drugs-QT prolongation is the blockade of the HERG potassium channel [32]. The selective serotonin reuptake inhibitor (SSRI) citalopram, was also found to be similar to disopyramide (3D score = 0.80) and hence, to interact with ranolazine. The combination disopyramide-ranolazine is associated with the risk of possible additive effects on QT prolongation. The same mechanism is predicted by the 3D model for the candidate citalopram-ranolazine and confirmed in Drugdex [29]. Another example described in our reference standard is the concomitant use of imipramine and fluconazole, associated with higher risk of QT prolongation due to possible alterations in imipramine metabolism. The Target model predicts the interaction between imipramine and diltiazem with the same mechanism associated. The probable mechanism described in Drugdex is in agreement and based on decreased imipramine clearance. Although in not all the cases the information about the adverse effect and mechanisms associated from the original DDI in the reference standard to the new candidate is correct, in many cases this information is valuable to assess the etiology and the importance of the DDI candidate.

Table 2. Example of some arrhythmia DDIs described in Drugdex and detected by the different similarity-based models (2D MACCS, 3D similarity, ADEPF, TPF, DDIPF and ATC)


Fig 6. Examples of different pairs of similar drugs with different pharmacological profile detected by our models.

Panel (a): methadone is similar to amitriptyline and predicted to interact with fluconazole (reference standard amitriptyline-fluconazole). Panel (b): amitriptyline is similar to disopyramide and predicted by the 3D model to interact with gatifloxacin (reference standard disopyramide—gatifloxacin). Panel (c): citalopram, was found to be similar to disopyramide and hence, to interact with ranolazine (reference standard disopyramide-ranolazine). Panel (d): diltiazem was found to be similar to fluconazole and predicted to interact with imipramine (reference standard fluconazole-imipramine).

As we have shown previously, information provided by the different similarity scores can be implemented in the development of more complex models (PCA and LDA). Although the information is complementary, the different scoring measures showed some correlation. Table 3 shows the correlation coefficients between the six similarity measures implemented in this study.

Table 3. Correlation coefficients (r) between the six drug similarity measures.

Some knowledge measures, such as TPF and DDIPF, were more related to the therapeutic/pharmacological/chemical category defined in the ATC similarity (r = .46 and r = 0.41). This type of similarity measures are highly dependent on the information provided by knowledge databases that can be bias towards the pharmacological category. However, as shown in Fig 6, the similarity measures have the ability to capture intra-class and inter-class similarity. We calculated the number of drug pairs retrieved by the similarity scores in the different top percentile positions in a range of ATC classification, from zero (drugs in different class) to four ATC levels in common (drugs in the same class). All the measures showed a good recovery of the drug pairs in the same pharmacological class (4 coincident ATC levels). However, high similarity was also detected between some pairs of drugs with a totally different ATC classification (no ATC coincident levels). S8 Table shows the results of the analysis. In this article, similarity was integrated comparing drugs. However, additional similarity metrics could be added comparing adverse effects or adverse reactions caused by drugs combinations as a useful and alternative system to develop this type of predictor.

The test using Drugdex as a reference standard showed poor performance in sets 2 and 3, where only probable and well-established DDIs were considered positives. The similarity measures, capturing chemical and pharmacological features can detect with better precision DDIs deemed as theoretical by Drugdex (see set 1 with 115 theoretical, 39 probable and 10 well-established DDIs). The predictors are still useful pointing out possible dangerous drug combinations associated with severe outcomes. The test based on showed that the similarity models performed better than PRR and p-value scorings. In this set we used a DDI system classification based on clinical significance, related to the severity of the possible adverse events produced by the interactions.

Drug phenotypic, therapeutic, structural and genomic similarity modeling have also been applied to predict DDIs based on machine learning methods [33]. On the other hand, different types of similarity models were also previously published by our research group to predict different adverse effects [34] or new potential DDIs of different etiology [28]. In this study we showed the applicability of the similarity based models to improve the detection of DDIs that cause arrhythmias in pharmacovigilance data. Combining pharmacovigilance data with similarity modeling showed potential to facilitate the detection of new DDIs. In our study we integrated the data through an straightforward and simple approach that allows to obtain good performance values but also assists the researcher in the decision making process. The method allows the calculation and evaluation of new drugs in an external test set. Drugs in the test can be added to the matrix M2 providing similarity information between drugs in the test and drugs in the reference standard. The method will generate for the new drug-drug candidates in the test a score based on the maximum similarity against the set of DDIs in the reference standard. However, a limitation is that our method only predicts interactions between our 162 reference standard drugs and drugs in the test. No DDIs can be generated when both drugs implicated in the interaction are different from our 162 reference standard drugs. This fact limits the applicability of the developed models.

We applied similarity-based modeling to the DDI signals detected in FAERS when PRR>1 and p<.05. However, application of similarity modeling to all the DDIs included in the pharmacovigilance data could be an option to retrieve some interactions not detected by the data mining algorithm. This type of models could be implemented in the early detection of DDIs related to drugs newly introduced in the market and with not enough exposure in the population.


In this study, we applied similarity-based modeling to the candidates selected through pharmacovigilance data mining of DDIs that can cause the ADE arrhythmia. When ranking the subset of DDI candidates, similarity-based modeling showed better performance than the parameters obtained in the pharmacovigilance data mining, such as PRR and p-values. The implementation of similarity-based modeling in pharmacovigilance improved precision of the final method and provided a mechanism for decision making. Our method is a useful tool as a pharmacovigilance resource that can help in the decision support of new DDIs.

Supporting Information

S1 Fig. Variance explained by each additional factor included in the PCA and AUROC results of LDA including from 1 to 5 variables in the model.


S1 Table. Arrhythmia DDI reference standard (matrix M1) with 149 DDIs.


S2 Table. Molecular structures (SMILES codes) for the drugs included in the study.


S3 Table. Matrices M2 generated with the different similarity measures.


S4 Table. List of DDIs generated in the matrices M3.


S5 Table. List of 386 DDIs found in the intersection of TWOSIDES with the initial reference standard.


S6 Table. pAUROCs using different methods to rank the 386 TWOSIDES candidates.


S7 Table. Precision in different top positions ranking the TWOSIDES arrhythmia DDI candidates with different scoring methods (reference standard: Drugdex and


S8 Table. Number of drug pairs retrieved by the similarity scores in the different top percentile positions in a range of ATC classification, from zero (drugs in different class) to four ATC levels in common (drugs in the same class).


Author Contributions

Conceived and designed the experiments: SV GH NPT. Performed the experiments: SV. Analyzed the data: SV. Contributed reagents/materials/analysis tools: SV TL GH NPT. Wrote the paper: SV GH NPT.


  1. 1. Tatonetti NP, Fernald GH, Altman RB. A novel signal detection algorithm for identifying hidden drug-drug interactions in adverse event reports. J Am Med Inform Assoc. 2012;19(1):79–85. pmid:21676938
  2. 2. Pirmohamed M. Ml O: Drug Interactions of Clinical Importance. London: Chapman & Hall; 1998.
  3. 3. Percha B, Altman RB. Informatics confronts drug-drug interactions. Trends Pharmacol Sci. 2013;34(3):178–84. pmid:23414686
  4. 4. Rice PJ, Perry RJ, Afzal Z, Stockley IH. Antibacterial prescribing and warfarin: a review. Br Dent J. 2003;194(8):411–5. pmid:12778089
  5. 5. Krishna G, Ma L, Prasad P, Moton A, Martinho M, O'Mara E. Effect of posaconazole on the pharmacokinetics of simvastatin and midazolam in healthy volunteers. Expert Opin Drug Metab Toxicol. 2012;8(1):1–10. pmid:22176629
  6. 6. Korhonen LE, Rahnasto M, Mahonen NJ, Wittekindt C, Poso A, Juvonen RO, et al. Predictive three-dimensional quantitative structure-activity relationship of cytochrome P450 1A2 inhibitors. J Med Chem. 2005;48(11):3808–15. pmid:15916432
  7. 7. Rahnasto M, Raunio H, Poso A, Wittekindt C, Juvonen RO. Quantitative structure-activity relationship analysis of inhibitors of the nicotine metabolizing CYP2A6 enzyme. J Med Chem. 2005;48(2):440–9. pmid:15658857
  8. 8. Yap CW, Chen YZ. Prediction of cytochrome p450 3A4, 2D6, and 2C9 inhibitors and substrates by using support vector machines. J Chem Inf Model. 2005;45(4):982–92. pmid:16045292
  9. 9. Afzelius L, Zamora I, Ridderstrom M, Andersson TB, Karlen A, Masimirembwa CM. Competitive CYP2C9 inhibitors: Enzyme inhibition studies, protein homology modeling, and three-dimensional quantitative structure-activity relationship analysis. Mol Pharmacol. 2001;59(4):909–19. pmid:11259637
  10. 10. De Rienzo F, Fanelli F, Menziani MC, De Benedetti PG. Theoretical investigation of substrate specificity for cytochromes p450 IA2, p450 IID6 and p450 IIIA4. J Comput Aided Mol Des. 2000;14(1):93–116. pmid:10702928
  11. 11. Vilar S, Harpaz R, Uriarte E, Santana L, Rabadan R, Friedman C. Drug-drug interaction through molecular structure similarity analysis. J Am Med Inform Assoc. 2012;19(6):1066–74. pmid:22647690
  12. 12. Vilar S, Uriarte E, Santana L, Tatonetti NP, Friedman C. Detection of drug-drug interactions by modeling interaction profile fingerprints. PLOS ONE. 2013;8(3):e58321. pmid:23520498
  13. 13. Gottlieb A, Stein GY, Oron Y, Ruppin E, Sharan R. INDI: a computational framework for inferring drug interactions and their associated recommendations. Mol Syst Biol. 2012;8:592. pmid:22806140
  14. 14. Tari L, Anwar S, Liang S, Cai J, Baral C. Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism. Bioinformatics. 2010;26(18):i547–i53. pmid:20823320
  15. 15. Percha B, Garten Y, Altman RB. Discovery and explanation of drug-drug interactions via text mining. Pac Symp Biocomput. 2012 2012:410–21.
  16. 16. FDA U.S. Food and Drug Administration. FDA Adverse Event Reporting System (FAERS). Available: Accessed 01 Apr 2014.
  17. 17. Harpaz R, Chase HS, Friedman C. Mining multi-item drug adverse effect associations in spontaneous reporting systems. BMC Bioinformatics. 2010;11(Suppl 9):S7. pmid:21044365
  18. 18. Tatonetti NP, Ye PP, Daneshjou R, Altman RB. Data-Driven Prediction of Drug Effects and Interactions. Sci Transl Med. 2012;4(125): 125ra31. pmid:22422992
  19. 19. Vilar S, Harpaz R, Chase HS, Costanzi S, Rabadan R, Friedman C. Facilitating adverse drug event detection in pharmacovigilance databases using molecular structure similarity: application to rhabdomyolysis. J Am Med Inf Assoc. 2011;18:I73–I80. pmid:21946238
  20. 20. Vilar S, Harpaz R, Santana L, Uriarte E, Friedman C. Enhancing adverse drug event detection in electronic health records using molecular structure similarity: application to pancreatitis. PLoS One. 2012;7(7):e41471. pmid:22911794
  21. 21. DrugBank database, version 3.0. Accessed: Accessed 15 Apr 2011.
  22. 22. Olvey EL, Clauschee S, Malone DC. Comparison of Critical Drug-Drug Interaction Listings: The Department of Veterans Affairs Medical System and Standard Reference Compendia. Clin Pharmacol Ther. 2010;87(1):48–51. pmid:19890252
  23. 23. Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002;42(6):1273–80. pmid:12444722
  24. 24. Schrödinger package, version 9.2, Schrödinger, LLC, New York, USA, 2011. Available: (Accessed 15 Oct 2013).
  25. 25. Vilar S, Uriarte E, Santana L, Friedman C, Tatonetti NP. State of the art and development of a drug-drug interaction large scale predictor based on 3D pharmacophoric similarity. Curr Drug Metab. 2014;15(5):490–501. pmid:25431152
  26. 26. SIDER Side Effect Resource. Accessed: Accessed 01 May 2013.
  27. 27. WHO Collaborating Centre for Drug Statistics Methodology (WHOCC). Available: Accessed 01 Jun 2014.
  28. 28. Vilar S, Uriarte E, Santana L, Lorberbaum T, Hripcsak G, Friedman C, et al. Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc. 2014;9(9):2147–2163. pmid:25122524
  29. 29. Drugdex System (Micromedex Healthcare Series). Greenwood Village, CO: Thomson Reuters (Healthcare) Inc., 2011. Available: Accessed 01 Mar 2014.
  30. 30. Drug Information Online, database. Available: (Accessed 01 Mar 2014).
  31. 31. The Truven MarketScan Commercial Claims and Encounters (CCAE). Available: Accessed 01 Jun 2014.
  32. 32. Crouch MA, Limon L, Cassano AT. Clinical relevance and management of drug-related QT interval prolongation. Pharmacotherapy 2003;23(7):881–908. pmid:12885102
  33. 33. Cheng F, Zhao Z. Machine learning-based prediction of drug-drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. J Am Med Inform Assoc. 2014;21(e2):e278–86. pmid:24644270
  34. 34. Vilar S, Ryan PB, Madigan D, Stang PE, Schuemie MJ, Friedman C, et al. Similarity-based modeling applied to signal detection in pharmacovigilance. CPT Pharmacometrics Syst Pharmacol. 2014;3:e137. pmid:25250527