Skip to main content
  • Loading metrics

PathFX provides mechanistic insights into drug efficacy and safety for regulatory review and therapeutic development

  • Jennifer L. Wilson,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Bioengineering, Stanford University, Palo Alto California, United States of America

  • Rebecca Racz,

    Roles Data curation, Formal analysis, Writing – review & editing

    Affiliation Division of Applied Regulatory Science, US Food and Drug Administration, Silver Spring Maryland, United States of America

  • Tianyun Liu,

    Roles Data curation, Formal analysis, Writing – review & editing

    Affiliation Department of Bioengineering, Stanford University, Palo Alto California, United States of America

  • Oluseyi Adeniyi,

    Roles Conceptualization, Writing – review & editing

    Affiliation Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring Maryland, United States of America

  • Jielin Sun,

    Roles Conceptualization, Writing – review & editing

    Affiliation Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring Maryland, United States of America

  • Anuradha Ramamoorthy,

    Roles Conceptualization, Writing – review & editing

    Affiliation Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring Maryland, United States of America

  • Michael Pacanowski,

    Roles Conceptualization, Writing – review & editing

    Affiliation Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring Maryland, United States of America

  • Russ Altman

    Roles Conceptualization, Funding acquisition, Project administration, Resources, Writing – review & editing

    Affiliations Department of Bioengineering, Stanford University, Palo Alto California, United States of America, Department of Genetics, Stanford University, Palo Alto California, United States of America


Failure to demonstrate efficacy and safety issues are important reasons that drugs do not reach the market. An incomplete understanding of how drugs exert their effects hinders regulatory and pharmaceutical industry projections of a drug’s benefits and risks. Signaling pathways mediate drug response and while many signaling molecules have been characterized for their contribution to disease or their role in drug side effects, our knowledge of these pathways is incomplete. To better understand all signaling molecules involved in drug response and the phenotype associations of these molecules, we created a novel method, PathFX, a non-commercial entity, to identify these pathways and drug-related phenotypes. We benchmarked PathFX by identifying drugs’ marketed disease indications and reported a sensitivity of 41%, a 2.7-fold improvement over similar approaches. We then used PathFX to strengthen signals for drug-adverse event pairs occurring in the FDA Adverse Event Reporting System (FAERS) and also identified opportunities for drug repurposing for new diseases based on interaction paths that associated a marketed drug to that disease. By discovering molecular interaction pathways, PathFX improved our understanding of drug associations to safety and efficacy phenotypes. The algorithm may provide a new means to improve regulatory and therapeutic development decisions.

Author summary

Many drugs fail to reach the market because they are not sufficiently efficacious for their disease indication or they cause intolerable side-effects. To understand drug efficacy and safety, we created an algorithm, PathFX. The algorithm identified relationships between drugs and diseases, and drugs and side-effects. We tested PathFX’s ability to identify the disease for which the drug was developed. We applied PathFX to post-marketing reports of drug side effects and identified drug side effects where regulatory review was ambiguous. Finally, we identified novel diseases for which marketed drugs could treat. The method has the potential to be a tool for assessing drug safety and efficacy during development and may have utility for regulators and industry scientists.


The drug discovery process is long, difficult, and expensive. Only ~10% of drugs entered in human studies make it to the market[1] because many drugs have insufficient efficacy[2] or significant safety issues[3,4]. The lack of efficacy may be related to poor bioavailability, incomplete inhibition of the target, or selection of a target that is not a central driver of the disease. Adverse events can occur through primary effects of a drug on the intended targets or respective biological pathways, or secondary effects that can occur with off-target binding [3,5]. For these reasons, understanding a drug’s phenome–the collection of clinical characteristics that are related to a drug target or pathway–is integral to validating and prioritizing drug targets for development, and identifying other potential (adverse) drug effects that may occur by perturbing a particular biological network. However, the available tools for characterizing drug pathways during target optimization and regulatory review catalogue varied aspects of the cellular response to drug and integration across these resources is often incomplete.

Many data sources containing relevant information are available to characterize pharmacological drug effects. However, the vast amount of information is siloed into separate databases, creating a patchwork of information that is difficult and tedious to easily integrate for regulatory review. To illustrate, in evaluating a drug that inhibits a specific enzyme, one might look to Mendelian disease as an indicator of what clinical manifestations are related to loss of that enzyme’s function; this could inform what toxicities or benefits might arise from pharmacologically interrupting the enzyme’s function (e.g., as for the case of PCSK9). This exercise can then be repeated by evaluating the genetic epidemiology of more common variations in the enzyme, and so on. Relevant resources that could collectively inform a drug target’s phenome include Pharmacogenomics Knowledgebase (PharmGKB)[6], genome-wide association studies (GWAS)[7], Online Mendelian Inheritance in Man (OMIM)[8], disease-gene associations such as DisGeNet[9,10], and phenotype-gene association studies (PheWAS)[11,12]. In addition, focusing on a single gene or protein does not always provide a complete view of the biological milieu, or full pathway context, relevant to a drug target. Protein interaction databases, such as STRING[13,14] and iRefWeb[15], relate drug targets to signaling intermediates to provide context for single gene effects, though, these are not easily linked to phenotype information.

Network methods can be useful for identifying mechanistic interactions that relate drug targets to adverse reactions, and are under-utilized in understanding drug adverse effects[5]. A network approach uncovered interaction intermediates between drug targets associated with peripheral neuropathy[16], drug-induced rhabdomyolysis[17], drug-induced severe cutaneous stevens-johnson syndrome[18], drug-induced lung injury[19], and drug-induced contraction-related cardiotoxicity[20]. A further meta-analysis of networks for these toxicities discovered protein mediators that are common among pairs of toxicities[20]; for instance, they discovered that drugs associated with peripheral neuropathy and drugs associated with Stevens-Johnson syndrome had nine protein targets in common. Another study merged protein-protein interactions, gene-to-adverse events (AEs) associations, and knowledge of drug-protein targets to train a random forest model that identified drugs with the greatest connectivity to AEs[21]. Their analysis showed improved prediction of AEs when combining their approach, SubNet, with medication-wide association studies (MWAS) assessing genes associated with four AEs[21]. Another network based approach used a shortest-path technique for in silico predictions for drug repurposing[22]. A network propagation technique created drug pathways and collapsed these pathways into phenotype vectors, though, they did not use this paradigm to be predictive of drug safety and efficacy[23]. These foundational studies demonstrated that interaction networks are a rich source of pathway information that can identify molecular mechanisms for drug safety and efficacy.

Here we constructed drug pathways using protein-protein interactions, and we annotated these pathways with the phenotypes–diseases and off-target effects–associated with the pathway genes using a novel algorithm–PathFX. We demonstrated the utility of PathFX by creating pathways for marketed drugs and identified interaction paths from the drug’s target(s) to genes associated with the marketed indication of the drug. We benchmarked PathFX’s performance using a published set of marketed drugs and quantified our ability to relate a drug to its disease indication. We applied the algorithm to two tasks. First, we strengthened adverse event signals in the FDA Adverse Event Reporting System (FAERS) by searching for drug pathways containing an association to a reported adverse event. Second, we identified repurposing opportunities for marketed drugs and tested these identifications using existing off-label drug use and clinical trial data. We created a tool for better understanding drug safety and efficacy and PathFX may have the potential to aid in regulatory review and therapeutic development decisions.


A tissue non-specific interaction network and the PathFX formalism

Recent work in identifying a drug’s marketed disease, or indication, from protein interactions found that this identification was maximized by considering protein interactions that are in close proximity to a drug’s target [22,23,24]. Thus, we hypothesized that protein-protein interactions that are proximal to a drug’s target(s) could provide insight into mechanisms of drug safety and efficacy.

To create drug interaction pathways, we pulled interaction data from iRefWeb[15], Reactome[25], PharmGKB[6], and a curated set of predicted drug-protein binding data (see Methods section). We merged and scored (explained in methods) these data to yield an interaction network of 25,604 nodes and 318,644 edges. The number of interactions and interaction score distributions are in S1 Fig.

Our algorithm, PathFX, selects a drug target’s most relevant interaction edges (local interaction neighborhoods), merges neighborhood networks from all drug targets, and then identifies enriched phenotypes–which could represent either safety or efficacy phenotypes–in the interaction neighborhood (Fig 1, and usage summary in S4 Fig). In this context, safety phenotypes included associations such as adverse events or side effects (e.g. “pancreatitis”, “adverse weight gain”), and efficacy phenotypes included disease associations (e.g. “diabetes”, “major depressive disorder”); some phenotypes (e.g. “hypertension”) could belong to both of these groups. To identify which phenotypes are associated with the drug target network, we merged data from multiple sources: DisGeNet[9,10], Phenotype Genotype Integrator (PheGenI)[26], ClinVar [27], OMIM [8], and PheWas [11,12]. In this process we controlled for multiple biases as follows:

  1. We reduced the inclusion of high-degree, and highly-studied hub proteins (e.g. P53, or ubiquitin) in all networks (further explained in methods). We called this process interaction specificity analysis.
  2. As previous research suggested that disease modules with fewer than 25 genes were too fragmented for interactome analysis[28], we removed phenotypes associated with fewer than 25 total genes.
  3. We retained phenotypes in drug pathways of interest only if the phenotype’s p-value was more significant than would be expected by chance based on the structure of our network.
Fig 1. The PathFX formalism for defining drug pathways.

We first assemble interaction data (top; step 0), and then applied a depth-first algorithm to define relevant protein interaction pathways (middle; step 1). We benchmarked these pathways using interaction specificity analysis (bottom, left; step 2). Lastly, we performed phenotype enrichment to explore what diseases and phenotypes existed in these local neighborhoods (bottom, right; step 3). Circles indicate proteins. Red circles are drug targets, white circles are interactome proteins, grey circles are intermediate proteins included in the drug pathway, orange triangles represent drugs, purple circles represent gene variants (not applicable in all pathways), and green rectangles represent phenotypes.

PathFX uses a threshold parameter for selecting proteins included in each drug network. We derived this threshold based on the available data and did not tune the parameter to improve identification accuracy (explained in An optimal threshold parameter for the tissue non-specific network in Methods).

A case study in network application: Metformin’s network is highly associated with diabetes phenotypes and is not dependent on drug target associations

We first applied the PathFX method to the diabetic medication, metformin. DrugBank[29] listed five protein targets for metformin—SLC22A2, SLC22A3, PRKAB1, SLC47A1, and SLC29A4 –for metformin that were in our interactome (note that some of these are transport proteins that may be included because metformin inhibits them, not as a pharmacological effect). We created a drug interaction network pathway based on all listed proteins using PathFX. This yielded a 25-protein final neighborhood (20 proteins + 5 drug targets) significantly associated with 18 phenotypes (Fig 2A, S1 Table). Diabetes mellitus type 2 and diabetes mellitus type 1 are both associated with the metformin pathway via interactions with 12 genes (Fig 2B, S1 Table). Metformin’s protein targets were not sufficient to describe the association to diabetes mellitus type 1 and diabetes mellitus type 2 when we analyzed phenotypic associations with these targets. However, with the full 25 protein network identified by PathFX, we recovered the association to diabetes mellitus type 1 and diabetes mellitus type 2.

Fig 2. PathFX identified disease indications for Metformin.

(A) PathFX identified a pharmacodynamic pathway for Metformin. The drug (orange triangle) is connected to protein binding targets (red circles). PathFX identified additional genes and variants (grey circles) and associated this network with phenotypes (green boxes). Edges reflect protein-protein interactions or protein-disease associations. (B) Selected phenotypes associated with the Metformin network highlight similarities to the drug’s marketed disease indication (all associations in S1 Table).

A benchmarking set of approved drugs

We collected a benchmarking set of approved drugs to test our algorithm’s utility in accurately identifying diseases that the drug is known to effectively treat. This set included marketed drugs with approved disease indications. We first started with marketed, non-palliative drugs analyzed in [22] to compare our performance with this seminal work. This data set included 238 drugs associated with one or more disease indications, yielding a total of 403 drug-indication pairs. We augmented this dataset by using repoDB[30] to add additional approved indications for the original drug set (we excluded data from repoDB where the trial was terminated or ongoing); using repoDB, we added 1353 drug-indication pairs, yielding a total of 1756 drug-indication pairs for testing. The full list of drugs and approved indications are included in S1 File and consists of a list of drugs associated with one or more disease indications. Most drugs were approved for fewer than 10 indications, though, prednisolone and hydrocortisone are used to treat 105 and 98 indications respectively (S1 File, Fig 3A & 3B).

Fig 3. Characterization of the benchmarking drug set.

Number of approved indications per drug (A) and top drugs by number of approved indications (B). A tree diagram highlighting the top disease words in each of the 62 clusters showed how the diseases were grouped (C).

The dataset included drugs used to treat 594 indications. We binned these 594 indications into 62 clusters based on the semantic similarity of the disease indications. We used the interactive mmlite interface to metamap[31], a highly configurable program developed to map biomedical text to the UMLS Metathesaurus, to map diseases to the nearest Unified Medical Language System (UMLS) CUI (Concept Unique Identifier) identifier. We selected the UMLS terminology because this system had the greatest coverage of phenotypes in our dataset and contained mappings from many popular languages (such as MedDRA). We then clustered these diseases based on ontological, semantic similarity using the UMLS::Similarity package in Perl[32] (cluster membership in S1 File, Fig 3C). For instance, cluster four contained two CUI terms–C0497327, C0002395 –that mapped to 24 Alzheimer’s and dementia phenotypes (S1 File). Cluster five contained five CUI terms–C0042842, C0042875, C0030783, C0016412, C0936215 –that mapped to nine diseases associated with vitamin deficiency. When testing PathFX, we analyzed and reported whether the algorithm identified the drug’s original, un-clustered indication, and also reported results based on the indications’ cluster to assess trends in the types of diseases where we had better identification capacity.

PathFX identifies disease indication for marketed drugs and improves sensitivity

We used the UMLS::Similarity tools for determining if PathFX identified phenotypes that matched the drug’s marketed indication. In this case, we regarded a match as any phenotype significantly associated to the network; for most drugs, PathFX identified multiple phenotypes as statistically significantly associated to the drug’s network. For each drug, we pulled these significant phenotypes from our PathFX analysis as above, and matched these identified phenotypes to CUI identifiers, also using mmlite[31]. We measured semantic similarity using Lin distance in the UMLS::Similarity package[32]. For example, the drug enoxaparin is indicated for deep vein thrombosis and myocardial infarction; our algorithm identified that enoxaparin’s drug pathway was significantly associated with “deep venous thrombosis” (direct match, semantic similarity = 1.0), and other similar diseases such as “thrombosis” (semantic similarity = 0.8615), “venous thromboembolism” (semantic similarity = 0.6853), and “myocardial infarction” (semantic similarity = 0.9377) (full results in S1 File, Fig 4A). Of the 1756 drug-indication pairs, PathFX could not create an interaction network for two pairs (tolazamide + diabetes type 1, and tolazamide + diabetes type 2) because this drug’s target was not mapped to a gene symbol in our interactome. For 389 drug-indication pairs, metamap was unable to map the marketed indication to a CUI term so we could not assess whether the PathFX identified indications matched the marketed indications (cluster number 58 in Fig 4C and in S1 File). This left 1366 pairs for further analysis.

Fig 4. Characterization of PathFX performance.

PathFX identified associations between enoxaparin and deep vein thrombosis and myocardial ischemia (A). Each method identified phenotypes depending on the number of genes associated with the phenotype (Kruskal-Wallis statistic 33.6, p-value = 5.04x10-8); PathFX identifications are skewed towards phenotypes with fewer gene associations relative to identifications by targets alone (Mann-Whitney-U statistic = 33089, p-value = 1.43x10-8) (B). PathFX identification rate per disease cluster (C). The orange line indicates the cluster-specific sensitivity, the bars represent total drugs in each cluster, and the dashed line represents the median number of diseases (15 diseases) in each cluster.

For this analysis, PathFX found pathway information for 171 of the 403 drug-indication pairs (42.4%) from Guney et al[22] and 558 of the 1364 (40.9%) eligible drug-disease pairs in our expanded set of benchmarking drugs; this is our best estimate of PathFX sensitivity. At the drug level, 141 out of 236 drugs (there were two of the original 238 drugs without sufficient binding information in DrugBank) had at least one identified phenotype that was similar to one of the drug’s marketed indication(s) (59.8%). Guney et al[22] reported a sensitivity of 15.4% (they matched 62 of 403 drugs-indication pairs), demonstrating improved sensitivity from our drug-target-centric approach.

For comparison with PathFX, we analyzed the disease associations of the drug targets alone without the local interaction information. Using only drug targets, we identified statistically significant associations between 751 drug and disease pairs (54.98% sensitivity). Of these pairs, 409 were also identified using PathFX pathway information, leaving 342 drug-disease pairs that were identified by targets alone and 147 drug-disease pairs that were only identified when pathway information was included from PathFX (S1 File). After merging gene-to-phenotype associations from multiple data sources, each phenotype had a set of associated genes and the size of this gene set distinguished which diseases each method detected (Kruskal-Wallis statistics = 33.6, p-value = 5.04x10-8) (Fig 4B). PathFX was biased towards selecting phenotypes with fewer genes (median gene set size of 90 genes); targets-only analysis was biased to selecting phenotypes with more associated genes (median gene set size of 342.5 genes) (Mann-Whitney-U statistic: 33089, p-value 1.43x10-8) (Fig 4B). The median gene set size where both methods detected the phenotype was 257 genes.

For completeness, we calculated positive and negative predictive values (PPV, NPV) (S2 File). To calculate PPV and NPV, we made a conservative assumption that any phenotype associated with a drug that was not a marketed disease indication was a false positive. Because PathFX was designed to search broadly for drug-associated phenotypes, the PPV and NPV were deflated and inflated respectively (S3 Fig).

We analyzed PathFX identifications in the context of the 62 disease clusters (Table 1, Fig 4C). For 58 of the 62 clusters, PathFX found pathway evidence supporting the drug’s marketed indication for at least one of the drug-disease pairs assigned to that cluster (Table 1, S1 File). For instance, the top cluster contained three CUI terms which mapped to five disease phenotypes diseases: inappropriate adh syndrome, acromegaly somatic, hyperprolactinemia, acromegaly, and prolactin excess (CUI terms C0021141, C0001206, and C0020514) (Table 1). There were five drug-indication pairs for these diseases: tolvaptan (inappropriate adh syndrome), octreotide (acromegaly somatic), bromocriptine (hyperprolactinemia), bromocriptine (acromegaly somatic), and cabergoline (hyperprolactinemia). PathFX identified phenotypes for all five drug-disease pairs. The cluster with the second highest identification rate contained four CUI terms that mapped to seven disease terms, which are pathophysiologically distinct: alcohol withdrawal delirium, restless legs syndrome, premenstrual dysphoric disorder, insomnia, nicotine dependence, late insomnia, sleeplessness. There were seven drug-indication pairs included in this cluster: gabapentin (restless legs syndrome), ropinirole (restless legs syndrome), rotigotine (restless legs syndrome), diphenhydramine (late insomnia), estradiol (premenstrual dysphoric disorder), nicotine (nicotine dependence), and diazepam (alcohol withdrawal syndrome). PathFX identified the original, un-clustered phenotype for six of the seven pairs (PathFX did not identify estradiol’s association to premenstrual dysphoric disorder) (Table 1).

Table 1. Example disease clusters with high cluster-sensitivity.

In disease lists, ‘|’ is a delimiter to separate disease names. In drug lists, if a drug is listed multiple times, these listings reflect that the drug is intended to treat multiple indications in the specified cluster.

For the remaining four clusters, PathFX did not identify the drugs’ marketed indication for any of the drug-indication pairs assigned to these clusters (S1 File). These clusters are numbered 7, 13, 35, and 1. Additionally, cluster 58 contained 216 disease indications, of which 123 diseases were not mapped to a CUI term.

Strengthening adverse event signals for designated medical events from FAERS and estimating PathFX specificity

Understanding and prioritizing drug safety signals are important regulatory concerns[33,34]. The FDA Adverse Event Reporting System (FAERS) is a repository of voluntarily submitted case reports of adverse events that occur when a patient is on a particular medication. Multiple confounding variables, including comorbidities, incomplete reports, and polypharmacy[35], make it difficult to determine when a drug is causative for an AE. This makes signal detection and triaging reports difficult. Further, when associations are not directly explained by drug targets, pathway models may provide additional justification for why a drug is associated with an AE. Anticipating and identifying drug-induced AEs are critical for novel therapeutics in development as well; and thus, further characterizing which drug targets and interacting proteins may cause AEs is an important goal of this work. Designated medical events (DMEs) are serious, significant adverse drug events of special concern to regulators. As such, DMEs, selected based on expert medical review, were evaluated in this study.

To assess the utility of PathFX for signal detection, we extracted drug-adverse event pairs from FAERs for 1906 drugs across 35 DMEs and ran PathFX on these 1906 drugs. Case reports were associated with one of the 35 DMEs if the adverse event mapped to the preferred MedDRA DME term. Close synonyms were used for some DMEs to better capture reporting (e.g., “pancreatitis” was captured by “pancreatitis” and “pancreatitis acute”). PathFX identified the adverse event phenotype for the input drugs between 0.24% - 39.57%(Table 2 and S3 File), depending on the DME. For instance, in this noisy data set, 1045 drugs were reported as having an adverse association with pancreatitis. PathFX identified that 282 of these drugs were associated with pancreatitis (26.99%). Similarly, 1150 drugs were reported to have an association with myocardial infarction and PathFX identified 391 (34.00%) of these drug-DME associations. For eight of the DME phenotypes, we found no pathways associations to the reported drugs (S3 File).

We used this FAERS dataset to estimate a lower bound on the specificity of PathFX. Because FAERS contains many more drug-DME associations than are real, we treated any drugs without a reported DME association as silver-standard negatives; reasoning that if a noisy sampling of the FAERS system contained no association between the drug and the DME, that these pairs were sufficient negatives. We asked how often PathFX associated a drug with a DME when no case report existed to calculate the specificity rate for the 35 DMEs in this analysis; the rate varied from 80.84%-99.91% (Table 2, S3 File). For instance, 1261 drugs were reported to have an association with hypertension, leaving 645 of our original 1906 drugs without an association to hypertension. Of these 645 silver-standard negatives, PathFX identified an association with hypertension for 112 (17.4%) of the drugs (82.64% specificity). For cardiac arrest, 1211 drugs were reported to have an association, leaving 695 drugs without an association. Of these drugs, PathFX identified seven drugs to have an association, estimating a specificity of 98.99% (Table 2).

PathFX identifies hypotheses for drug repurposing by identifying novel drug-disease associations

PathFX identified multiple phenotypes for each drug even if the drug only has a single approved indication. We sought support for the additional identified phenotypes from two data sets: (1) a list of off-label drug uses extracted from the electronic medical record[36] and (2) drugs currently in clinical trials. In the EMR data, we found support for six drugs applied in 11 off-label indications (Table 3, listed as ‘Jung CUI’ and ‘Jung Disease’). For instance, telmisartan is indicated for hypertension, though PathFX and the EMR dataset supported the use of this drug as an anti-diabetic; this conclusion is further supported in the literature[37]. PathFX identified that thiothixene would be broadly applicable to depressive disorders beyond the indicated use for schizophrenia. PathFX also identified that etanercept, an anti-TNF-alpha drug used in auto-immune disorders, would be applicable to colitis and this identification was supported by the Jung dataset (Table 3). PathFX identified associations for 4 drug-indication pairs supported by additional clinical trials (Table 4): sunitinib for viral infections[38], erlotinib for viral infections[38], ketoprofen for lymphoedema[39], and sirolimus for dystrophic bullosa [40].

Table 3. PathFX identifications supported by off-label drug use.

The terms ‘Jung CUI’ and ‘Jung Disease’ are terms extracted from [36] and represent the associations between drugs and diseases found the electronic health record.

Table 4. PathFX identifications supported by on-going clinical trials.

Given evidence that EMR and clinical trial data supported PathFX predictions, we further scrutinized PathFX identifications to identify drug repurposing opportunities (Fig 5A); we inferred that a non-marketed indication could be a repurposing opportunity if the interaction path was found in the network of a drug marketed for this indication. We demonstrated an example with leuprolide and triptorelin, both gonadotropin-releasing hormone (GnRH) agonists (Fig 5B). PathFX also identified that the triptorelin pathway is enriched for associations with endometriosis, and these interaction pathways are supported by leuprolide’s pathway. PathFX also identified that the antispasmodic, flavoxate, could be indicated for urticaria based on interaction paths shared with cyproheptadine and promethazine, two anti-histamines already approved for urticaria. In total, we identified 2,043 new drug-disease associations for 215 drugs (S4 File). We ranked these predictions based on the number of diseases identified for a drug (top 20 in Table 5, remainder in S4 File), and the number of interaction paths supporting a drug-disease association (top 20 in Table 6, remainder in S4 File).

Fig 5. Identifying repurposing opportunities from interaction pathways.

In the schematic for repurposing identifications, we identified common edges among drugs approved for a particular indication (blue outlines). We infer repurposing opportunities when a drug’s network contained the same interaction edges linking the drug target to the particular indication (A). An example using Leuprolide and Triptorelin: Leuprolide’s full network (top, left) and a subset of edges associated with premature puberty disorders (blue outline, middle) and Triptorelin’s full network, (top, right), and a subset of edges associated with premature puberty disorders (orange outline, middle), and prostate cancers (blue outline, bottom, left) (B).

Table 5. The top 20 drugs by the number of repurposing opportunities identified by PathFX.

Table 6. The top 20 drug-disease pairs based on the number of interaction paths associating the drug to the disease phenotype.


Here we presented PathFX, a phenotypic pathways approach for characterizing drug efficacy and safety based on molecular interactions around the drug target. The algorithm characterizes the phenome around drug targets by integrating several data repositories relevant to regulatory review and therapeutic discovery. We supported our hypothesis that molecular interactions contribute to a drug’s safety and efficacy phenotypes. We successfully detected pathways that confirmed a drug’s association with its marketed disease indication and strengthened signals from adverse event reports in FAERS. In the analysis of marketed drugs, we benchmarked against a published dataset and additionally expanded this dataset to reflect updated uses of this drug set. We further discovered that PathFX identified drug phenotypes when the phenotype had relatively fewer genes associated in our data sources compared to analyzing phenotypes using the drug targets alone. We tested additional PathFX predictions using clinical trial and EMR data, and further identified novel uses for marketed drugs using networks from PathFX.

We identified associations for some phenotypes more than others. This suggested a few possibilities: incomplete data–gene-disease annotations, molecular interactions, or drug-protein binding–may prohibit the creation of pathways relating drug targets to disease associations; or marketed drugs may impact a clinically-relevant outcome to earn approval but may not be supported by genetic epidemiology studies. Incomplete knowledge of all drug-binding proteins limited our ability to construct complete drug pathways. In our database of drug-binding proteins taken from DrugBank [29], the mean and median number of proteins bound by a drug is 2.66 and 1.0 respectively. Indeed, more recent work estimates the mean and median number of proteins bound by a drug to be 329 and 38 respectively [41], suggesting that incorporation of more drug-protein interactions could improve our network predictions. Standardized drug-binding profiling could greatly improve the predictions from these algorithms. PathFX did not identify mechanism of action for all disease clusters. It is not surprising that PathFX did not identify bacterial infections, given that we are using a human protein-protein interactome. Cluster 13 contained brain cancer indications and suggested that the drug-target centric approach is not sufficient for describing efficacy for these anti-brain cancer therapies.

There are some limitations of our method: the model does not consider tissue specificity and is biased to selecting phenotypes with fewer gene annotations. Future work will consider incorporating tissue-specific interaction networks such as the GIANT networks [42] and consider screening drugs for binding across these tissues. PathFX quantifies the overlap between drug pathways and disease phenotypes but does not indicate directionality (helpful or harmful) between the drug and the pathway. Using a non-directional analysis enabled a broader discovery process given fewer directional molecular interaction networks. Compared to analyzing phenotypic associations to drug targets alone, PathFX was biased to select phenotypes with fewer genes associated and this is likely due to the statistical approach of our method: Starting with a smaller list (e.g. just the drug targets) increases the chance of finding a statistically significant association to phenotype for which there are many associations in the whole network. Conversely, starting with a larger list of proteins (e.g. using proteins from PathFX networks), decreases the chance of finding a statistically significant association to a phenotype with many proteins distributed in the interactome network. PathFX identified phenotypes where there is significant overlap with the network and where there are relatively fewer associations to the phenotype in the entire interactome network. We estimated algorithm specificity with a silver-standard data set of drugs not reported in FAERS. We recognize the limitation of this assumption because sampling biases and drug usage affect whether or not a drug is reported in FAERS in addition to our assumption that the drug does not cause an adverse event. However, we lacked sufficient gold-standard, true negatives with which to estimate specificity.

Our approach is not the first network biology tool for describing drug function, though, it does have different downstream applications. The comparator approach [22] reported lower sensitivity than PathFX. This could reinforce the role of incomplete data in creating pathways for marketed drugs or drug effects beyond the underlying disease pathway, as mentioned previously. Additionally, our improved performance could have resulted from the more permissive approach of our algorithm. The motivating question for regulatory review was “what biological evidence supports the validity of an observed adverse event?” in the post-market setting, and “what clinical trial assessments might be needed?” for vigilant detection of safety issues in the pre-market, investigational drug setting. In this paradigm, PathFX sampled disease signals around a drug target and was not constrained to find the right answer such as in the shortest-path method in Guney et al[22]. For our regulatory context, our expansive search was a positive design feature for understanding biological evidence supporting adverse events.

We demonstrated utility of PathFX in a pertinent regulatory context. Our analysis of FAERS data proved useful in detecting if and how a drug could cause a particular DME. We did not apply all available statistical filters to the FAERS data before analysis. Our intention was to apply PathFX as a biologically-motivated tool for signal detection and to develop PathFX as a complementary approach to other statistical methods; in practice, filtering using statistical approaches prior to PathFX application is a viable research strategy as well and using PathFX alone is not sufficient for identifying a causative relationship between a drug and AE. There were multiple DMEs for which we could not identify an interaction pathway. Though, for some of these diseases, further structured expansion of acceptable terms could improve prediction accuracy. For instance, the Ontology of Adverse Events (OAE)[43] provides a structured language for relating symptoms, findings, and measurements to disease and could be a useful tool for expanding the capacities of PathFX. In practice, FAERS reports do not contain research-ready disease language, and thus it is important to map and relate changes in symptoms to relevant diseases. For example, the DME, suicide attempt, would be difficult to identify with our method directly, but is likely related to depression and anxiety. Structured language relating these phenotypes would improve prediction accuracy for these phenotypes. Improved ontology incorporation is an aspiration for the project but was beyond the scope of this work.

Our findings reflect a trend in network biology to leverage drug pathways for repurposing approaches, though, the techniques are imperfect for describing all drugs. PathFX identified additional diseases beyond the marketed indications. Without a data set of true negatives (e.g. the drug was tested in condition X and did not work), it is difficult to systematically test and reject network predictions. Instead, we leveraged molecular interaction paths relating marketed drugs to their relevant disease genes and used these paths to identify possible repurposing opportunities. This prudent approach limited repurposing hypotheses to indications for which drugs already exist. Further, literature evidence supports some of the associations such as tetracycline and hypertension [44], but tetracycline induces an undesirable hypertension phenotype. We discovered these associations because we do not yet have a means for discerning directionality (e.g. a drug improves the phenotype, or a drug aggravates the phenotype). Future work will address this question.

The PathFX paradigm may be useful for both regulatory and pharmaceutical industry stakeholders to validate targets and enhance pharmacovigilance activities. We designed and tested our algorithm’s utility for one regulatory task: strengthening signals from adverse event reports in FAERS. Additionally, the drug-target-centric approach is useful for drug targets in development and may be used as a filter for identifying potential safety concerns and for confirming a sufficient association with disease. In contrast to therapeutic development through high-throughput screening, PathFX epitomizes the paradigm of identifying drug candidates based on biological rationale and supports the pathway relevance of a drug target.

Materials and methods

Interactome assembly, and scoring

We downloaded data from iRefWeb version 13.0 human, Reactome, and PharmGKB. We chose iRefWeb because the source contains interactions from BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI, and OPHID. We extracted protein-protein interactions from, drug-variant interactions from PharmGKB, and protein-protein interactions from Reactome. We scored iRefWeb protein-protein interactions using the MIScore framework[45] and removed low-scoring interactions below the median score value, 0.244, to save memory in later computations. We kept the relative weighting of each score component equal (i.e. Km = Kp = Kt = 1). The scoring framework represents the amount of evidence supporting the interaction of two proteins: the Sm represents the method used to detect the interaction and is higher for more dedicated experimental techniques. Sp reflects the number of publications supporting an interaction. This score increases with the number of publication and plateaus. St reflects the interaction type. Because we only used ‘direct’ interactions, this score is always 1.

We adapted the MIScore framework for PharmGKB data and used publication, and ‘clinical evidence’ to score drug-variant relationships. Whenever an interaction with a variant was added to our network, we also added an interaction edge from the variant to the gene and scored this interaction as 0.99, the maximum possible score in the interactome. We used the following equation where Kp = Ke = 1, Sp was the same as published in [45]. Se reflects the clinical level evidence available from PharmGKB and we crafted a scoring framework similar to [45].

Where scv’1A’ = 0.99, scv’1B’ = 0.86625, scv’2A’ = 0.7425, scv’2b’ = 0.61875, scv’3’ = 0.495, and scv’4’ = 0.2475. Because interactions in PharmGKB only receive one level of clinical evidence, a and b collapse to:

We adapted this scoring framework for Reactome pathways using the following equation:

Sp was the kept the same as in [45]. We derived a scheme for the method component, Sm, where scv’direct_complex’ = 1.00, scv’neighboring_reaction’ = 0.66, scv’indirect_complex’ = 0.8, scv’reaction’ = 1.0, and scv’unknown’ = 0.05.

Because the two maximum scoring categories were ‘direct_complex’ and ‘reaction’, the Max(Gscvi) term = 2.0. Because Reactome interactions did not include a method of detection but are curated interactions, we gave these interactions a cvi = 0.8. Because they all received the same ‘method’ score, calculating a and b yields a Sm = 0.615.

We lastly incorporated predicted drug to protein binding data based on PocketFEATURE [46] where drug-protein pairs were scored based on the similarity between the drug’s known targets and other protein targets from the Protein Data Bank[47] (See methods below). PocketFEATURE has been extensively validated on predicting drug protein interactions in multiple applications [46,48,49]. In all cases, we estimated interaction scores based on the quality of evidence available; these edge scores were fixed before applying PathFX and we did not alter these parameters to improve prediction accuracy.

Phenotype, disease, and variant data

We downloaded variant and phenotype association data from PheWAS[11,12], disease to gene associations from DisGeNet[10,11], Phenotype-Genotype Integrator (PheGenI) [26], ClinVar[27], and OMIM[8], and eQTL data from the GWAS catalogue[7]. We collapsed all phenotype names to CUI identifiers using MetaMap lite and took the union of all data sources to create our source of gene to phenotype annotations. This yielded a database associating 29785 genes to 20524 phenotypes.

To look for enriched phenotypes, we used a Fisher’s exact test and Benjamini-Hochberg multiple hypothesis correction to assess whether a disease or phenotype had more associations to the network genes relative to the total number of associations in the interactome. We filtered out disease or phenotypes associated with fewer than 25 genes. Recent work suggested that current interactomes are insufficient for analyzing disease pathways with fewer than 25 genes[28]. More specifically, interactomes are incomplete and these missing interactions reduce the ability to find pathways between smaller disease modules. This recent estimate discovered that missing interactions disproportionally affect phenotype pathways with fewer than 25 genes and thus, we eliminated these phenotypes knowing that our interaction network was insufficient for studying these phenotypes.

We further filtered phenotypic predictions by deriving a p-value threshold from networks created with randomly-selected, druggable proteins. We realized that if any random set of input drug-targeting proteins could discover a statistically-significant association to a given phenotype, that associating a real set of drug proteins with this phenotype would reflect bias in our data instead of a biologically-meaningful result. To assess this bias in our data, we created 100 networks using randomly selected drug targets from the intersection of all drug-targeting proteins in DrugBank and ran PathFX using these targets as inputs (i.e. using the same statistical approaches to assess phenotypes that are significantly associated to networks created with random inputs). Because the distribution of p-values from the 100 randomizations was not normally distributed, we used the median value as the threshold p-value for determining if a drug network was associated to a phenotype. The number of randomly selected input proteins matched the number of targets of the drug of interest. PathFX retained a phenotype if the association is more significant than the p-value threshold for that phenotype run with the same number of random, input protein targets.

When analyzing drug targets alone, we again used the Fisher’s exact text, the Benjamini-Hochberg correction, and filtering relative to the expected p-value threshold.

Depth-first network search

We created a depth-first search tool which ‘walked’ away from a drug’s protein target through the interactome and evaluated these paths using the multiplicative sum of all interaction edges between a gene and the target. Genes with a path score above the threshold were retained in the drug pathway. We expedited the search with “fast-tracking”. This process reflects the fact that molecules exist in highly interconnected pathways and assumed that we could reduce the searchable interaction spaces by looking for molecular redundancies. As the search explored an interaction path, fast-tracking searched the remaining que of interaction paths for genes that had already been added to the network and added these interaction paths to the network. Interaction edge scores were used from the scoring system above. We used specificity analysis to determine an optimal threshold for the interactome (S2 Fig, explained below). In the case of multiple protein targets, we created a pathway around each protein target, and merged these neighborhoods to create the full protein network.

Interaction specificity analysis

We evaluated the interactome specificity by comparing a gene’s path score to all possible path scores for that gene. To measure all path scores, we created pathways for all genes in the interactome network, treating each molecular entity as a drug target and creating a pathway as described above. We used these empirically-derived scores to calculate an enrichment score for an entity in the pathway of a real drug target by subtracting the average path score to that gene from the gene’s score in the drug pathway (Fig 1, ‘Interaction Specificity Analysis’).

An optimal threshold parameter for the tissue non-specific network

We selected an optimal threshold by evaluating gene specificity at threshold values from 0.7 to 0.9. At each of these values we created a drug pathway around the drug’s protein target(s), calculated the gene specificity, and then tabulated the fraction of genes that are specific to a drug target (i.e. have a specificity score > 0). We plot the normalized histograms of specificity values in S2A Fig and a distribution of the proportion of specific paths at each threshold value in S2B Fig.

PathFX code

The PathFX code is available at: Using the algorithm requires minimal inputs and creates a network and several association files as depicted in S4 Fig. The user provides three inputs: 1. an analysis name. 2.the name of the drug. 3. an optional list of proteins (if the drug-binding proteins are not in DrugBank or the user wishes to complete a more specific analysis).The algorithm creates a set of output files: 1. networks for individual target proteins and a merged interaction network combining networks from each target proteins. These files are tab-delimited files with one interaction per line and the score for that interaction. 2. An association table containing one significantly-associated network phenotype, a p-value for that association, and network genes associated with that phenotype. 3. A table listing the database source for individual phenotype-gene associations.

Mapping to Common Unique Identifier (CUI) terms

For all diseases, these phenotypes were mapped to CUI terms using Metamap lite[31]. This was the same process used in assembling the phenotype dataset.

Semantic similarity and disease clustering

We downloaded the UMLS Metathesaurus, version 2017AA and used the Perl packages UMLS::Interface[32] and UMLS::Similarity[32] to measure the lin distance between diseases in a set. For the gold-standard drug set, we calculated a matrix of similarity values for all approved indications and we used SciPy in Python to perform hierarchical clustering. We identified 62 as the optimal number of clusters using the elbow method. For visualization of the dendrogram, we counted the top five disease-associated words in each cluster. To determine how well PathFX identified a drug’s approved indication, we again used the scripts to calculate similarity between the approved indication and the PathFX identified phenotypes.

Positive and Negative predictive values

Because the number of true positives and true negatives varied for each drug and for each phenotype, we calculated the PPV and NPV separately for each drug and for each phenotype. To calculate PPV, we assumed that false positives were any PathFX identified phenotype that was not a marketed indication. The PPV was the ratio of correctly identified marketed indications to the total of marketed indications and additional PathFX phenotypes. To calculate NPV, we considered any phenotype from our dataset that was not a marketed disease indication to be a true negative. The NPV was the ratio of these true negatives to the sum of the true negatives and the unidentified, marketed indications. When calculating PPV for each phenotype, we assumed that a false positive was any drug identified by PathFX to be associated to the phenotype but was not marketed for that phenotype. When calculating NPV for each phenotype, we considered as true negatives any drugs not marketed for or identified by PathFX to be associated with the phenotype.

Statistical comparison of performance between PathFX and drug targets alone

We calculated the number of genes associated with the original marketed indication for drug-disease pairs identified by PathFX only, targets only, or identified by both methods. We first used a Kruskal-Wallis test implemented in the Python package, SciPy, to determine that these populations were not from the same distribution. We tested the hypothesis that the targets-only analysis was biased towards diseases with more genes using the Mann-Whitney-U statistic implemented in the Python package, SciPy.

FAERS drug-DME pair data extraction

The FDA Adverse Event Reporting System (FAERS) data was extracted using the Oracle Health Sciences Empirica Signal software. Although, the software is not publicly available, we used only publicly-available data at the time of analysis (2004-2017Q3). All drugs that had at least one case reported for the 35 MedDRA Preferred Terms identified as Designated Medical Events by review and medical experts at the FDA were included in this analysis.

Repurposing drugs for marketed indications

From the successfully matched drug-indications pairs, we created a catalogue of interaction paths that supported the association between these drugs and their approved indications. We searched through remaining drug pathways and asked if any associations with the drugs’ non-marketed indications were supported by the catalogue of interaction paths. These non-marketed associations became our cohort of repurposing predictions.

Predicting drug protein interactions

The Drug-binding Dataset collects 984 high-quality 3D structures (x-ray resolution higher than 2.5 Å) that co-crystalized with FDA approved small molecule drugs (non-nutraceuticals), representing binding environments of 284 distinct drugs [50]. The Human Off-target Dataset comprises 2271 proteins representing a non-redundant representative set (90% percent identity) of human proteins and their close homologs that have high quality 3D structures (x-ray resolution higher than 2.5 Å) in PDB[51]. We have applied PocketFEATURE[46] to predict the probability of binding between the 284 drugs and the 2271 potential off-targets. PocketFEATURE uses the FEATURE representation to calculate site similarities by aligning microenvironments between two sites. A more negative score suggests binding site similarity and thus a higher probability of drug binding to a site similar (off-target) to its known binding site. Given a pair of drug and off-target, we used an average score of similarity between the binding sites and the off-target. For each drug, we generated a profile of its binding probability to each of the 2271 potential off-targets.

Supporting information

S1 File. Summary of PathFX associations for marketed drugs.

The file contains the starting list of drug-disease associations from marketed drugs, associations between drugs and diseases discovered by PathFX, a clustered summary where drug-disease associations are binned based on the semantic similarity of the diseases, a comparison to associations found from using drug targets without network information, and a listing of disease associations discovered by PathFX, targets-only, or both methods and the number of genes annotated to these diseases.


S2 File. Positive and negative predictive value calculated by drug and by disease CUI term.


S3 File. Analysis of FAERs case reports using PathFX.

A summary of where PathFX associated a drug to a reported adverse event and a summary of false positives from the silver-standard set of negative cases (i.e. drugs without a reported association to an adverse event in FAERs).


S4 File. Summary of repurposing predictions.

A summary of drug-disease associations ranked by the number of pathways supporting an association and by the number of predicted alternative indications.


S1 Fig. A tissue-non-specific interactome for identifying drug pathways.

The table shows the number of interactions obtained from each database (A) and the distribution of edge scores (B). These edge scores represent the probability of an interaction given the available evidence.


S2 Fig. Generalized interaction specificity analysis results.

We created all drug target paths using the depth-first search at threshold values ranging from 0.7–0.9. We calculated gene specificity relative to all paths created for that gene and plot specificity values as normalized histograms (A). The total number of paths and the percent enriched (scored >0) are indicated in each figure legend. We additionally plot the fraction of specific paths against the total number of paths (B). Shading is a linear gradient corresponding to the threshold value (0.9 = dark purple, 0.7 = light purple). A threshold of 0.77 was used for all further analyses.


S3 Fig.

Density histograms of positive and negative predictive values for targets only analysis (top row, gray) and PathFX analysis (bottom row, blue).


S4 Fig. Summarized schematic for PathFX usage.

The user provides three inputs: 1. an analysis name. 2. the name of the drug. 3. an optional list of proteins (if the drug-binding proteins are not in DrugBank or the user wishes to complete a more specific analysis). The algorithm outputs a set of output files: 1. networks for individual target proteins and a merged interaction network combining networks from each target proteins. These files are tab-delimited files with one interaction per line and the score for that interaction. 2. An association table containing one significantly-associated network phenotype, a p-value for that association. 3. A table listing the database source for individual phenotype-gene associations.


S1 Table. PathFX identified phenotypes for Metformin.

A list of phenotypes significantly associated with the Metformin network.



Some of the computing for this project was performed on the Sherlock cluster. We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. The authors would like to thank Ted Pederson, Bridget McInnes, and Ravi Teja Bhupatiraju for their extensive discussions about the best uses of the UMLS Metathesaurus resources. The authors would like to thank Keith Burkhart for helpful discussion about the content of this article.


  1. 1. Hay M, Thomas DW, Craighead JL, Economides C, Rosenthal J. Clinical development success rates for investigational drugs. Nature Biotechnology. 2014 Jan;32(1):40–51. pmid:24406927
  2. 2. Nelson MR, Johnson T, Warren L, Hughes AR, Chissoe SL, Xu C-F, et al. The genetics of drug efficacy: opportunities and challenges. Nature Reviews Genetics. Nature Research; 2016 Apr 1;17(4):197–206. pmid:26972588
  3. 3. Bowes J, Brown AJ, Hamon J, Jarolimek W, Sridhar A, Waldron G, et al. Reducing safety-related drug attrition: the use of in vitro pharmacological profiling. Nature Reviews Drug Discovery. Nature Publishing Group; 2012 Dec 1;11(12):1–14.
  4. 4. Mizutani S, Pauwels E, Stoven V, Goto S, Yamanishi Y. Relating drug-protein interaction network with drug side effects. Bioinformatics. 2012 Sep 7;28(18):i522–8. pmid:22962476
  5. 5. Zhang C, Hong H, Mendrick DL, Tang Y, Cheng F. Biomarker-based drug safety assessment in the age of systems pharmacology: from foundational to regulatory science. Biomarkers Med. 2015 Nov;9(11):1241–52.
  6. 6. Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, et al. Pharmacogenomics Knowledge for Personalized Medicine. Clinical pharmacology and therapeutics. NIH Public Access; 2012 Oct 1;92(4):414–7. pmid:22992668
  7. 7. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research. 2013 Dec 28;42(D1):D1001–6.
  8. 8. Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM(R)). Nucleic Acids Research. 2009 Jan 1;37(Database):D793–6. pmid:18842627
  9. 9. Piñero J, Queralt-Rosinach N, Bravo À. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database (Oxford) 2015.
  10. 10. Piñero J, Bravo À, Queralt-Rosinach N. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids. 2016.
  11. 11. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010 Apr 23;26(9):1205–10. pmid:20335276
  12. 12. Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, Field JR, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nature Biotechnology. Nature Publishing Group; 2013 Nov 17;31(12):1102–10. pmid:24270849
  13. 13. Snel B, Lehmann G, Bork P, Huynen MA. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Research. Oxford University Press; 2000 Sep 15;28(18):3442–4. pmid:10982861
  14. 14. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Research. 2017 Jan 4;45(D1):D362–8. pmid:27924014
  15. 15. Turinsky AL, Razick S, Turner B, Donaldson IM, Wodak SJ. Navigating the global protein-protein interaction landscape using iRefWeb. Methods Mol Biol. Totowa, NJ: Humana Press; 2014;1091(Suppl 1):315–31.
  16. 16. Hur J, Guo AY, Loh WY, Feldman EL, Bai JPF. Integrated Systems Pharmacology Analysis of Clinical Drug-Induced Peripheral Neuropathy. CPT: pharmacomet syst pharmacol. 2014 May 14;3(5):e114–11.
  17. 17. Hur J, Liu Z, Tong W, Laaksonen R, Bai JPF. Drug-Induced Rhabdomyolysis: From Systems Pharmacology Analysis to Biochemical Flux. Chem Res Toxicol. 2014 Mar 17;27(3):421–32. pmid:24422454
  18. 18. Hur J, Zhao C, Bai JPF. Systems Pharmacological Analysis of Drugs Inducing Stevens–Johnson Syndrome and Toxic Epidermal Necrolysis. Chem Res Toxicol. 2015 May 18;28(5):927–34. pmid:25811541
  19. 19. Melas IN, Sakellaropoulos T, Iorio F, Alexopoulos LG, Loh W-Y, Lauffenburger DA, et al. Identification of drug-specific pathways based on gene expression data: application to drug induced lung injury. Integr Biol (Camb). The Royal Society of Chemistry; 2015 Aug;7(8):904–20.
  20. 20. Sakellaropoulos T, Herod TJ, Alexopoulos LG, Bai JPF. Computed Biological Relations among Five Select Treatment-Related Organ/Tissue Toxicities. Chem Res Toxicol. 2016 May 16;29(5):914–23. pmid:27063352
  21. 21. Lorberbaum T, Nasir M, Keiser MJ, Vilar S, Hripcsak G, Tatonetti NP. Systems Pharmacology Augments Drug Safety Surveillance. Clinical pharmacology and therapeutics. 2014 Dec 20;97(2):151–8. pmid:25670520
  22. 22. Guney E, Menche J, Vidal M, Barabási A-L. Network-based in silico drug efficacy screening. Nat Commun [Internet]. 2016 Feb 1;7:10331. Available from: pmid:26831545
  23. 23. Yoo S, Noh K, Shin M, Park J, Lee K-H, Nam H, et al. In silico profiling of systemic effects of drugs to predict unexpected interactions. Scientific Reports. 2018 Jan 25;8(1):1612. pmid:29371651
  24. 24. Yildirim MA, Goh K-I, Cusick ME, Barabási A-L, Vidal M. Drug-target network. Nature Biotechnology. 2007 Oct;25(10):1119–26. pmid:17921997
  25. 25. Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, et al. The Reactome pathway knowledgebase. Nucleic Acids Research. 2014 Jan;42(Database issue):D472–7. pmid:24243840
  26. 26. Ramos EM, Hoffman D, Junkins HA, Maglott D, Phan L, Sherry ST, et al. Phenotype-Genotype Integrator (PheGenI): synthesizing genome-wide association study (GWAS) data with existing genomic resources. Eur J Hum Genet. 2014 Jan;22(1):144–7. pmid:23695286
  27. 27. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research. 2014 Jan;42(Database issue):D980–5. pmid:24234437
  28. 28. Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, et al. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. American Association for the Advancement of Science; 2015 Feb 20;347(6224):1257601–1. pmid:25700523
  29. 29. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research. 2006 Jan 1;34(Database issue):D668–72. pmid:16381955
  30. 30. Brown AS, Patel CJ. A standard database for drug repositioning. Scientific Data 2017 4:null. Scientific Data; 2017 Mar 14;4:sdata201729.
  31. 31. Demner-Fushman D, Rogers WJ, Aronson AR. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. J Am Med Inform Assoc. 2017 Jul 1;24(4):841–4. pmid:28130331
  32. 32. McInnes BT, Pedersen T, Pakhomov SVS. UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity. AMIA Annu Symp Proc. American Medical Informatics Association; 2009 Nov 14;2009:431–5. pmid:20351894
  33. 33. Bouvy JC, De Bruin ML, Koopmanschap MA. Epidemiology of adverse drug reactions in Europe: a review of recent observational studies. Drug Saf. 2015 May;38(5):437–53. pmid:25822400
  34. 34. Boland MR, Jacunski A, Lorberbaum T, Romano JD, Moskovitch R, Tatonetti NP. Systems biology approaches for identifying adverse drug reactions and elucidating their underlying biological mechanisms. Wiley Interdiscip Rev Syst Biol Med. 2016 Mar;8(2):104–22. pmid:26559926
  35. 35. Tatonetti NP, Ye PP, Daneshjou R, Altman RB. Data-driven prediction of drug effects and interactions. Sci Transl Med. American Association for the Advancement of Science; 2012 Mar 14;4(125):125ra31–1. pmid:22422992
  36. 36. Jung K, LePendu P, Chen WS, Iyer SV, Readhead B, Dudley JT, et al. Automated detection of off-label drug use. Sarkar IN, editor. PLoS ONE. 2014;9(2):e89324. pmid:24586689
  37. 37. Suksomboon N, Poolsup N, Prasit T. Systematic review of the effect of telmisartan on insulin sensitivity in hypertensive patients with insulin resistance or diabetes. J Clin Pharm Ther. 2012 Jun;37(3):319–27. pmid:21848583
  38. 38. Bekerman E, Neveu G, Shulla A, Brannan J, Pu S-Y, Wang S, et al. Anticancer kinase inhibitors impair intracellular viral trafficking and exert broad-spectrum antiviral effects. J Clin Invest. American Society for Clinical Investigation; 2017 Apr 3;127(4):1338–52. pmid:28240606
  39. 39. Tian W, Rockson SG, Jiang X, Kim J, Begaye A, Shuffle EM, et al. Leukotriene B4antagonism ameliorates experimental lymphedema. Sci Transl Med. 2017 May 10;9(389):eaal3920.
  40. 40. [Internet]. Bethesda (MD): National Library of Medicine (US). 2000 Feb 29. Identifier NCT02960997, Using Topical Sirolimus 2% for Patients With Epidermolysis Bullous Simplex (EBS) Study; 2016 Nov 10 [cited 2018 April 10]; Available from:
  41. 41. Zhou H, Gao M, Skolnick J. Comprehensive prediction of drug-protein interactions and side effects for the human proteome. Scientific Reports. Nature Publishing Group; 2015 Jun 9;:1–13.
  42. 42. Greene CS, Krishnan A, Wong AK, Ricciotti E, Zelaya RA, Himmelstein DS, et al. Understanding multicellular function and disease with human tissue-specific networks. Nature Publishing Group. Nature Genetics; 2015 Apr 27;47(6):569–76. pmid:25915600
  43. 43. Yongqun H, Sarntivijai S, Lin Y, Xiang Z, Guo A, Zhang S, et al. OAE: The ontology of adverse events. Journal of Biomedical Semantics 2017 8:1. BioMed Central; 2014;5(1):29.
  44. 44. Holst AV, Danielsen PL, Romner B. A severe case of tetracycline-induced intracranial hypertension. Dermatol Reports. 2011 Jan 31;3(1):e1. pmid:25386253
  45. 45. Villaveces JM, Jimenez RC, Porras P, del-Toro N, Duesbury M, Dumousseau M, et al. Merging and scoring molecular interactions utilising existing community standards: tools, use-cases and a case study. Database (Oxford). Oxford University Press; 2015 Jan 1;2015(0):bau131–1.
  46. 46. Liu T, Altman RB. Using Multiple Microenvironments to Find Similar Ligand-Binding Sites: Application to Kinase Inhibitor Binding. Beard DA, editor. PLoS Comput Biol. Public Library of Science; 2011 Dec 29;7(12):e1002326. pmid:22219723
  47. 47. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Research. Oxford University Press; 2000 Jan 1;28(1):235–42. pmid:10592235
  48. 48. Liu T, Altman RB. Relating Essential Proteins to Drug Side-Effects Using Canonical Component Analysis: A Structure-Based Approach. J Chem Inf Model. 2015 Jul 27;55(7):1483–94. pmid:26121262
  49. 49. Miguel A, Hsin J, Liu T, Tang G, Altman RB, Huang KC. Variations in the binding pocket of an inhibitor of the bacterial division protein FtsZ across genotypes and species. Wade RC, editor. PLoS Comput Biol. Public Library of Science; 2015 Mar;11(3):e1004117. pmid:25811761
  50. 50. Liu T, Altman RB. Identifying druggable targets by protein microenvironments matching: application to transcription factors. CPT: pharmacomet syst pharmacol. 2014 Jan 22;3(1):e93.
  51. 51. Liu T, Oprea T, Ursu O, Hasselgren C, Altman RB. Estimation of Maximum Recommended Therapeutic Dose Using Predicted Promiscuity and Potency. Clin Transl Sci. 2016 Dec;9(6):311–20. pmid:27736015