Fig 1.
Schematic illustration of causal reasoning by drug2ways over simplified networks.
a) Prototypic network used by drug2ways for drug discovery. The network contains causal relations between three modalities (i.e., drugs, proteins, and indications/phenotypes). Here, single paths from three drugs to an indication as well as associated phenotypes are shown, though a single drug may have multiple paths to a given indication/phenotype. Drug2ways reasons over all possible paths in a network between a drug and an indication/phenotype to predict the relative effect of each drug. In the example, we want to investigate whether one of the three drugs depicted inhibits an indication and its two phenotypes. While all three drugs target the disease, two of the three (i.e., drugs A and C) fail to produce the desired effects (i.e., inhibition of the indication of interest and its two associated phenotypes). By reasoning over all the paths between the drug and the three target nodes of interest (i.e., the indication and its phenotypes), drug2ways predicts that drug B could be a promising candidate, as the majority of the paths would result in their inhibition and thus produce a therapeutic effect. Similarly, drug2ways can also be used to evaluate the effect of a drug on a single indication/phenotype or to assess the effect of drug combinations. b) Example network containing all paths between a given drug and an indication. c) All possible paths between the drug and indication in (b). The drug2ways algorithm incorporates two variants, namely all paths and simple paths, enabling users to account for or ignore feedback loops (i.e., cycles), respectively. We distinguish between different paths based on the maximum number of allowable edges from a drug X to an indication Y (i.e., the lmax parameter). For instance, the shortest path between the drug and the indication has a length of 3 and is therefore captured at an lmax of 3, while an lmax of 6 will capture this path and four additional simple paths: two of length 4 and a further two of length 6. Using the all paths version of the algorithm, an additional cyclic path of length 6 is also captured.
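The distinction between the two variants and the role of lmax can be illustrated with a minimal sketch. It is not the drug2ways implementation: the function name `enumerate_paths`, the adjacency-list representation, and the toy network (which does not reproduce the network in panel b) are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

Graph = Dict[str, List[str]]  # hypothetical adjacency list: node -> successors


def enumerate_paths(
    graph: Graph, source: str, target: str, lmax: int, simple: bool = True
) -> Iterator[Tuple[str, ...]]:
    """Yield paths from source to target with at most lmax edges.

    simple=True  -> no vertex is visited twice (cycles are ignored).
    simple=False -> vertices may repeat (cycles are followed).
    A path stops as soon as it reaches the target (an illustrative choice).
    """

    def extend(node: str, path: List[str]) -> Iterator[Tuple[str, ...]]:
        if node == target:
            yield tuple(path)
            return
        if len(path) - 1 == lmax:  # edge budget exhausted
            return
        for successor in graph.get(node, []):
            if simple and successor in path:
                continue  # would revisit a vertex
            path.append(successor)
            yield from extend(successor, path)
            path.pop()

    yield from extend(source, [source])


# Toy network (made up): one feedback loop p1 -> p2 -> p3 -> p1.
toy = {
    "drug": ["p1"],
    "p1": ["p2"],
    "p2": ["ind", "p3"],
    "p3": ["ind", "p1"],
}

simple_paths = list(enumerate_paths(toy, "drug", "ind", lmax=6, simple=True))
all_paths = list(enumerate_paths(toy, "drug", "ind", lmax=6, simple=False))
```

In this toy network, the simple paths variant finds the paths of length 3 and 4, while the all paths variant additionally finds one path of length 6 that traverses the feedback loop once.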
Table 1.
Results of the validation experiments.
The table presents the validation experiments for each of the four networks (i.e., OpenBioLink, permuted OpenBioLink, In-House, and permuted In-House) using two variants of the algorithm (i.e., all paths and simple paths) based on two different prioritization criteria (see Methods), as well as the results obtained when only considering the shortest path between a drug-disease pair. For each experiment, we report the relative number of true positives in the list of drug-disease pairs prioritized by drug2ways. The proportions of true positives recovered by both variants of drug2ways in the two original networks are significantly higher than chance level (i.e., 3.19% for OpenBioLink and 3.76% for the In-House network).
Fig 2.
Identification of drugs targeting an indication and several associated phenotypes.
The heatmaps summarize the results of running the all paths version of the drug2ways algorithm over the In-House network for variable path lengths. While the algorithm outputs scores between 0 and 1, where 0 denotes no activation or inhibition and 1 denotes a full activation or inhibition, scores were normalized to the range of -1 to 1. Here, normalized scores of the relative effects of drugs on cystic fibrosis and several of its associated phenotypes are displayed, where values below and above 0 denote the inhibition (blue) and activation (red) of all paths between a drug and target indication/phenotype at a specific lmax, respectively, whilst 0 denotes a cancelling effect (gray). In a fourth case, no paths exist between the drug and indication/phenotype (white). a) Hierarchical clustering of normalized scores of the relative effects of all drugs in the In-House network on cystic fibrosis and related phenotypes at lmax 8. b) Heatmap illustrating a subset of drugs at lmax 4 which distinctly optimize therapeutic effects through inhibition of several disease/phenotypic targets (e.g., Amiloride, D-methorphan, Losartan), activate the disease and/or its phenotypes (e.g., Dienogest), result in both the inhibition of some diseases/phenotypes and the activation of others (e.g., Desonide, Ziprasidone, Nimodipine), or do not possess paths to particular targets (e.g., Testolactone).
Table 2.
Examples of predicted combination therapies supported by literature evidence on four cancer types.
The table reports drug combinations identified by drug2ways that inhibit each of the various cancer types and supporting literature evidence. These results were obtained by running the all paths version of the algorithm over the In-House network for lmax 4.
Fig 3.
Average time required to calculate the effect of simple paths for all drug-disease pairs used in the validation on two heterogeneous networks using different lmax.
The analysis was also conducted for paths with repeated vertices between drug-disease pairs using the all_paths variant of drug2ways, but not for the NetworkX and NetworKit libraries, which lack equivalent implementations. The implementations of both libraries could easily be adapted to return paths with repeated vertices. However, without the optimizations described in the Subsection Theoretical background, these would have a higher complexity than their all_simple_paths counterparts because nodes would be revisited. Therefore, for both libraries, we use simple paths as the baseline for the analysis.
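As a generic illustration of why paths with repeated vertices need not be enumerated one by one, the sketch below counts them by dynamic programming over the path length. This is a standard technique and is not claimed to be the optimization described in the Subsection Theoretical background; the function name `count_walks` and the toy network are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Graph = Dict[str, List[str]]  # hypothetical adjacency list: node -> successors


def count_walks(graph: Graph, source: str, target: str, lmax: int) -> List[int]:
    """Return, for each length 1..lmax, the number of paths with repeated
    vertices allowed (walks) from source that end at target.

    Walks are allowed to pass through the target and continue; in the toy
    network below the target has no outgoing edges, so this never happens.
    """
    # counts[node] = number of walks of the current length ending at node
    counts: Dict[str, int] = {source: 1}
    per_length: List[int] = []
    for _ in range(lmax):
        following: Dict[str, int] = defaultdict(int)
        for node, n_walks in counts.items():
            for successor in graph.get(node, []):
                following[successor] += n_walks
        counts = following
        per_length.append(counts.get(target, 0))
    return per_length


# Toy network (made up) with one feedback loop p1 -> p2 -> p3 -> p1.
toy = {
    "drug": ["p1"],
    "p1": ["p2"],
    "p2": ["ind", "p3"],
    "p3": ["ind", "p1"],
}

walks_per_length = count_walks(toy, "drug", "ind", lmax=6)
```

Each iteration touches every edge at most once, so the cost grows linearly with lmax and with the number of edges, regardless of how many individual paths exist.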
Table 3.
Definitions of terms used in this paper.
Fig 4.
Distribution of node types and relationships in the In-House and OpenBioLink networks.
a) The OpenBioLink KG contains a greater proportion of PubChem drugs relative to the In-House network, which solely contains drugs from DrugBank. While the number of proteins in each of the two networks is comparable, indications are more numerous in the In-House network than in the OpenBioLink KG. Phenotypes for the In-House network were sourced from OpenBioLink and, as such, are equivalent in number. b) The total number of drug-protein interactions is greater in the OpenBioLink network than in our In-House network. A greater proportion of protein-protein interactions is present in the In-House network, as is a greater number of protein-indication edges, while the number of protein-phenotype interactions is nearly equivalent.
Table 4.
Clinical trial information mapped to the OpenBioLink and In-House networks for drug2ways validation.
The procedure to extract the information from ClinicalTrials.gov and the corresponding lists of drugs and diseases are available at https://github.com/drug2ways/results/tree/master/validation.
Table 5.
Illustration of the prioritization with three example pairs (i.e., A, B, and C).
For each lmax, the number and percentage of inhibitory paths are shown. While all three pairs show a similar pattern, pair B has fewer than 70% inhibitory paths for lmax = 3 (i.e., Criterion 2), while for pair C, an increase in the number of paths from lmax = 2 to lmax = 3 does not occur (i.e., Criterion 3). Finally, pair A fulfills all three criteria and can thus be categorized as a prioritized pair.
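The two criteria mentioned in this caption can be sketched as simple checks. This is a hedged illustration only: Criterion 1 is not described here and is therefore omitted, the path counts are made up, the function names are hypothetical, and "number of paths" is assumed to mean the total number of paths at each lmax.

```python
from typing import Dict, Tuple

# lmax -> (number of inhibitory paths, total number of paths)
PathCounts = Dict[int, Tuple[int, int]]


def meets_inhibitory_threshold(counts: PathCounts, threshold: float = 0.70) -> bool:
    """Criterion 2 (as read from the caption): at least 70% of the paths
    are inhibitory at every lmax."""
    return all(
        total > 0 and inhibitory / total >= threshold
        for inhibitory, total in counts.values()
    )


def path_count_increases(counts: PathCounts) -> bool:
    """Criterion 3 (as read from the caption): the number of paths grows
    with each increase in lmax. Total paths are assumed here."""
    totals = [counts[lmax][1] for lmax in sorted(counts)]
    return all(earlier < later for earlier, later in zip(totals, totals[1:]))


# Made-up counts for three hypothetical pairs at lmax = 2 and lmax = 3.
pair_a = {2: (3, 4), 3: (8, 10)}  # 75% and 80% inhibitory, 4 -> 10 paths
pair_b = {2: (3, 4), 3: (6, 10)}  # 60% inhibitory at lmax = 3
pair_c = {2: (3, 4), 3: (3, 4)}   # no increase in the number of paths

results = {
    name: (meets_inhibitory_threshold(c), path_count_increases(c))
    for name, c in {"A": pair_a, "B": pair_b, "C": pair_c}.items()
}
```

With these made-up counts, only pair A passes both checks, mirroring the pattern the caption describes for pairs A, B, and C.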