Fig 1.
Schematic representation of the RPath algorithm.
Step 1) All acyclic paths of a given length between a drug and a disease in the KG are calculated. If causal acyclic paths connect the drug and the disease, a subgraph comprising all of these paths is inferred. This subgraph represents the proposed mechanism of action by which the drug may act therapeutically on the given disease. Step 2) Transcriptomic signatures observed in a drug-perturbed experiment are overlaid onto the corresponding nodes in these paths. RPath then traverses each path and evaluates whether the inferred direction of regulation (i.e., activation or inhibition) at every step is concordant with the up- and down-regulation (i.e., red and green nodes, respectively) observed in the transcriptomic signatures. Step 3) Similarly, transcriptomic signatures observed within a specific disease context are overlaid onto the corresponding nodes in the concordant paths from the previous step (if any). RPath then evaluates whether the disease transcriptomic signatures contradict the paths that were concordant with the drug signatures. If they do, the specific drug-disease pair is prioritized.
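Step 1 can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the adjacency-list layout, the function name acyclic_paths, the toy graph and the reading of "a given length" as an exact number of edges are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical signed KG: node -> list of (neighbour, sign),
# where +1 stands for activation and -1 for inhibition.
SignedKG = Dict[str, List[Tuple[str, int]]]


def acyclic_paths(kg: SignedKG, source: str, target: str,
                  length: int) -> List[Tuple[List[str], List[int]]]:
    """Return all acyclic paths from source to target with exactly
    `length` edges, each as (nodes, edge_signs)."""
    results: List[Tuple[List[str], List[int]]] = []

    def walk(node: str, nodes: List[str], signs: List[int]) -> None:
        if len(signs) == length:
            if node == target:
                results.append((nodes, signs))
            return
        for neighbour, sign in kg.get(node, []):
            if neighbour in nodes:
                continue  # revisiting a node would create a cycle
            if neighbour == target and len(signs) + 1 != length:
                continue  # assumption: the target may only end a path
            walk(neighbour, nodes + [neighbour], signs + [sign])

    walk(source, [source], [])
    return results


# Toy graph with made-up node names.
toy_kg: SignedKG = {
    "drug": [("P1", -1), ("P2", +1)],
    "P1": [("P3", +1)],
    "P2": [("P3", -1), ("disease", +1)],
    "P3": [("disease", +1)],
}
paths = acyclic_paths(toy_kg, "drug", "disease", 3)
```

In this toy graph the two-edge route through P2 is skipped because only paths with exactly three edges are requested.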
Table 1.
Evaluation of RPath in multiple datasets across the two KGs using precision.
Each row corresponds to the results of running RPath on a specific drug-disease dataset combination. The second and fourth columns show the performance that is expected to be achieved by chance.
Fig 2.
Deconvoluting the mechanism of action of a drug through RPath.
By investigating all the paths of a given length between a drug and a disease in a KG, we can analyze the different mechanisms that are proposed by RPath. a) Visualization of the custom KG. Proteins are colored in blue, diseases in red and drugs in green. Sankey diagrams illustrating a sample of the paths between ponatinib and AML (b) and between bicalutamide and prostate cancer (c) in the custom KG. Activating relations in the Sankey diagrams are colored in red and inhibitory relations in blue.
Table 2.
Top 5 prioritized protein target-disease pairs.
These results were obtained by running RPath over both KGs with the GEO and Open Targets datasets using the same path length as the drug discovery task (see Methods). Pairs were prioritized based on the number of concordant paths. The vast majority of pairs were prioritized using the disease transcriptomic signatures from the GEO dataset given its larger coverage of measured genes compared to Open Targets (S4 Table).
Fig 3.
Pseudocode of the RPath algorithm.
Given a KG, a drug, a disease and a defined path length (i.e., lmax), the core function of the algorithm, is_drug_prioritized, returns whether the drug should be prioritized. To do so, the function calculates all acyclic paths between the drug-disease pair in the KG. For each path found, drug-perturbed (i.e., drug_tr) and disease-specific (i.e., disease_tr) transcriptomic signatures are overlaid onto their corresponding protein nodes. The function then prioritizes the drug if at least one path is concordant with the observed drug-perturbed transcriptomic signatures (evaluated via Function 1, is_concordant) and the same path is anti-correlated with the observed disease-specific transcriptomic signatures (evaluated via Function 2, is_anti_correlated). Paths that both match the drug-perturbed signatures and contradict the disease-specific signatures are then returned by RPath as promising drug candidates.
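The logic described in this caption can be sketched in Python as follows. This is a hedged illustration, not the published pseudocode: the function names follow the caption, but the signatures are simplified (paths are passed in precomputed instead of being derived from the KG and lmax). The sign-propagation rule, the helper predicted_states and the requirement that every protein on a path be measured are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# A path is (nodes, edge_signs); +1 = activation, -1 = inhibition.
Path = Tuple[Sequence[str], Sequence[int]]
# A signature maps a protein to +1 (up-regulated) or -1 (down-regulated).
Signature = Dict[str, int]


def predicted_states(path: Path) -> Dict[str, int]:
    """Hypothetical helper: propagate edge signs from the drug to get
    the expected direction of regulation of each intermediate node."""
    nodes, signs = path
    state, states = 1, {}
    for node, sign in zip(nodes[1:-1], signs):
        state *= sign
        states[node] = state
    return states


def is_concordant(path: Path, drug_tr: Signature) -> bool:
    """True if every intermediate node matches the drug signature.
    Assumption: unmeasured nodes make the path non-concordant."""
    states = predicted_states(path)
    return bool(states) and all(drug_tr.get(n) == s for n, s in states.items())


def is_anti_correlated(path: Path, disease_tr: Signature) -> bool:
    """True if every intermediate node is regulated in the opposite
    direction in the disease signature."""
    states = predicted_states(path)
    return bool(states) and all(disease_tr.get(n) == -s for n, s in states.items())


def is_drug_prioritized(paths: List[Path], drug_tr: Signature,
                        disease_tr: Signature) -> bool:
    """Prioritize if at least one path passes both checks."""
    return any(is_concordant(p, drug_tr) and is_anti_correlated(p, disease_tr)
               for p in paths)


# Toy inputs with made-up node names and values.
example_paths: List[Path] = [
    (["drug", "P1", "P3", "disease"], [-1, 1, 1]),
    (["drug", "P2", "P3", "disease"], [1, -1, 1]),
]
drug_signature = {"P1": -1, "P3": -1}
disease_signature = {"P1": 1, "P3": 1}
prioritized = is_drug_prioritized(example_paths, drug_signature, disease_signature)
```

In this toy example only the first path passes both checks. The second fails because P2 is absent from the drug signature, which this sketch treats as non-concordant.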
Fig 4.
Distribution of node and edge types in the custom and OpenBioLink KGs.
The properties of each of the two networks are detailed in S7 Table.