Figure 1.
Schematic diagram of our proposed method.
(A) Drug-based similarity inference (DBSI), (B) target-based similarity inference (TBSI), (C) unweighted network-based inference (NBI), (D) edge-weighted NBI (EWNBI) and (E) node-weighted NBI (NWNBI). Green circle: chemical node; gold square: protein node; black line: unweighted interaction link; cyan line: chemical-chemical two-dimensional structural similarity (Sc) or protein-protein Smith-Waterman genomic similarity (Sg); red line: weighted edge (a thick red line denotes a strong edge with high potency and a thin red line denotes a weak edge with low potency).
Table 1.
Statistics of all known chemical-protein interaction pairs in the training and validation sets used in this study.
Figure 2.
Box plots of compound-compound and protein-protein similarities against compound or protein structure-activity relationship (SAR) similarities.
(A) Protein-protein (GPCRs) sequence similarities (Smith-Waterman scores) against GPCR SAR similarities, (B) protein-protein (kinases) sequence similarities (Smith-Waterman scores) against kinase SAR similarities, (C) compound-compound (GPCR ligands) structural similarities (Tanimoto scores) against GPCR ligand SAR similarities and (D) compound-compound (kinase ligands) structural similarities (Tanimoto scores) against kinase ligand SAR similarities.
Table 2.
Performance of the different methods on the GPCR and kinase test sets, over 10 independent simulations of 10-fold cross-validation.
Figure 3.
Recall as a function of the parameter β for the node-weighted network-based inference (NWNBI) method on the test set, assessed on the top five predicted candidates.
The recall reaches its maximum value at about 0.4 and 0.3 for GPCRs (A) and kinases (B), respectively. Error bars denote the standard deviation over 10 independent simulations.
Figure 4.
Analysis of the role of weak chemical-protein interactions using the exponent λ.
When λ = 0, the method reduces to the unweighted NBI method; when λ = 1, it is the EWNBI method. When λ > 1, it strengthens the weights of strong CPI edges, whereas 0 < λ < 1 strengthens the weights of weak CPI edges. A negative λ has a negative effect. The area under the receiver operating characteristic curve (AUC) was obtained on the test set over 10 simulations; error bars denote the standard deviation. (A) GPCRs and (B) kinases.
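The regimes of λ described in this caption can be illustrated with a minimal Python sketch. It assumes that λ enters as an exponent on each positive edge weight (w → w^λ), which the caption implies but does not state explicitly; the function name and the sample weights are hypothetical and are not taken from the study.

```python
def rescale_weights(weights, lam):
    """Raise each positive edge weight to the power lam (assumed form w**lam)."""
    return [w ** lam for w in weights]


def strong_to_weak_ratio(weights, lam):
    """Ratio of the largest to the smallest rescaled weight."""
    scaled = rescale_weights(weights, lam)
    return max(scaled) / min(scaled)


if __name__ == "__main__":
    # Hypothetical edge weights: one weak, one neutral, one strong interaction.
    weights = [0.25, 1.0, 4.0]
    for lam in (0.0, 0.5, 1.0, 2.0, -1.0):
        # lam = 0 makes every weight 1 (unweighted network);
        # lam = 1 leaves the weights unchanged;
        # lam > 1 widens the gap between strong and weak edges;
        # 0 < lam < 1 narrows it, so weak edges count relatively more;
        # lam < 0 inverts the ordering of strong and weak edges.
        print(lam, rescale_weights(weights, lam))
```

Under this assumption, the ratio between the strongest and weakest edge is 1 at λ = 0, grows as λ increases above 1, and shrinks toward 1 as λ decreases from 1 to 0.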
Table 3.
The performance of different inference methods on the external validation sets of GPCRs and kinases.
Figure 5.
Discovered chemical-protein interaction (CPI) bipartite network among 267 FDA-approved or experimental drugs and 130 kinases.
Circle and square nodes correspond to drugs and kinases, respectively. A gray line represents a known CPI annotated in DrugBank and KEGG. A red line represents a predicted CPI. A red arrow line represents a newly predicted CPI that is validated by the literature. The size of a drug node reflects the fraction of targets to which the drug is linked. The size of a target node reflects the fraction of drugs to which the target is linked. Color codes are given in the legend. Drug nodes (circles) are colored according to their Anatomical Therapeutic Chemical Classification. This graph and Figure 6 were prepared with Cytoscape (http://www.cytoscape.org/).
Figure 6.
Discovered chemical-protein interaction (CPI) bipartite network among 139 FDA-approved or experimental drugs and 55 GPCRs (Table S4).
Circle and square nodes correspond to drugs and GPCRs, respectively. Nodes and edges are defined as in the caption of Figure 5.