Large-Scale Off-Target Identification Using Fast and Accurate Dual Regularized One-Class Collaborative Filtering and Its Application to Drug Repurposing

doi:10.1371/journal.pcbi.1005135

Table 1.

The symbols and the descriptions for numerical calculations

Fig 1.

The overall process of REMAP. The rectangular boxes with capitalized symbols are matrices, and the smaller boxes and ovals are chemicals and proteins, respectively, in the simplified network representation (top-left corner).

Solid lines within the network represent connectivity (edges), and the arrows represent mathematical processes. Red squares represent single similarity values, and blue bars in U and V represent row and column vectors. Lower-case c and p represent chemicals and proteins, respectively. The letter symbols are annotated in Table 1.

Fig 2.

(A) REMAP score distributions for active (blue), inactive (orange), and ambiguous (green) pairs. For each bin of raw prediction scores (x-axis, bin width = 0.05), the number of pairs in the bin was divided by the total number of pairs of that type (the totals are given in the plot). Raw prediction scores over 1.10 were regarded as outliers and are not included in the figure. Active pairs were obtained from the ZINC and ChEMBL databases, and inactive and ambiguous pairs were obtained from the ChEMBL database. (B) Adjusted scores for each bin of raw prediction scores (x-axis, same bin width as in A): adjustment by counts only (blue) and adjustment with weighted counts (orange). A weight of 5.25 was given to the counts of inactive pairs, as explained in the prediction score adjustment section.
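
The per-bin normalization described for panel A can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name and the sample scores are hypothetical, and it assumes that outliers above 1.10 are dropped from the plot but still counted in the per-type total.

```python
import math

BIN_WIDTH = 0.05   # bin width stated in the caption
MAX_SCORE = 1.10   # raw scores above this value are treated as outliers


def normalized_histogram(scores, bin_width=BIN_WIDTH, max_score=MAX_SCORE):
    """Return {bin_index: fraction of all pairs of this type in that bin}.

    Bin i covers [i * bin_width, (i + 1) * bin_width).
    Assumption: outliers are skipped but remain part of the total.
    """
    total = len(scores)
    counts = {}
    for s in scores:
        if s > max_score:
            continue  # outlier: not plotted
        # Small epsilon guards against floating-point error at bin edges.
        idx = math.floor(s / bin_width + 1e-9)
        counts[idx] = counts.get(idx, 0) + 1
    return {idx: n / total for idx, n in sorted(counts.items())}


# Hypothetical raw prediction scores for one pair type (e.g. active pairs).
active_scores = [0.02, 0.04, 0.12, 0.97, 1.30]
active_hist = normalized_histogram(active_scores)
```

Calling the same function separately for active, inactive, and ambiguous pairs yields one normalized curve per pair type, so the curves stay comparable even when the three sets differ greatly in size.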

Fig 3.

Performance comparison for REMAP (green), PRW (blue), and NRLMF (orange).

NT2 datasets (2 known targets per chemical) were used for varying numbers of ligands (A) and chemical structural similarity (B). Performance was measured as explained in the measuring prediction accuracy of REMAP by TPR vs. cutoff rank section. (A) Performance comparison on the datasets with varying numbers of ligands per protein. For example, the x-axis label L11to15 means that the proteins of interest have between 11 and 15 known binding chemicals. (B) Performance comparison on the datasets grouped by the range of chemical structural similarity between the tested chemicals and the training chemicals. For instance, the x-axis label Tc0.6to0.7 means that, for each tested chemical, at least one training chemical has a similarity in the 0.6 to 0.7 range and no training chemical has a similarity greater than 0.7. All TPR values are based on 10-fold cross validation. Error bars represent s.e.m. Asterisks represent statistical significance based on a t-test of the 10 TPR values (* for p < 0.05, ** for p < 0.001).
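
As a rough illustration of the quantities plotted here, the sketch below computes a true positive rate (TPR) at a cutoff rank and the standard error of the mean (s.e.m.) over folds. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the function names, the dictionary layout, and the toy scores are hypothetical, and it assumes that a held-out pair counts as a hit when its protein ranks within the cutoff among all candidate proteins scored for that chemical.

```python
import math
import statistics


def tpr_at_cutoff(scores_by_chemical, held_out_pairs, cutoff):
    """Fraction of held-out (chemical, protein) pairs ranked within `cutoff`."""
    hits = 0
    for chem, prot in held_out_pairs:
        scores = scores_by_chemical[chem]
        # Rank = 1 + number of proteins scored strictly higher than the target.
        rank = 1 + sum(1 for s in scores.values() if s > scores[prot])
        if rank <= cutoff:
            hits += 1
    return hits / len(held_out_pairs)


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Toy predicted scores (hypothetical), one dict of protein scores per chemical.
toy_scores = {
    "c1": {"p1": 0.9, "p2": 0.5, "p3": 0.1},
    "c2": {"p1": 0.2, "p2": 0.8, "p3": 0.6},
}
toy_held_out = [("c1", "p1"), ("c2", "p3")]

tpr_top1 = tpr_at_cutoff(toy_scores, toy_held_out, cutoff=1)
tpr_top2 = tpr_at_cutoff(toy_scores, toy_held_out, cutoff=2)
```

In a 10-fold setting, `tpr_at_cutoff` would be evaluated once per fold and `sem` applied to the resulting 10 values to obtain the error bars.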

Fig 4.

Performance comparison for REMAP (green), PRW (blue), and NRLMF (orange).

NT3 datasets (3 or more known targets per chemical) were used for varying numbers of ligands (A) and chemical structural similarity (B). Performance was measured as explained in the measuring prediction accuracy of REMAP by TPR vs. cutoff rank section. (A) Performance comparison on the datasets with varying numbers of ligands per protein. For example, the x-axis label L21more means that the proteins of interest have 21 or more known binding chemicals. (B) Performance comparison on the datasets grouped by the range of chemical structural similarity between the tested chemicals and the training chemicals. For instance, the x-axis label Tc0.5to0.6 means that, for each tested chemical, at least one training chemical has a similarity in the 0.5 to 0.6 range and no training chemical has a similarity greater than 0.6. All TPR values are based on 10-fold cross validation. Error bars represent s.e.m. Asterisks represent statistical significance based on a t-test of the 10 TPR values (* for p < 0.05, ** for p < 0.001).
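
The similarity grouping used in panel B amounts to binning each tested chemical by its highest similarity to any training chemical. The sketch below shows one way to do this; the function name and the bin edges are hypothetical, and the handling of the lower bound (exclusive) is an assumption, since the caption only states that no training chemical exceeds the upper bound.

```python
def tc_bin_label(similarities_to_training, edges):
    """Label a tested chemical by its maximum similarity to the training set.

    `edges` is an ascending sequence of bin boundaries, e.g. (0.5, 0.6, 0.7).
    Assumption: a bin covers lo < max similarity <= hi.
    Returns None when the maximum falls outside every bin.
    """
    best = max(similarities_to_training)
    for lo, hi in zip(edges, edges[1:]):
        if lo < best <= hi:
            return f"Tc{lo}to{hi}"
    return None


# Hypothetical similarities of one tested chemical to three training chemicals.
label = tc_bin_label([0.31, 0.66, 0.52], edges=(0.5, 0.6, 0.7))
```

Because only the maximum matters, a tested chemical with many weakly similar training chemicals and a single close analogue is still placed in the bin of that analogue.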

Fig 5.

Performance of REMAP according to the amount of the chemical-chemical or the protein-protein similarity information used for its 10-fold cross validation on the ZINC dataset.

(A) True positive rate at the given cutoff rank. All available chemical and protein similarity information was included (blue), half of the chemical-chemical similarity was ignored (orange), or the entire chemical-chemical similarity was ignored (green). (B) The blue line is the same as in A. Half of the protein-protein similarity matrix was ignored (gray), or the entire protein-protein similarity was ignored (red).

Fig 6.

Performance of REMAP according to the importance parameters for the chemical-chemical (p_chem) or the protein-protein (p_prot) similarity information used for its 10-fold cross validation on the ZINC dataset.

(A) The chemical-chemical similarity importance parameter, p_chem, was varied while p_prot was fixed at 0.1. (B) The protein-protein similarity importance parameter, p_prot, was varied while p_chem was fixed at 0.1.

Fig 7.

Average running times of REMAP using a single core node with 2.88 GB of memory. All running times are in seconds.

(A) Average running times on the ZINC dataset (12,384 chemicals and 3,500 proteins) as a function of the low rank (r). The orange line is the linear fit (R² = 0.9856). (B) Average running times as a function of the number of proteins (columns), from 1,000 to 20,000. The number of chemicals (rows) was fixed at 200,000. Error bars represent s.e.m., with n ≥ 15 for (A) and n ≥ 30 for (B).
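
For readers who want to reproduce a fit of this kind on their own timings, the sketch below computes an ordinary least-squares line and its coefficient of determination (R²). It is a generic sketch, not the authors' analysis script: the function name is hypothetical and the sample ranks and times are made-up placeholders, not values from the paper.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Placeholder data (NOT from the paper): rank r vs. running time in seconds.
ranks = [100, 200, 300, 400]
times = [205.0, 405.0, 605.0, 805.0]
slope, intercept, r_squared = linear_fit(ranks, times)
```

An R² close to 1, as reported for panel A, indicates that running time grows approximately linearly with the rank over the tested range.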

Table 2.

The known uses and target information for the anti-cancer drug cluster in Fig 8B obtained from DrugBank.

The known targets are given as UniProt accession numbers. The target information from UniProt is in S1 Table.

Fig 8.

(A) The drug clusters created based on the profile similarity with the anti-cancer drug cluster in the middle (darker blue grid). (B) The clusters of FDA-approved anti-cancer drugs. A set of 25 known anti-cancer drugs (blue boxes), and another set of 7 FDA-approved drugs that are closely linked to the former set but have not yet been approved for anti-cancer treatment (darker blue boxes). Procedures explained in the drug-target interaction profile analysis for drug repurposing section.
