Drug-Target Interaction prediction using Multi Graph Regularized Nuclear Norm Minimization

The identification of interactions between drugs and target proteins is crucial in pharmaceutical sciences. The experimental validation of interactions in genomic drug discovery is laborious and expensive; hence, there is a need for efficient and accurate in-silico techniques that can predict potential drug-target interactions and narrow down the search space for experimental verification. In this work, we propose a new framework, Multi Graph Regularized Nuclear Norm Minimization, which predicts the interactions between drugs and proteins from three inputs: the known drug-target interaction network, similarities over drugs, and similarities over targets. The proposed method finds a low-rank interaction matrix that is structured by the proximities of drugs and targets encoded by graphs. Previous works on Drug-Target Interaction (DTI) prediction have shown that incorporating drug and target similarities helps in learning the data manifold better by preserving the local geometries of the original data. However, there is no clear consensus on which kinds and combinations of similarities best assist the prediction task. Hence, we propose to use multiple drug-drug and target-target similarities as multiple graph Laplacian regularization terms (over drugs and targets) to capture the proximities exhaustively. Extensive cross-validation experiments on four benchmark datasets using standard evaluation metrics (AUPR and AUC) show that the proposed algorithm improves the predictive performance and outperforms recent state-of-the-art computational methods by a large margin.

Author summary

This work introduces a computational approach, namely Multi-Graph Regularized Nuclear Norm Minimization (MGRNNM), to predict potential interactions between drugs and targets. The novelty of MGRNNM lies in structuring drug-target interactions by multiple proximities of drugs and targets.
Previous works have applied graph regularization to matrix factorization and matrix completion algorithms in order to incorporate the standard chemical-structure drug similarity and genomic-sequence target similarity, respectively. We introduce multiple drug-graph Laplacian and target-graph Laplacian regularization terms into the standard matrix completion framework to predict the missing values in the interaction matrix. The graph Laplacian terms are constructed from various kinds and combinations of similarities over drugs and targets (computed from the interaction matrix itself). In addition, we further improve the prediction accuracy by sparsifying the drug and target similarity matrices. For performance evaluation, we conducted extensive experiments on four benchmark datasets. The experimental results demonstrate that MGRNNM clearly outperforms recent state-of-the-art methods under three different cross-validation settings, in terms of the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPR).

October 29, 2018

The field of drug discovery in pharmaceutical sciences is plagued by a high attrition rate. The task is to find effective interactions between chemical compounds (drugs) and amino-acid sequences/proteins (targets). This is traditionally done through wet-lab experiments, which are known to be costly and laborious. An effective alternative that avoids costly failures is to predict the interaction probability computationally. Many algorithms have been proposed for DTI (drug-target interaction) prediction in recent years [1,2]; they use the small number of experimentally validated interactions in existing databases such as ChEMBL [3], DrugBank [4], KEGG DRUG [5], STITCH [6] and SuperTarget [7]. Identification of drug-target pairs leads to improvements in research areas such as drug discovery, drug repositioning, polypharmacology, drug resistance and side-effect prediction [8]. For instance, drug repositioning [9,10] (reuse of existing drugs for new indications) may contribute to a drug's polypharmacology (i.e. having multiple therapeutic effects). One of the many successfully repositioned drugs is Gleevec (imatinib mesylate). It was originally thought to interact only with the Bcr-Abl fusion gene associated with leukemia, but it was later found to also interact with PDGF and KIT, eventually leading it to be repositioned to treat gastrointestinal stromal tumors as well [11,12].

There are three major classes of computational methods for predicting DTI: ligand-based approaches, docking-based approaches and chemogenomic approaches. Ligand-based approaches leverage the similarity between target proteins' ligands to predict interactions [13]. These approaches use the fact that similar molecules tend to share similar properties and usually bind similar proteins [14].
However, the lack of known ligands per protein in some cases might compromise the reliability of the results. Docking-based approaches are well accepted; they use the 3D structure information of a target protein and a drug and then run a simulation to estimate the likelihood that they will interact [15][16][17]. But docking is heavily time-consuming and cannot be applied to protein families for which the 3D structure is difficult to predict or is unavailable [18], for example the G-protein coupled receptors (GPCRs).

Chemogenomic approaches overcome the challenges of traditional methods and have thus recently gained much attention. The approaches in this category work with widely abundant biological data, publicly available in existing online databases, and process information (chemical structure graphs and genomic sequences for the drugs and targets) from both the drug and target sides simultaneously for the prediction task. These approaches can be further sub-classified based on the representation of the input data: feature-based methods and similarity-based methods. Feature-based techniques are machine learning methods that take their inputs in the form of feature vectors, representing a set of instances (i.e. drug-target pairs) along with their corresponding class labels (i.e. binary values indicating whether or not an interaction exists). Typical feature-based methods use Decision Tree (DT), Random Forest (RF) [25] and Support Vector Machines (SVM) to build classification models based on the labeled feature vectors [19]. Positive instances are the known interactions and negative instances the non-interactions. It should be noted that negative instances here include both true non-interactions and unknown drug-target interactions (false negatives).
The other category of chemogenomic techniques, similarity-based methods, uses two similarity matrices corresponding to drug and target similarity, respectively, along with an interaction matrix that indicates which pairs of drugs and targets interact.

Let us discuss the similarity between the DTI problem and the problem of collaborative filtering (CF). CF is a standard problem in information retrieval. It is used in recommendation systems (e.g. Netflix movie recommendations and Amazon product recommendations). There is a database of users and their ratings on items (movies, products, etc.). Obviously, not all the ratings are available; users typically rate only a small subset of items. The objective is to estimate the ratings of all the users on all the items. If that can be done accurately, recommendation accuracy increases. The similarity between DTI and CF should be straightforward now: the drugs play the role of users and the targets play the role of items. The interactions are analogous to the ratings. Over the years, many approaches originally developed for CF have been leveraged to solve DTI problems. In both CF [20] and DTI [21][22][23], the initial techniques were based on simple neighborhood-based models. In order to predict the interaction of an (active) drug with a target, the first step is to find similar (neighbor) drugs by computing some kind of similarity score. Once the neighborhood is obtained, the interaction values of the drugs in the neighborhood are weighted (by the normalized similarity score) to interpolate the interaction of the active drug with the target. The second approach was based on bipartite local models. In such models, a local model is built for every drug and target. For example, in [24] an SVM was trained for each to predict the interaction of each drug with all targets and of each target with all drugs. Finally, the decisions from the two were fused. This is just one example; other techniques fall under this generic approach, such as [25,26]. The third category is based on network diffusion models. One technique for DTI prediction in this category uses a random walk on the network with a predefined transition matrix [27]. Another work in this category predicts interactions by finding simple paths (without loops) between nodes of the network. The fourth approach is based on matrix factorization. These techniques were originally developed for collaborative filtering [28]. It is assumed that the drugs and targets are characterized by latent factors. The probability of interaction is high when the latent factors match, i.e. when their inner product has a high value. Therefore, it is logical to express the interaction matrix as an (inner) product of drug and target latent factors. This allows matrix factorization (and its variants) to be applied [29,30]. The fifth and final approach is based on classification. The chemical/biological information is used to generate features for drugs and targets individually. The two features are then concatenated and the corresponding interaction is assumed to be the class corresponding to this feature. Any standard classifier can be used for the final classification. In this class of techniques the emphasis is on different feature selection mechanisms [31,32].
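
The neighborhood-based interpolation described above can be illustrated with a small sketch. It is not taken from any of the cited methods: the function name `neighbor_predict`, the neighborhood size `k` and the toy matrices are assumptions made for illustration only.

```python
import numpy as np

def neighbor_predict(Y, S_d, drug, k=2):
    """Interpolate one drug's interaction profile from its k most similar drugs."""
    sims = S_d[drug].astype(float).copy()
    sims[drug] = -np.inf                    # exclude the active drug itself
    neighbors = np.argsort(sims)[::-1][:k]  # indices of the k most similar drugs
    weights = S_d[drug, neighbors]
    weights = weights / weights.sum()       # normalized similarity scores
    return weights @ Y[neighbors]           # weighted average of neighbor profiles

# Toy data (hypothetical): 4 drugs x 3 targets
Y = np.array([[1, 0, 0],
              [1, 0, 1],
              [0, 1, 0],
              [0, 0, 0]], dtype=float)
S_d = np.array([[1.0, 0.8, 0.1, 0.6],
                [0.8, 1.0, 0.2, 0.2],
                [0.1, 0.2, 1.0, 0.1],
                [0.6, 0.2, 0.1, 1.0]])

scores = neighbor_predict(Y, S_d, drug=3, k=2)  # scores for the drug with no known interactions
```

Here drug 3 has no known interactions, so its scores come entirely from drugs 0 and 1, weighted 0.75 and 0.25 after normalization.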

In a very recent review paper [2] it was empirically shown that matrix factorization based techniques yield by far the best results. The fundamental assumption behind matrix factorization is that very few (latent) factors are responsible for drug-target interactions. This is the reason one can factor the DTI matrix into a tall (drug) latent factor matrix and a fat (target) latent factor matrix. Mathematically speaking, the assumption is that the DTI matrix is of low rank. Matrix factorization has been used to model low-rank matrices for the past two decades, since the publication of Lee and Seung's seminal paper [33]. However, matrix factorization is a bi-linear non-convex problem; there are no convergence guarantees. In order to ameliorate this problem, mathematicians proposed an alternative approach based on nuclear norm minimization [34][35][36]. The nuclear norm is the closest convex surrogate of the rank minimization problem (known to be NP-hard), and there are provable mathematical guarantees on its equivalence to rank minimization. To the best of our knowledge, studies in DTI have been based on matrix factorization techniques; there has been no work on nuclear norm based matrix completion. This would be the first work to model DTI along these lines. However, the standard matrix factorization technique is unable to incorporate similarity information of the drugs and the targets. In a recent study [GRMF] it was shown that the best results are obtained when matrix factorization incorporates graph regularization. Also, graph regularization [37] has been shown to prevent overfitting while learning the data manifold and hence to improve accuracy in various frameworks, such as content-based filtering, dimensionality reduction [38,39], clustering [40,41], semi-supervised learning [42], etc. Following the same lines, we propose to regularize the standard nuclear norm

[7] and DrugBank [4]. The data gathered from these databases is formatted as an adjacency matrix, called the interaction matrix between drugs and targets, which encodes the interaction between $n$ drugs and $m$ targets as 1 if drug $d_i$ and target $t_j$ are known to interact and 0 otherwise. Along with the interaction matrix, a drug similarity matrix $S_d$ and a target similarity matrix $S_t$ are also provided. In $S_d$, each entry represents the pairwise similarity between drugs, measured using SIMCOMP [45]. In $S_t$, the similarity score between two proteins is based on their amino acid sequences and is computed using the normalized Smith-Waterman score [46]. Table 1 summarizes the statistics of all four datasets.

Let us assume that $X$ is the adjacency matrix in which each entry denotes the interaction between a drug and a target (1 if they interact, 0 otherwise). Unfortunately, we only observe this matrix partially because not all interactions are known. If $Y$ denotes the partially observed adjacency matrix, the mathematical relation between $X$ and $Y$ is expressed as:

$$Y = A(X) \tag{1}$$

Here $A$ is the sub-sampling operator. It is a binary mask that has 0's where the interaction in $X$ has not been observed or is unknown and 1's where it has been observed. Our problem is to recover $X$, given the observations $Y$ and the sub-sampling mask $A$. It is known that $X$ is of low rank. Ideally, $X$ should be recovered by (2). Unfortunately, rank minimization is an NP-hard problem with doubly exponential complexity; therefore, solving it directly is not feasible.

$$\min_X \operatorname{rank}(X) \quad \text{such that} \quad Y = A(X) \tag{2}$$

Traditionally, a low-rank matrix has been modelled as a product of a thin and a fat matrix and recovered using matrix factorization techniques [33]. But matrix factorization is a bi-linear non-convex problem; therefore, there is no guarantee of global convergence. In the past decade, mathematicians showed that the rank minimization problem can be relaxed by its convex surrogate (nuclear norm minimization) with provable guarantees [34,35]:

$$\min_X \|X\|_* \quad \text{such that} \quad Y = A(X) \tag{3}$$

This turns out to be a convex problem. Here the nuclear norm is defined as the sum of the singular values of the data matrix $X$. It is the $\ell_1$ norm of the vector of singular values of $X$ and is the tightest convex relaxation of the rank of a matrix, and therefore its ideal replacement.
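
The roles of the mask $A$, the observations $Y$ and the nuclear norm can be made concrete with a short numerical sketch. The matrix sizes, the random seed and the 60% sampling rate are arbitrary assumptions chosen for illustration; they are not taken from the benchmark datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-2 "true" interaction score matrix X (5 drugs x 4 targets)
X = rng.random((5, 2)) @ rng.random((2, 4))

# Binary sub-sampling mask A: 1 where an entry is observed, 0 where unknown
A = (rng.random(X.shape) < 0.6).astype(float)
Y = A * X                                     # Y = A(X), element-wise masking

singular_values = np.linalg.svd(X, compute_uv=False)
rank = int(np.sum(singular_values > 1e-10))   # numerical rank of X
nuclear_norm = singular_values.sum()          # sum of singular values
```

The rank counts the non-zero singular values, whereas the nuclear norm sums them, which is why the latter behaves like an $\ell_1$ relaxation of the former.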

5: Compute SVD of B: $B_k = U S V^T$
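
The SVD step above is characteristic of singular value shrinkage schemes for nuclear norm minimization. The following is a minimal sketch of one such scheme, a proximal-gradient iteration for the unconstrained problem $\min_X \|Y - A(X)\|_F^2 + \lambda \|X\|_*$. It is an assumed, generic formulation and may differ from the algorithm used in this work; the function name `svt_complete` and all parameter values are hypothetical.

```python
import numpy as np

def svt_complete(Y, A, lam=0.1, n_iter=200):
    """Proximal-gradient sketch for min_X ||Y - A*X||_F^2 + lam * ||X||_*."""
    X = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        B = X + A * (Y - A * X)                       # gradient step on the data term
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        s = np.maximum(s - lam / 2.0, 0.0)            # soft-threshold the singular values
        X = (U * s) @ Vt                              # rebuild the low-rank estimate
    return X

rng = np.random.default_rng(1)
X_true = rng.random((8, 2)) @ rng.random((2, 6))      # hypothetical rank-2 matrix
A = (rng.random(X_true.shape) < 0.7).astype(float)    # observed-entry mask
Y = A * X_true
X_hat = svt_complete(Y, A, lam=0.05, n_iter=300)
```

Each iteration costs one SVD, which is the dominant per-iteration expense of this kind of solver.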

Low-rank matrix completion is not our contribution; it has been around for the past decade. However, it has not been used for drug-target interaction prediction. The problem with nuclear norm minimization (NNM) is that it cannot accommodate associated information such as the similarity matrices for drugs and targets ($S_d$ and $S_t$). But prior matrix factorization based approaches have shown that incorporating the similarity information is crucial for improving DTI prediction results. Therefore, it is imperative that NNM also be capable of accommodating such similarity information. To achieve this, we propose DGRMC.

Graph regularization assumes that points close to each other in the original space should also be close to each other in the learned manifold (local invariance assumption). The regularized version allows the algorithm to learn manifolds for the drug and target spaces in which the data is assumed to lie.

The doubly graph regularized version of matrix completion aims to prevent overfitting and enhance generalization. The graph regularization is incorporated into the formulation as Laplacian terms corresponding to drugs and targets:

$$\min_X \|Y - A(X)\|_F^2 + \lambda \|X\|_* + \mu_1 \operatorname{Tr}(X^T L_d X) + \mu_2 \operatorname{Tr}(X L_t X^T) \tag{5}$$

where $\lambda \ge 0$, $\mu_1 \ge 0$ and $\mu_2 \ge 0$ are parameters balancing the reconstruction error of NNM in the first two terms and the graph regularization in the last two terms, $\operatorname{Tr}(\cdot)$ is the trace of a matrix, $L_d = D_d - S_d$ and $L_t = D_t - S_t$ are the graph Laplacians [48] for $S_d$ (drug similarity matrix) and $S_t$ (target similarity matrix), respectively, and $D_d^{ii} = \sum_j S_d^{ij}$ and $D_t^{ii} = \sum_j S_t^{ij}$ are degree matrices.
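
To see why a Laplacian trace term encodes the local invariance assumption, the sketch below builds $L_d = D_d - S_d$ for a toy similarity matrix and checks numerically that $\operatorname{Tr}(X^T L_d X)$ equals $\tfrac{1}{2}\sum_{i,j} S_d^{ij} \|x_i - x_j\|^2$, where $x_i$ is the interaction profile (row) of drug $i$. The random matrices and the helper name `graph_laplacian` are assumptions made for illustration.

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized graph Laplacian L = D - S for a symmetric similarity matrix S."""
    D = np.diag(S.sum(axis=1))   # degree matrix: D_ii = sum_j S_ij
    return D - S

rng = np.random.default_rng(2)
S_d = rng.random((4, 4))
S_d = (S_d + S_d.T) / 2          # hypothetical symmetric drug similarity matrix
X = rng.random((4, 3))           # rows = drugs, columns = targets

L_d = graph_laplacian(S_d)
trace_penalty = np.trace(X.T @ L_d @ X)

# Equivalent pairwise form: 0.5 * sum_ij S_ij * ||x_i - x_j||^2
diffs = X[:, None, :] - X[None, :, :]
pairwise_penalty = 0.5 * np.sum(S_d * np.sum(diffs ** 2, axis=2))
```

The pairwise form shows that the penalty grows when two highly similar drugs are assigned very different interaction profiles, which is the behaviour the regularizer is meant to discourage.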

Problem (5) is solved using a variable splitting approach [49]. The augmented Lagrangian is expressed as (6). We introduce two new proxy variables $Z$ and $Y$ such that $Z^T = X$ and $Y = X$.
To solve for $Y$ and $Z$, we differentiate (8) and (9) with respect to $Y$ and $Z$, respectively.
Since $L_t$ is a symmetric matrix, $L_t^T = L_t$. So, equating the derivative to zero, we get:

The pseudo-code for DGRMC is given in Algorithm 2. The standard NNM is a convex problem and the graph regularization penalties are also convex, so the entire formulation (5) is convex; therefore, the algorithm is bound to converge to a global minimum.
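
Setting the derivative of a graph-regularized quadratic subproblem to zero typically yields a Sylvester equation of the form $PZ + ZQ = C$. As a hedged illustration, consider the assumed subproblem $\min_Z \mu_1 \operatorname{Tr}(Z^T L_d Z) + \mu_2 \operatorname{Tr}(Z L_t Z^T) + \eta \|Z - X\|_F^2$, whose stationarity condition is $(\mu_1 L_d + \eta I) Z + Z (\mu_2 L_t) = \eta X$. This subproblem, the penalty weight $\eta$ and the helper `solve_sylvester_kron` are assumptions for illustration and are not necessarily the exact splitting used in this work.

```python
import numpy as np

def solve_sylvester_kron(P, Q, C):
    """Solve P Z + Z Q = C via the Kronecker (vectorized) form; fine for small sizes."""
    n, m = C.shape
    K = np.kron(np.eye(m), P) + np.kron(Q.T, np.eye(n))
    z = np.linalg.solve(K, C.flatten(order="F"))   # column-major vec(C)
    return z.reshape((n, m), order="F")

def laplacian(S):
    return np.diag(S.sum(axis=1)) - S

rng = np.random.default_rng(3)
S_d = rng.random((4, 4)); S_d = (S_d + S_d.T) / 2   # hypothetical drug similarities
S_t = rng.random((3, 3)); S_t = (S_t + S_t.T) / 2   # hypothetical target similarities
X = rng.random((4, 3))                              # current estimate (drugs x targets)
mu1, mu2, eta = 0.5, 0.5, 1.0                       # assumed weights

P = mu1 * laplacian(S_d) + eta * np.eye(4)
Q = mu2 * laplacian(S_t)
Z = solve_sylvester_kron(P, Q, eta * X)
```

The Kronecker form is only practical for small matrices; for realistic sizes a dedicated solver such as `scipy.linalg.solve_sylvester`, which solves the same equation form, would be the natural choice.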

The algorithm is iterative, so we only discuss the time complexity per iteration. In each iteration, we solve two Sylvester equations and one NNM problem. NNM itself is an iterative algorithm that requires a singular value decomposition in each iteration; its order of complexity is $O(n^3)$. The complexity of solving a Sylvester equation is $O(n^b \log(n))$ [52], where $b$ is between 2 and 3.

interaction information is available.

• CVS3: entire target profiles are left out to be used as the test set. It tests the algorithm's ability to predict interactions for novel targets.
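
The three cross-validation settings differ only in what is held out: random drug-target pairs (CVS1), entire drug profiles, i.e. rows of the interaction matrix (CVS2), or entire target profiles, i.e. columns (CVS3). The sketch below generates the corresponding test masks; the function name `cv_test_masks`, the matrix shape and the seed are hypothetical and serve only to illustrate the splitting logic.

```python
import numpy as np

def cv_test_masks(shape, setting, n_folds=10, seed=0):
    """Yield boolean test masks (True = held out) for CVS1, CVS2 or CVS3."""
    n_drugs, n_targets = shape
    rng = np.random.default_rng(seed)
    if setting == "CVS1":                       # random drug-target pairs
        units = rng.permutation(n_drugs * n_targets)
    elif setting == "CVS2":                     # entire drug profiles (rows)
        units = rng.permutation(n_drugs)
    elif setting == "CVS3":                     # entire target profiles (columns)
        units = rng.permutation(n_targets)
    else:
        raise ValueError(f"unknown setting: {setting}")
    for fold in np.array_split(units, n_folds):
        mask = np.zeros(shape, dtype=bool)
        if setting == "CVS1":
            mask.flat[fold] = True
        elif setting == "CVS2":
            mask[fold, :] = True
        else:
            mask[:, fold] = True
        yield mask

masks = list(cv_test_masks((20, 15), "CVS3"))
```

Every entry of the interaction matrix is held out in exactly one fold, and under CVS3 each held-out column is removed in full, so the model never sees any interaction of a test target during training.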

In 10-fold CV, the data was divided into 10 folds; in turn, one fold was left out as the test set while the remaining 9 folds were treated as the training set. As evaluation metrics, we used:

thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.
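
The ranking interpretation of AUC can be checked directly by comparing every positive example with every negative example. This is a minimal sketch on made-up labels and scores; the helper `auc_by_ranking` is hypothetical, and counting ties as one half is a common convention assumed here.

```python
import numpy as np

def auc_by_ranking(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
auc = auc_by_ranking(labels, scores)   # 5 of the 6 positive-negative pairs are ordered correctly
```

In this toy case only the pair (0.4, 0.8) is ranked in the wrong order, giving an AUC of 5/6.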

The higher the AUC, the better the model.

Preprocessing:

Each of the drug and target similarity matrices $S_d$ and $S_t$ has been sparsified using the p-nearest neighbour graph approach introduced in [30].
where $N_p(i)$ is the set of $p$ nearest neighbors of drug $d_i$. The similarity matrix is then sparsified by element-wise multiplying it with $N_{ij}$. Also, instead of the graph Laplacians ($L_d$ and $L_t$), we have used the normalized graph Laplacians ($D_d^{-1/2} L_d D_d^{-1/2}$ and $D_t^{-1/2} L_t D_t^{-1/2}$), as normalized graph Laplacians are known to perform better in many cases [42].

For setting the parameters of our algorithm, we performed cross-validation on the training set over the parameters $p, \lambda, \mu_1, \mu_2, \nu_1, \nu_2$ to find the best parameter combination for each dataset under each cross-validation setting. For the other methods, we set the parameters to the values found to be optimal in [1].

Drug-target interaction prediction is a crucial task in genomic drug discovery. Many computational techniques have been proposed in the literature. In this work, we presented a novel chemogenomic approach for predicting drug-target interactions, DGRMC (Doubly Graph Regularized Matrix Completion). It is a graph regularized version of the traditional matrix completion algorithm. The algorithm is generic and can be used for prediction in protein-protein interaction [57], RNA-RNA interaction [58], etc.

Evaluation was performed using three different cross-validation settings, namely CVS1 (random drug-target pairs left out), CVS2 (entire drug profiles left out) and CVS3 (entire target profiles left out), to compare our method with 5 other state-of-the-art methods (three specifically designed for DTI prediction). In most cases, our algorithm shows the best performance, outperforming the baselines, while in the rest of the cases the results are competitive and favourable for its practical usage.