istar: A Web Platform for Large-Scale Protein-Ligand Docking

Protein-ligand docking is a key computational method in the design of starting points for the drug discovery process. We are motivated by the desire to automate large-scale docking using our popular docking engine idock and thus have developed a publicly-accessible web platform called istar. Without tedious software installation, users can submit jobs using our website. Our istar website supports 1) filtering ligands by desired molecular properties and previewing the number of ligands to dock, 2) monitoring job progress in real time, and 3) visualizing ligand conformations and outputting free energy and ligand efficiency predicted by idock, binding affinity predicted by RF-Score, putative hydrogen bonds, and supplier information for easy purchase, three useful features commonly lacked on other online docking platforms like DOCK Blaster or iScreen. We have collected 17,224,424 ligands from the All Clean subset of the ZINC database, and revamped our docking engine idock to version 2.0, further improving docking speed and accuracy, and integrating RF-Score as an alternative rescoring function. To compare idock 2.0 with the state-of-the-art AutoDock Vina 1.1.2, we have carried out a rescoring benchmark and a redocking benchmark on the 2,897 and 343 protein-ligand complexes of PDBbind v2012 refined set and CSAR NRC HiQ Set 24Sept2010 respectively, and an execution time benchmark on 12 diverse proteins and 3,000 ligands of different molecular weight. Results show that, under various scenarios, idock achieves comparable success rates while outperforming AutoDock Vina in terms of docking speed by at least 8.69 times and at most 37.51 times. When evaluated on the PDBbind v2012 core set, our istar platform combining with RF-Score manages to reproduce Pearson's correlation coefficient and Spearman's correlation coefficient of as high as 0.855 and 0.859 respectively between the experimental binding affinity and the predicted binding affinity of the docked conformation. istar is freely available at http://istar.cse.cuhk.edu.hk/idock.


Introduction
Protein-ligand docking predicts the preferred conformation and binding affinity of a small ligand as non-covalently bound to the specific binding site of a protein. Docking can therefore be used not only to determine whether a ligand binds, but also to understand how it binds. The latter is subsequently important to improve the potency and selectivity of binding. To date, there are hundreds of docking programs [1,2]. The AutoDock series [3][4][5] is the most cited docking software in the research community, with over 5,000 citations according to Google Scholar. AutoDock has contributed to the discovery of several drugs, including the first clinically approved HIV integrase inhibitor [6]. Following its initial release, several parallel implementations were developed using either multithreading or computer cluster [7][8][9].
In 2009, AutoDock Vina [5] was released. As the successor of AutoDock 4 [4], AutoDock Vina significantly improves the average accuracy of the binding mode predictions while running two orders of magnitude faster with multithreading [5]. It was compared to AutoDock 4 on selecting active compounds against HIV protease, and was recommended for docking large molecules [10]. Its functionality of semi-flexible protein docking by enabling flexibility of side-chain residues was evaluated on VEGFR-2 [11].
To further facilitate the usage of AutoDock Vina, auxiliary tools were subsequently developed, including a PyMOL [12] plugin for program settings and visualization [13], a bootable operating system for computer clusters [14], a console application for virtual screening on Windows [15], and a GUI for virtual screening on Windows [16].
In 2011, inspired by AutoDock Vina, we developed idock 1.0 [17], a multithreaded virtual screening tool for flexible ligand docking. idock introduces plenty of innovations, such as caching receptor and grid maps in memory to permit efficient large-scale docking, revised numerical model for much faster energy approximation, and capability of automatic detection of inactive torsions for dimensionality reduction. When benchmarked on docking 10,928 drug-like ligands against HIV reverse transcriptase, idock 1.0 achieved a speedup of 3.3 in terms of CPU time and a speedup of 7.5 in terms of elapsed time on average compared to AutoDock Vina, making idock one of the fastest docking software.
Having released idock, we kept receiving docking requests from our colleagues and collaborators. They are mostly biochemists and pharmacologists, outsourcing the docking research to us after discovering pharmaceutical protein targets for certain diseases of therapeutic interest. Consequently, we had to grab the protein structure, do format conversion, define search space, set up docking parameters, and keep running idock in batch for months. Tedious enough, all the above work was done manually, resulting in very low research productivity. In order to automate large-scale protein-ligand docking using our idock, we have therefore developed a web platform called istar.
A few online docking platforms already exist. DOCK Blaster [18] investigates the feasibility of full automation of protein-ligand docking. It utilizes DOCK [19] as the docking engine and ZINC [20,21] as the ligand database. It also utilizes PocketPickker (CLIPPERS) [22] for binding pocket identification. iScreen [23] is a compacted web server for TCM (Traditional Chinese Medicine) docking and followed by customized de novo drug design. It utilizes PLANTS [24][25][26] as the docking engine and TCM@Taiwan [27] as the ligand database. It also utilizes LEA3D [28] for de novo ligand design. FORECASTER [29] is a web interface consisting of a set of tools for the virtual screening of small molecules binding to biomacromolecules (proteins, receptors, and nucleic acids). It utilizes the flexible-target docking program FITTED [30] as docking engine. Nevertheless, the above platforms neither support fine-grained ligand selection based on molecular properties, nor be able to monitor job progress in real time. They also lack straightforward output of compound suppliers, a hurdle prevent-ing users from purchasing high-rank compounds for further wetlab verification. We aim to address these obstacles on our istar platform. Moreover, we strongly emphasize docking efficiency, which we believe is the most crucial factor for public large-scale docking platforms, so we try every endeavor to optimize our docking engine idock. Furthermore, we adopt the robust RF-Score [31] as a rescoring function for accurate prediction of binding affinity.

Methods
In the following four subsections, we introduce our fast docking engine idock, our accurate rescoring function RF-Score, our modern web platform istar, and our experimental settings.

Docking Engine idock
The input to idock includes a rigid receptor, a set of flexible ligands, and a cubic box, which is used to restrict the conformational space to a particular binding site of the receptor. The output from idock includes predicted conformations and their predicted binding affinity.
idock consists of two core components, a scoring function to predict binding affinity, and an optimization algorithm to explore the conformational space. idock inherits the same scoring function from AutoDock Vina. The idock score is made up of a conformation-dependent part and a conformation-independent part. The conformation-dependent part is a weighted sum of five terms over all the pairs of atoms i and j that can move relative to each other, excluding 1-4 interactions, i.e. atoms separated by three consecutive covalent bonds. The sum is calculated from equations (1) and (2) where t i and t j are the atom types of i and j respectively, and r ij is their interatomic distance with a cutoff at r ij = 8Å . The five terms are calculated from equations (3) to (7) where d ij is the surface distance calculated from equation (8) where R ti and R tj are the Van der Waals radii of t i and t j respectively. All the units are in Å . The first three terms account for steric interactions, the fourth term accounts for hydrophobic effect, and the fifth term accounts for hydrogen bonding. Metal ions are treated as hydrogen bond donors. The weighting coefficients are derived from linear regression on the PDBbind [32,33] v2007 refined set (N = 1,300). The optimization algorithm attempts to find the global minimum of e and other low-scoring conformations, which it then ranks.
Hydrophobic(t i ,t j ,r ij )~1 i fd ij ƒ0:5 1:5{d ij if 0:5vd ij v1:5 0 i fd ij §1:5 HBonding(t i ,t j ,r ij )~1 i fd ij ƒ{0:7 The conformation-dependent part can be seen as the sum of inter-molecular and intra-molecular contributions. Hence equation (1) can be rewritten into equation (9) where e inter is the summation over all the heavy atom pairs between receptor and ligand, and e intra is the summation over all the non 1-4 heavy atom pairs of ligand.
The conformation-independent part penalizes e inter for ligand flexibility. The predicted free energy of the kth conformation for  output, denoted as e , is calculated from equation (10) where k is the subscript for conformation, e k is the conformation-dependent score of the kth conformation calculated from equation (1), e intra,1 is the e intra of the first, i.e. lowest-scoring conformation, N ActTors is the number of active torsions and N InactTors is the number of inactive torsions of the ligand. Note that e intra,1 , rather than e intra,k , is subtracted in order to preserve the ranking.
On one hand, in order to fast evaluate e ij , idock precalculates all its possible values. Note that e ij is essentially a function of three variables, namely t i , t j , and r ij , which have known lower and upper bounds. There are 15 heavy atom types implemented in idock, the pair of t i and t j can thus have 120 ( = 15*16/2) different combinations. Since r ij is cut off at 8Å , idock uniformly samples 16,384 points in the range [0, 8] and precalculates their e ij from equation (2). Subsequently, given a combination of t i , t j and r ij , idock approximates the true value of e ij by table lookup rather than linear interpolation as used in AutoDock Vina.
On the other hand, in order to fast evaluate e inter , idock precalculates all its possible values by building grid maps. A grid map of atom type t is constructed by placing virtual probe atoms of atom type t along the X, Y, Z dimensions of the search box at a certain granularity. The e inter value of these probe atoms are precalculated from equation (2). Subsequently, given a sampled conformation, idock approximates the true values of e inter of ligand heavy atoms by table lookup rather than linear interpolation as used in AutoDock Vina. In fact, when we profiled AutoDock Vina, its linear interpolation of the 8 nearest corner probe atoms turned out to be a performance bottleneck because it involves 8 readings, 12 subtractions, 24 multiplications, and 7 additions. The grid granularity is hard-coded to be a coarse value of 0.375Å in AutoDock Vina, while in idock it is exposed as a program option for users to adjust accordingly and has a default fine value of 0.15625Å .
Likewise in AutoDock Vina, idock also uses Broyden-Fletcher-Goldfarb-Shanno (BFGS) [34] Quasi-Newton method for local optimization. In each BFGS iteration, a conformational mutation and a line search are taken, with each sampled conformation being accepted according to the Metropolis criterion. The number of iterations correlates to the complexity of the ligand regarding number of heavy atoms and number of torsions. BFGS approximates the inverse Hessian matrix, i.e. it uses not only the idock introduces a novel feature that can automatically detect and deactivate certain torsions which are activated in the input file but indeed have no impact on the overall scoring, such as hydroxyl group -OH, amine group -NH 2 or methyl group -CH 3 , because they only rotate the hydrogens and thus have no contributions to the idock score. idock is capable of re-classifying them as inactive torsions during parsing, thus reducing the dimension of variables to optimize in subsequent BFGS iterations.
idock encapsulates many more improvements. Please refer to its change log for a complete list of new features and bugfixes.
Scoring Function RF-Score RF-Score [31] is a member of a new class of scoring functions that use non-parametric machine learning approach to predict binding affinity in an entirely data-driven manner. RF-Score has been rigorously shown [31,35] to perform better than 16 classical scoring functions in ranking protein-ligand complexes according to predicted binding affinity. It has also been shown to be useful in the discovery of new molecular scaffolds in antibacterial hit identification [36].
RF-Score is the first application of Random Forests [37] to predicting protein-ligand binding affinity. In RF-Score, each feature comprises the number of occurrences of a particular protein-ligand atom type pair interacting within a certain distance range. Four common atom types for the protein (i.e. C,N,O,S) and nine common atom types for the ligand (i.e. C,N,O,F,P,S,Cl,Br,I) constitute a vector of 36 features, and the distance cutoff is chosen to be as sufficiently large as 12Å to implicitly capture solvation effects.
The original version of RF-Score [31] is trained on PDBbind v2007 refined set less the core set (N = 1,105). It grows each binary tree using the CART algorithm [38] without pruning from a bootstrap sample of the training data. It selects the best split at each node of the tree from a typically small number of randomly chosen features, and stops splitting a node with no more than 5 samples. The prediction from an individual tree is the arithmetic mean of its samples in the traversed leaf node. The performance of RF-Score does not vary significantly with the number of trees beyond a certain threshold, so we subscribe to the common practice of using 500 as a sufficiently large number of trees. The final prediction is the arithmetic mean of the individual predictions of all the trees in the forest.
We have re-trained the RF-Score on PDBbind v2012 refined set (N = 2,897) for prospective prediction purpose, and integrated it into our istar platform as an alternative option to re-score predicted conformations. We have also implemented a consensus score as the average effect of idock score and RF-Score. Mathematically speaking, equations (11) to (13) relate equilibrium constant K eq and dissociation constant K d with Gibbs free energy G, where R is gas constant (R~1:9858775|10 {3 kcal=mol) and T is absolute temperature.
G~{RT ln K eq ð11Þ K d~1 K eq ð12Þ pK d~{ log 10 K d ð13Þ Assuming T~298:15K at room temperature, plugging equations (12) and (13) into (11) yields Equation (14) transforms the predicted free energy output by idock in kcal=mol into binding affinity in pK d unit. The consensus score is thus defined in equation (15) so that it directly reflects the predicted potency in pK d unit.
Redocking success rates of idock and AuoDock Vina on the PDBbind v2012 refined set (N = 2,897), the PDBbind v2011 refined set (N = 2,455), and the CSAR NRC HiQ Set 24Sept2010 (N = 343) under various conditions regarding the RMSD (Root Mean Square Deviation) values between the crystal and docked conformations. By default, both programs output 9 predicted conformations per ligand. RMSD i (i~1,2,:::,9) refers to the RMSD value between the crystal conformation and the ith docked conformation, i.e. the one with the ith highest predicted binding affinity, while RMSD min refers to the RMSD value between the crystal conformation and the closest docked conformation, i.e. the one with the minimum RMSD value. RMSD min~m in i RMSD i (i~1,2,:::,9 Web Platform istar Figure S1 shows the overall architecture of istar. On our istar website, the first section displays summary of existing jobs and the second section allows new job submission. A job comprises compulsory fields and optional fields. Compulsory fields include a receptor in PDB format, a search space defined by a cubic box, a brief description about the job, and an email to receive completion notification. Optional fields include nine ligand filtering conditions. The nine ligand filtering conditions are molecular weight, partition coefficient xlogP, apolar desolvation, polar desolvation, number of hydrogen bond donors, number of hydrogen bond acceptors, topological polar surface area tPSA, net charge, and number of rotatable bonds. These nine molecular descriptors are directly retrieved from our data source, i.e. the ZINC database [20,21], in which the nine descriptors are already precalculated. Note that although molecular mass in Dalton unit may be a more appropriate descriptor than molecular weight in g/mol unit, we stick to the latter in order to maintain consistency with ZINC, in which the g/mol unit is used for molecular weight. We have collected 17,224,424 ligands at pH 7 in mol2 format from versions 2012-04-26 and 2013-01-10 of the All Clean subset the ZINC database [20,21] with explicit permission of its major developer and maintainer. The All Clean subset is constituted by applying strict filtering rules (http://blaster. docking.org/filtering), e.g. aldehydes and thiols have been removed. We have then converted the entire 17 million ligands in batch into PDBQT format as used by idock and the AutoDock series. The huge number of 17 million ligands should be sufficient in most cases. In case users need to screen their own ligand libraries, at present we recommend them run idock locally on their computers. We may consider allowing users to upload customized ligand libraries under certain constraints in future releases of istar.  istar supports ligand selection by desired molecular properties in a fine-grained manner and previewing the number of ligands to dock in real time ( Figure S2). Users can move the nine sliders to filter ligands in the form of closed intervals. Only the ligands satisfying all the nine filtering conditions will be docked. Because of the relationship of logical and, in order to nullify a specific filtering condition, one may expand its closed interval to cover the entire possible range. We have set up default values of the lower and upper bounds of the nine molecular properties for novices to get started easily.
istar supports monitoring job progress in real time ( Figure S3). We have composed a timer to automatically fetch and report the latest job progress every second without page refresh. Users can thus have a rough estimation in advance of how long the jobs will take and when the jobs will complete. This feature is particularly handy when the jobs are long running, which is usually the case of large-scale docking.
istar outputs verbose information in PDBQT format (Figure S4). The first REMARK line describes the ZINC ID, molecular weight (g/mol), partition coefficient xlogP, apolar desolvation (kcal/mol), polar desolvation (kcal/mol), number of hydrogen bond donors, number of hydrogen bond acceptors, topological polar surface area tPSA ( 2 ), net charge, and number of rotatable bonds of a selected ligand. The second REMARK line describes the SMILES representation. The third REMARK line describes the number of suppliers followed by their names, which conform to the nomenclature as used by ZINC. The subsequent REMARK lines describe the free energy and ligand efficiency predicted by idock, putative hydrogen bonds, binding affinity predicted by RF-Score, and consensus score in pK d or pK i unit. Columns 71 to 76 of the ATOM lines describe the predicted free energy of each atom. The individual atom contribution to the overall score facilitates the detection of protein-ligand interaction hotspots, and thus assists in de novo ligand design.
At the moment, we have deployed a machine with Intel Xeon W3520 @ 2.66 GHz and 8GB DDR3 SDRAM to run the web server, and two identical virtual machines with Intel Xeon E5620 @ 2.40 GHz and 8GB DDR3 SDRAM to run the idock daemons, resulting in a docking speed of about one second per ligand. We have mounted a 2TB hard disk into our network file system to store docking jobs and results.
Datasets. The PDBbind v2012 dataset contains a diverse collection of experimentally determined structures carefully selected from PDB (Protein Data Bank) [41,42]. For each complex, the experimental binding affinity (either dissociation constant K d , inhibition constant K i , or half maximal inhibitory concentration IC 50 ) is manually collected from its primary literature reference, thus resulting in the general set of 9,308 complexes, with 7,121 being protein-ligand complexes. Out of them, the complexes with a resolution of 2.5Å or better, with known K d or K i values, and with ligand containing merely the common heavy atoms (i.e. C, N, O, F, P, S, Cl, Br, I) are filtered to constitute the refined set of 2,897 complexes. These complexes are then clustered by protein sequence similarity using BLAST at a cutoff of 90%, and for each of the 67 resulting clusters with at least five complexes, the three complexes with the highest, median and lowest binding affinity are selected to constitute the core set of 201 complexes, whose experimental binding affinity spans 10 pK d or pK i units.
The CSAR (Community Structure Activity Resource) NRC HiQ Set 24Sept2010 contains 343 diverse protein-ligand complexes selected from existing PDB [41,42] entries which have binding affinity (K d or K i ) in Binding MOAD [43,44], augmented with entries from PDBbind [32,33]. Their binding affinity spans 12 pK d units.
The ZINC database contains over 21 million purchasable small molecules in popular MOL2 and SDF formats.
Benchmarks. In the rescoring benchmark, we evaluated the capability of RF-Score, AutoDock Vina and idock of predicting the binding affinity as close to the experimental binding affinity as possible given a crystal protein-ligand complex. We compared their rescoring performance to 18 other scoring functions on the PDBbind v2007 core set (N = 195). The test set was then extended to two larger datasets, i.e. the PDBbind v2012 [32,33] refined set (N = 2,897) and the CSAR NRC HiQ Set 24Sept2010 (N = 343) [39,40] to enable a more comprehensive comparison.
In the redocking benchmark, we evaluated the capability of AutoDock Vina and idock of docking a randomized ligand conformation back to its crystal conformation as close as possible.
We used the PDBbind v2012 [32,33] refined set (N = 2,897), the PDBbind v2011 refined set (N = 2,455), and the CSAR NRC HiQ Set 24Sept2010 (N = 343) [39,40], because they are the latest versions and contain the largest number of high-quality and diverse protein-ligand structures. We wrote a script to automatically define the search box first by finding the smallest cubic box that covers the entire ligand and then by extending the box by 10Å in all the three dimensions. Note that the 2rio entry of PDBbind contains two strontium ions, which are supported by idock but not by AutoDock Vina, we manually removed them before invoking AutoDock Vina. Both programs were also evaluated on the PDBbind v2012 core set (N = 201) in order to find potential impact factors on their performance. We used root mean square deviation RMSD to measure the closeness between two conformations. The lower the RMSD is, the closer the two conformations are. Usually the RMSD value is calculated between the crystal conformation and the docked conformation. Very often the RMSD of 2.0Å is regarded as the positive control for correct bound structure prediction. In the execution time benchmark, we collected 12 diverse proteins from the PDB (Protein Data Bank) database [41,42], and 1000 ligands with a molecular weight of 200-300 g/mol, 1000 ligands with a molecular weight of 300-400 g/mol, and 1000 ligands with a molecular weight of 400-500 g/mol from the All Clean subset of the ZINC database [20,21]. The 3,000 ligands were docked against the 12 proteins by AutoDock Vina and idock. Since AutoDock Vina can dock only one ligand in a run, three bash scripts containing 1,000 lines were executed instead, with each line being an execution of AutoDock Vina to dock one single ligand. The GNU Time utility v1.7 was used as a profiler.
The three benchmarks were carried out on desktop computers with Intel Core i5-2400 CPU @ 3.10GHz and 4GB DDR3 RAM under Mac OS X 10.7.4 Build 11E53. Arguments to AutoDock Vina and idock were left as default. By default, both programs output at most 9 predicted conformations per ligand. Table 1 compares 21 scoring functions on the PDBbind v2007 core set (N = 195). RF-Score [31], ID-Score [45], SVR-Score [46] and X-Score [47] are the only scoring functions whose training set do not overlap with the PDBbind v2007 core set. In terms of both Pearson's correlation coefficient and standard deviation, RF-Score performed the best, while AutoDock Vina and idock ranked 7th and 8th respectively, already outperforming the majority of commercial scoring functions. Figure 1 plots the pairwise correlations amongst experimental binding affinity and predicted binding affinity by RF-Score, AutoDock Vina and idock on the PDBbind v2012 [32,33] refined set (N = 2,897) as it is the latest version. Since both AutoDock Vina and idock are trained on the PDBbind v2007 refined set (N = 1,300), in order to make a fair comparison, in this benchmark we have re-trained RF-Score on the same training set. On one hand, the re-trained RF-Score managed to predict the binding affinity accurately with Pearson's correlation coefficient R p = 0.765, Spearman's correlation coefficient R s = 0.755, root mean square error RMSE = 1.26, and standard deviation SD = 1.26. On the other hand, although AutoDock Vina and idock claimed to do well in conformation prediction, they could not predict binding affinity very accurately (R p = 0.466, R s = 0.464, RMSE = 1.74, SD = 1.74 for AutoDock Vina, and R p = 0.451, R s = 0.453, RMSE = 1.75, SD = 1.75 for idock), a very common obstacle in the entire computational chemistry community. As expected, the correlation between binding affinity predicted by AutoDock Vina and idock is very close to 1 because of their identical scoring function but different numerical approximation methods [17]. As can be seen from Figure 2, the above observations also apply to the results on the CSAR NRC HiQ Set 24Sept2010 (N = 343) [39,40]. Redocking Benchmark Figures S5, S6, S7 and S8 visualize the redocking results of four protein-ligand complexes selected from the PDBbind v2012 refined set. Table 2 shows the success rates of idock and AutoDock Vina under various conditions regarding the RMSD values between the crystal and docked conformations. Given a redocking case, RMSD i (i~1,2,:::,9) refers to the RMSD value between the crystal conformation and the ith docked conformation, i.e. the one with the ith highest predicted binding affinity, while RMSD min refers to the RMSD value between the crystal conformation and the closest docked conformation, i.e. the one with the minimum RMSD value. RMSD min~m in i RMSD i (i~1,2,:::,9). The condition RMSD 1 = RMSD min therefore tests for how many percent the docked conformation with the highest predicted binding affinity actually turns out to be the closest one among the 9 predicted conformations. It can be seen that the success rates of idock are comparable to, albeit slightly lower than, AutoDock Vina, and the success rates on the CSAR NRC HiQ Set 24Sept2010 are consistently higher than the PDBbind v2012 and v2011 refined sets, probably because the scoring function performs well on carefully refined structures. Using a RMSD value of 2.0Å , a publicly accepted positive control for correct bound structure prediction, both programs managed to predict a conformation sufficiently close to that of the co-crystallized ligand as the first conformation in over half of the cases, without any manual tweaking of the protein model. Figure 3 plots the impact of rotatable bonds of the ligand on the success rates. Both programs tend to do well when the ligand contains fewer than 10 rotatable bonds. Figure 4 plots the impact of metal ions in the binding site on the success rates. Both programs tend to do well when the binding site contains no metal ions.

Rescoring Benchmark
From the perspective of prospective docking, Figure 5 shows the scatter plot of the highest predicted binding affinity of the 9 docked conformations output by idock against the experimental binding affinity. The weak correlation and large deviation (R p = 0.502, R s = 0.530, RMSE = 1.31, SD = 1.32) reflect the limitation of using idock alone as scoring function. After adopting the maximum RF-Score of the 9 docked conformations as predicted binding affinity, the correlation improves ( Figure 6, R p = 0.815, R s = 0.817, RMSE = 0.75, SD = 0.76). Moreover, since for over 50% probability the docked conformation with the highest predicted binding affinity indeed turns out to be the closest to the crystal conformation (i.e. RMSD 1 = RMSD min ), using RF-Score to re-score the conformation with RMSD 1 leads to even better prediction (Figure 7, R p = 0.855, R s = 0.859, RMSE = 0.73, SD = 0.73).

Discussion
Docking is the computational method that investigates how a ligand binds to a protein, and predicts their binding affinity. Hence docking is useful in elaborating inter-molecular interactions and enhancing the potency and selectivity of binding in subsequent phases of the drug discovery process.
In this study, we report a web platform called istar to automate large-scale protein-ligand docking using our popular docking engine idock. Since the initial release of idock, we have been further improving its docking speed and robustness. Compared to AutoDock Vina, our idock features a new numerical model in approximation of the scoring function, replacing slow linear interpolation by fast table lookup. It encapsulates a unique feature that can safely deactivate certain torsions to reduce the dimension of variables. It also implements an efficient thread pool to parallelize multiple components of the program and maintain a high CPU utilization. Results show that idock managed to predict a conformation sufficiently close to that of the co-crystallized ligand as the first conformation in over half of the test cases across a number of diverse datasets, and it outperformed AutoDock Vina by an order of magnitude in terms of docking efficiency at no significant cost of accuracy. It is worthwhile to highlight that in order to use istar, the input protein model requires no manual preprocessing in most cases.
We examine two possible reasons that might cause idock to fail in some test cases. They are the number of rotatable bonds of the ligand ( Figure 3) and the number of metal ions in the binding site ( Figure 4). On one hand, a large number of rotatable bonds implies a high dimension of variables to optimize. idock has a higher chance to succeed when the ligand consists of fewer than 10 rotatable bonds. On the other hand, all kinds of metal ions are simply treated as hydrogen bond donors in the idock score, which might not thoroughly accounts for their solvation effects and other possible interactions. idock has a higher chance to succeed when the binding site consists of no metal ions.
Although idock performs well in conformation prediction, it displays its weakness in binding affinity prediction. In contrast, RF-Score, a new scoring function that circumvents the need for problematic modelling assumptions via non-parametric machine learning, has been recently shown to obtain the best scoring performance among 16 classical scoring functions on PDBbind v2007 core set (N = 195) [31]. We have therefore integrated a revised version of RF-Score as an alternative re-scoring function. We have re-trained RF-Score on the entire PDBbind v2012 refined set (N = 2,897) for prospective prediction purpose. Results show that using RF-Score to re-score the predicted conformations leads to a much better prediction with R p = 0.855, R s = 0.859, RMSE = 0.73, and SD = 0.73. We have successfully demonstrated that RF-Score is a particularly effective re-scoring function for docking purposes.
To compile a more complete list of scoring functions benchmarked on the PDBbind v2007 core set (N = 195) into Table 1, we have extracted the performance results for 19 scoring functions from [31,45,46,48], and reported the results for AutoDock Vina and idock on the same test set in this study. This procedure has a number of advantages. Evaluating all the scoring functions on the same test set under the same conditions guarantees a fair and objective comparison. Using a common existing benchmark can also ensure the optimal application of such functions by their authors and avoid the danger of constructing an in-house benchmark on which unrealistically high performance might be produced. Moreover, future scoring functions can be unambiguously incorporated into this comparative assessment. Notably, the top four scoring functions, namely RF-Score [31], ID-Score [45], SVR-Score [46] and X-Score [47], are the only scoring functions whose training set do not overlap with the PDBbind v2007 core set. The prediction power of RF-Score is already superior to many scoring functions in commercial docking software. In terms of implementation complexity, a descriptor in RF-Score is just the occurrence count of a particular protein-ligand atom type pair interacting within a certain distance range, while a descriptor in ID-Score can be as mathematically demanding as, for instance, calculating the cosine value of the bond angle between a hydrogen bond donor and a hydrogen bond acceptor. This again demonstrates the adaptiveness of RF-Score to various applications.
One may argue that although the scoring functions are evaluated on the same test set, their training sets are not identical. Besides, the PDBbind v2007 core set consists of merely 195 complexes, which might not cover sufficient protein-ligand diversity from the perspective nowadays. To address this issue, we re-trained RF-Score on the PDBbind v2007 refined set (N = 1,300), on which AutoDock Vina and idock were also trained, and we expanded the test set to the much larger PDBbind v2012 refined set (N = 2,897). The results of Figure 1 show that all the performance gain (R p = 0.765, R s = 0.755, RMSE = 1.26, SD = 1.26 for RF-Score versus R p = 0.451, R s = 0.453, RMSE = 1.75, SD = 1.75 for idock) is guaranteed to come from the scoring function characteristics, ruling out any influence of using different training sets on performance.
To design the istar platform in a user-friendly way, we have utilized state-of-the-art web and database technologies. On istar, there are over 17 million ready-to-dock ligands collected from ZINC [20,21]. These ligands come with supplier information for easy purchase, and they can be filtered by nine molecular properties in a fine-grained manner. The number of ligands to dock can also be previewed in real time. The jobs are transparently split into slices for parallel docking across multiple workstations, and the job progress can be monitored in real time in a browser so that users can have a rough estimation of how long the job will take and when the job will complete. Additionally, our web server supports REST API so that developers can easily submit multiple jobs in batch. Automation is the major reason of submitting jobs to istar instead of running idock locally on one's computer. With istar at hand, users need not to write special scripts to fetch ligands from some sources, to implement parallelism, or to invoke RF-Score externally by themselves. Users can therefore concentrate on the docking results and subsequent analysis rather than the docking process itself.
We compare our istar to DOCK Blaster [18], an expert system created to investigate the feasibility of full automation of largescale protein-ligand docking. It uses DOCK [19] as the docking engine and ZINC [20,21] as the ligand repository. Although DOCK is open source, DOCK Blaster itself is not open source. istar is indeed much easier to use than DOCK Blaster. Given the structure of a target protein, both istar and DOCK Blaster can dock and score a large set of ligands against the target protein and provide a ranked list which users may review and prioritize for purchase and wet-lab testing. From the perspective of binding site indication, istar automatically detects a site from the cocrystallized ligand, while DOCK Blaster makes use of Pock-etPickker (CLIPPERS) [22]. From the perspective of ligand selection, istar features ligand filtering by nine desired molecular properties in a fine-grained fashion, while DOCK Blaster predefines several subsets either by property, by vendor, or by user. From the perspective of documentation and user manual, the istar website presents a graphical tutorial on how to submit a new job, while DOCK Blaster deploys a wiki with very rich contents covering all the procedures of DOCK Blaster. As extra features, DOCK Blaster allows the input of known active and inactive binders as heuristic information for docking. In summary, although istar and DOCK Blaster share the identical motivation of automating large-scale protein-ligand docking, their internal implementations and methodologies differ greatly. Users are encouraged to utilize both istar and DOCK Blaster to reach a consensus of promising candidate ligands for purchase.
Due to limited budget, we cannot offer as much hardware resource as DOCK Blaster (i.e. 700 CPU cores plus 20TB RAID-6 storage). However, we emphasize full reproducibility and we have released istar under a permissive open source license so that anyone who possesses sufficient hardware resource is welcome to deploy a copy of istar to his/her own infrastructure with no charge.

Availability
We emphasize full reproducibility. Both idock and istar are free and open source under Apache License 2.0. For idock, its C++ source code, precompiled executables for 32-bit and 64-bit Linux, Windows, Mac OS X, FreeBSD and Solaris, 13 docking examples, and a doxygen file for generating API documentations are available at https://github.com/HongjianLi/idock. For istar, its C++ and JavaScript source code and REST API documentation are available at https://github.com/HongjianLi/istar. Our istar website is running at http://istar.cse.cuhk.edu.hk/idock. It has been tested successfully in Chrome 30, Firefox 25, Safari 6.1 and Opera 17. Support for IE 11 is experimental. Figure S1 The overall architecture of istar. (PNG) Figure S2 istar supports filtering ligands with molecular properties in a fine-grained manner and previewing the number of ligands to dock in real time.