Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Towards a critical evaluation of an empirical and volume-based solvation function for ligand docking

Towards a critical evaluation of an empirical and volume-based solvation function for ligand docking

  • Heloisa S. Muniz, 
  • Alessandro S. Nascimento


Molecular docking is an important tool for the discovery of new biologically active molecules given that the receptor structure is known. An excellent environment for the development of new methods and improvement of the current methods is being provided by the rapid growth in the number of proteins with known structure. The evaluation of the solvation energies outstands among the challenges for the modeling of the receptor-ligand interactions, especially in the context of molecular docking where a fast, though accurate, evaluation is ought to be achieved. Here we evaluated a variation of the desolvation energy model proposed by Stouten (Stouten P.F.W. et al, Molecular Simulation, 1993, 10: 97–120), or SV model. The SV model showed a linear correlation with experimentally determined solvation energies, as available in the database FreeSolv. However, when used in retrospective docking simulations using the benchmarks DUD, charged-matched DUD and DUD-Enhanced, the SV model resulted in poorer enrichments when compared to a pure force field model with no correction for solvation effects. The data provided here is consistent with other empirical solvation models employed in the context of molecular docking and indicates that a good model to account for solvent effects is still a goal to achieve. On the other hand, despite the inability to improve the enrichment of retrospective simulations, the SV solvation model showed an interesting ability to reduce the number of molecules with net charge -2 and -3 e among the top-scored molecules in a prospective test.


The increased number of solved protein structures provide a unique opportunity for the application of the so-called ‘structure-based’ methods for the development of new chemical entities able to regulate biological systems [1]. Additionally, since many structures have ligands bound to macromolecules, about 73% according to a naïve search in the PDB website [2], there is a favorable situation for the development of novel approaches and optimization of the current models for protein-ligand interaction [3,4].

The thermodynamics of ligand binding is based on the equilibrium between a receptor R and a ligand L, forming a complex RL: R + LRL. The most important thermodynamic quantity associated with the measurement of this interaction is the change in the free energy of the system [5]. This quantity can be evaluated using computationally expensive methods such as free energy perturbation (FEP) or thermodynamic integration (TI) [6]. On the other hand, inexpensive methods are available nowadays to rapidly score molecular interactions. In this scenario, the potential energies associated to binding of a ligand can be readily computed and used as an indicator of the strength of binding.

One of the most employed models for scoring protein-ligand interactions in terms of molecular modeling involves the use of an interaction potential energy as computed by force fields such as CHARMM, AMBER, OPLS or GROMOS [7], among others. For a rigid receptor, the interaction energy is typically reduced to a Lennard-Jones potential, modeling the van der Waals interaction among ligand and receptor atoms, and a Coulomb potential, modeling the polar atomic interactions. As previously observed, the quantitative description of the intermolecular interaction energy by these two terms is rather poor, since there is no correction for the solvent effects, the conformational entropy is not taken into account and the approximation of a rigid receptor ignoring any induced fit effect may be too simple [8]. Although the conformation entropy is actually a difficult quantity to estimate in molecular modeling [9], the solvent effect can be taken in account using higher levels of theory contributing to a better description of the interaction energy [8].

Many of the current models used for modeling the solvent effect (solvation models) are based on a penalty for polar interactions based on the solvent occluded volume after binding [10,11]. Stouten [12] and Luty [13] described an effective empirical solvation model where a distance-dependent Gaussian weight is applied to the solvent occluded volume resulting in an atom ‘occupancy’ parameter. This atom occupancy is then multiplied by an atomic solvation parameter, resulting in the solvation energy for that atom. Those authors also determined the solvation parameters for the atom types typically observed in macromolecules [12]. This method has some advantages for scoring molecular interactions, especially in the context of ligand docking, including the rapid evaluation of the solvent effect in a volume-based approach and the possibility of a pre-computation of the solvation terms in grids speeding-up docking computation [13,14].

A continuum representation of the solvation effect as a dielectric medium can be best described by the Poisson-Boltzmann (PB) model, where meaningful solvation energies can be computed using a high level of theory. However, PB involves a numerical solution of differential equations, requiring an increased computing power [15]. A feasible alternative to PB is the Generalized Born (GB) model, where the Born model for solvation of an ion is extended to molecules of any shape. According to Still approximation [16], the polar contribution to the solvation free energy can be given in GB by: (1) where qi and qj are partial charges, ε is the solvent dielectric constant and fGB is a function of rij and the Born radius [16]. Compared to PB, GB shows an increased efficiency in terms of computation speed and still preserves a linear relation with the former [17].

Despite their efficiency, the rigorous solvation models, such as PB or GB, are still too expensive to be used in the current ligand docking models, where a fast, though accurate, energy evaluation is ought to be achieved for screening a large number of compounds in purchasable screening libraries [18]. Regarding the solvation effect, a compromise has to be established between a rigorous evaluation using a higher theory level and a fast assessment using an empirical model.

Here we evaluated a variation of the Stouten solvation model, as proposed by Verkhivker and coworkers [19] and compared the correlation of solvation energies in this empirical model to experimental solvation energies. This proposed model was found to be in close agreement with experimentally determined solvation energies. Furthermore, we evaluated the model in the context of ligand docking with the software LiBELa [20] and, surprisingly, we found that the introduction of the correction term for solvent effect does not significantly improve the enrichment of binders against decoy molecules. However, when used in prospective screening of small molecule binders, the model was found to correct the overvaluation of the electrostatic term in the binding energy that typically favors highly charged molecules.


The Stouten-Verkhivker model, or SV model, is based on two very simple assumptions. First, the electrostatic contribution to the solvation energy can be described by a linear function of the square of atomic charges of the molecule, in a classical approach. Thus, an atomic solvation parameter S can be defined as (2) where α and β are adjustable parameters and qi is the charge of atom i. Second, the degree of desolvation of a ligand atom by receptor atoms depends on the fragmental volume of the receptor atom and the distance between them, where a Gaussian weight is used for distances: (3) where a is the atomic radius of the atom j, rij is the distance between atom i and atom j and σ is a constant. So, similarly to the proposal of Verkhivker [19] and coworkers, the pairwise desolvation energy upon ligand-receptor interaction is given by: (4) where the first term in the summation accounts for the desolvation of a ligand atom i upon binding a receptor atom j and the second term accounts for the desolvation of receptor atom j. By summing the contribution of each receptor atom to the desolvation of each ligand atom, the ligand molecular solvation energy is computed. The same holds true for the receptor desolvation.

The proposed model has at least three interesting properties in the context of ligand docking calculations: i) first, it is simple and depends on just a few parameters, already used in typical docking calculation, i.e., atomic charge and atomic radius, and does not require the parametrization of new force field parameters; ii) it is suitable for calculations in grids [13], which is typical in ligand docking calculations [14], speeding up the actual computation of optimal ligand poses; iii) although simple, the model still preserves a dependence with the square of the atomic charge, similar to what is observed in the Born model for hydration of an ion. So, given these properties of the SV model, we then decide to evaluate the model in the context of ligand docking and check if enrichment is improved in typical benchmarks.

Solvation energies

In the context of ligand docking, any proposed solvation model should have a balance between accuracy, i.e., the ability to compute desolvation energies correlated with experimental solvation energies, and computational efficiency. In an attempt to test the ability of the proposed SV model to reproduce solvation energies, we used the dataset FreeSolv [21], which is a curated database of solvation free energies for small molecules. Since the SV model is pairwise model accounting for the desolvation energy due to the interaction between a ligand and a receptor, we computed the solvation term (S in Eqs 2 and 4), assuming that it should be proportional to the solvation energy of the ligand. For this comparison, the atomic charges of the molecules available in the FreeSolv database were recomputed using three different charge models, namely AM1, as available in the program ANTECHAMBER [22], MMF94 and Gasteiger-Marsili model, as available in the program OpenBabel [23].

The solvation energies as computed using a linear dependency with the square of the atomic charge showed a good correlation with the experimentally determined solvation free energies, as shown in Fig 1 and Table 1. From the three atomic charge models evaluated here, the Gaisteger charge model resulted in the strongest correlation with r = 0.71 when α was set to 0.3 kcal.mol-1.e-2). For MMFF94 and correlation coefficient r = 0.55 was observed (α = 0.3 kcal.mol-1.e-2) and, for the AM1 charge model, originally used in the FreeSolv database, an r = 0.65 was observed under the same conditions.

Fig 1. Correlation between experimental and computed solvation energies.

Experimentally determined solvation energies for 504 organic compounds and available in the FreeSolv database (vertical axis) and solvation energies computed using the term S in our implementation of the SV model are compared. (A) comparison using α = 0.1 kcal.mol-1.e-2 and (B) α = 0.3 kcal.mol-1.e-2.

Table 1. Correlation coefficients computed between empirical SV solvation energies and experimental solvation free energies available in the FreeSolv database.

N = 504.

Decreasing the coefficient α results in slightly decrease in the correlation with experimental solvation energies, as shown in Table 1, and setting it above 0.3 kcal.mol-1.e-2 does not result in improved correlation, suggesting that its optimal value is found around 0.3 kcal.mol-1.e-2.It is remarkable that a simple and fast model can result in a linear response for solvation energies when compared to experimental energies. The correlation between the computed energies is good enough for a solvation model in the context of molecular docking.

Scoring efficiency

The encouraging correlation between desolvation energies computed with SV model and experimental data prompted us to test this model in retrospective docking simulations. Here, we used our software, LiBELa [20], to dock ligands and decoys against 37 targets of the DUD dataset, 35 targets of the CM-DUD dataset and 12 targets of the DUDE dataset. The enrichments for each target of these benchmarks are summarized in Table 2, Table 3 and Table 4, respectively.

Table 2. Enrichment for DUD database showed as percentage of area under curve to linear and semi-logarithmic scales (AUC and logAUC).

Table 3. Enrichment for CM-DUD database showed as percentage of area under curve to linear and semi-logarithmic scales (AUC and logAUC).

Values of α are given in units of kcal.mol-1.e-2.

Table 4. Enrichment for DUDE database showed as percentage of area under curve to linear and semi-logarithmic scales (AUC and logAUC).

First of all, the well-established and validated tool DOCK6 was used as a positive control in the enrichment calculations. The DUD data set was submitted for docking calculations using automatically prepared receptor files and a pure force field scoring function (grid score). The entire calculation took about 336 minutes per target, averaging over the entire dataset, and the enrichment was then calculated using bash and python scripts. As shown in Table 2, an average logAUC of 17.4 was achieved with an average AUC of 57.3% (median 18.5 and 61.3%, respectively), which is in agreement with the enrichments reported by the DOCK6 developers [24]. For four targets, ADA, COMT, GPB and TK, DOCK6 achieves notable enrichments, especially considering the first decade of the semi-log graph, reaching logAUC values of 30.5, 57.7, 54.2 and 44.1 respectively (Fig 2). Interestingly, TK was previously shown to be a difficult target to obtain a good enrichment against its own decoys [25] when tested using DOCK 3.5.54, indicating that the interaction model used by DOCK6 is indeed accurate.

Fig 2. ROC curves for representative DUD targets.

The brown line shows the enrichment expected to for a random distribution of ligands and decoys. The enrichments obtained with FF (black line), SV (α = 0.1 kcal.mol-1.e-2, red), SV (α = 0.3 kcal.mol-1.e-2, blue) and DOCK6 (yellow) are shown. The complete set of ROC curves for DUD targets is available in S1 File.

The same ligands and decoys of the DUD dataset were also used for docking simulations in LiBELa, using a pure AMBER force field scoring function, similar to the grid score implemented in DOCK6 [20]. The docking calculations took about 175 minutes per target set on average and resulted in an average logAUC of 18.2 with an average AUC of 69.8% (median 19.1 and 73.3%, respectively). The results obtained with the hybrid approach as implemented in LiBELa were shown to be efficient using a pure force field scoring function and are in agreement with the results previously reported for this interaction model [20].

Considering that the solvent has important effects in ligand binding, one would expect that an appropriate treatment of these effects would lead to even better results in terms of enrichment of actual ligands against decoys. We then used the SV solvation model in docking simulations with the DUD dataset. For these calculations, the parameter α was varied between 0.05 and 0.4 kcal.mol-1.e-2 and the parameter β was kept fixed at -0.005 kcal.mol-1. Surprisingly, a worse average enrichment was achieved using the SV model when compared to a pure force field scoring function, as shown in Table 2, Fig 2 and S1 File. Averaging over 37 DUD targets, the FF+SV model achieved a logAUC of 15.9 with an AUC of 66.9% (median 11.5 and 67.4%, respectively), when the parameter α was set to 0.1 kcal.mol-1.e-2, i.e., the average enrichment was 13% inferior to the results achieved using a pure FF model, while the median decreased by 40%. An increase in the value of α resulted in a decrease in the average enrichment observed for this dataset, with an average logAUC of 14.1 (median in 13.8) and average AUC of 63.5 (median 62.2) for α = 0.3 kcal.mol-1.e-2 (Table 2). Different configurations of the solvation model with values of α ranging from 0.05 to 0.4 kcal.mol-1.e-2 were also tested resulting in decreased enrichments in all of these scenarios (S4 File).

As compared to the enrichment obtained in the pure FF model, considerable differences were observed for COX2, AMPC, TK, TRYPSIN and CDK2, all targets with highly polar pockets. In these cases, the pure FF model resulted in better enrichments than the FF+SV interaction model. Curiously, COX1 was one example where the introduction of solvation effect in the energy model improved the enrichment. Similar to COX2, COX1 has a partially buried polar (charged) binding site that recognizes small and charged molecules. The difference in the results observed for COX1 and COX2 may suggest that minor issues related to the ligands/decoys choice, for example, could play a role here.

The original DUD dataset was further shown to be imprecise in the distribution of charged molecules among ligands and decoys [11], with an increased fraction of charged ligands as compared to decoys. This inaccuracy could make ligands artificially more attractive to polar binding sites. The developers then released a charged-matched DUD (CM-DUD), where this inaccuracy was fixed. We also used the CM-DUD for docking simulations to assess whether the polar solvation model was artificially penalizing the actual ligands more than the decoys due their increased tendency to be charged. The enrichments observed for individual targets are shown in Table 3, Fig 3 and S2 File.

Fig 3. ROC curves for representative CM-DUD targets.

The brown line shows the enrichment expected to for a random distribution of ligands and decoys. The enrichments obtained with FF (black line), SV (α = 0.1 kcal.mol-1.e-2, red) and SV(α = 0.3 kcal.mol-1.e-2, blue) are shown. The complete set of ROC curves for CM-DUD targets is S2 File.

For the CM-DUD, an average logAUC of 6.4 with an AUC of 58.4% was observed for the pure FF energy model implemented in LiBELa, much smaller than the average logAUC observed for the original DUD but still comparable with the logAUC observed for DOCK6 grid score (2.7 with an average AUC of 52.0%), as shown in Table 3 and Fig 3. The introduction of the solvation model into the scoring function resulted again in a decrease in the average logAUC to 5.8, for α = 0.1 kcal.mol-1.e-2 and 4.7 for α = 0.3 kcal.mol-1.e-2.

These results underline important issues in terms of scoring functions for docking. First, the volume-based SV-model does not result in significant improvement in the enrichments of most targets when compared to a pure force field scoring function, despite its good correlation with experimental solvation energies. In original DUD or CM-DUD, a decrease of around 20–25% in the average enrichment is observed upon the introduction of the solvation effect as modeled by the SV model with α set to 0.3 kcal.mol-1.e-2. Second, the Coulomb model for electrostatic interactions is probably a weak model for quantifying macromolecular interactions. We come to this conclusion by comparing the performance of LiBELa and DOCK6 with the results previously reported for DOCK3.7, where a high enrichment was achieved for the CM-DUD dataset. That tool uses DELPHI [26], which relies on the numeric solution of the Poisson-Boltzmann for the evaluation of polar interactions [11]. Considering that the charge-pairing of ligands and decoys are the most important change between DUD and CM-DUD, it is plausible to think that the stronger electrostatic model was an important reason for the increased enrichment observed for DOCK3.7.

Finally, we decided to assess whether sampling effects could be playing a role in the average results observed for DUD and CM-DUD. Despite the good decoy-to-ligand ration (around 33 decoys per ligand), these data sets have just a few ligands in some cases. We then decided to use twelve targets of the DUD-Enhanced (DUDE) dataset, created from an increased number of actives molecules and keeping the decoy-to-ligand ratio around 33. For DUDE, interesting results were observed: for the pure FF model, an average logAUC of 7.3 was observed with and AUC of 60.4% (median 5.9 and 61.1%), as shown in Table 4, Fig 4 and S3 File. The introduction of the SV correction term to account for solvent effects resulted in a slight increase in the enrichment, resulting in an average logAUC of 8.3 and an AUC of 62.8% when α was set to 0.1 kcal.mol-1.e-2 (Fig 4). Setting the parameter α above this threshold resulted in a systematic decrease in the average logAUC, revealing an optimal value between 0.1 and 0.2 kcal.mol-1.e-2. Two interesting features can readily be observed in this result. First, there is an overall improvement of enrichment, i.e., there are small improvements for 10 out of the 13 target targets when α was set to 0.1 kcal.mol-1.e-2. This can be observed by the increase in the median logAUC together with the increase of the average logAUC. This observation suggests that the SV model result in consistent improvement in this larger dataset. Second, the targets with the higher improvement, PUR2, KITH, PPARD and GRIK1 have in common buried or partially buried active sites and polar interactions with the ligand though charged residues. For active sites with these features, a correction for the solvent effects is expected to play a significant role, resulting in improvement of the enrichment.

Fig 4. ROC curves for representative DUDE targets.

The brown line shows the enrichment expected to for a random distribution of ligands and decoys. The enrichments obtained with FF (black line), SV (α = 0.1 kcal.mol-1.e-2, red), SV(α = 0.3 kcal.mol-1.e-2, blue) and DOCK6 (yellow) are shown. The complete set of ROC curves for DUDE targets is available in S3 File.

Are the errors introduced in the scoring function due to sampling issues or confined to the scoring of molecular interactions? This is not a trivial question to address. In order to tackle this issue, we run docking simulations (self-docking) for 1,031 ligands of the SB2012 dataset using three different energy models, namely a pure FF model and FF+SV with α parameter set to 0.1 and 0.3 kcal.mol-1.e-2. The results are summarized in the Table 5. These results show that the introduction of a correction term to take into account the solvent effects does not result in significant increase of the root mean square deviation or of the rate of success at lower values of the parameter α, i.e., up to 0.2 kcal.mol-1.e-2. Increasing this parameter gradually increases the deviation, although keeping very low RMSD values as compared to DOCK6, as one would expect for a hybrid docking such as the used in LiBELa. Taken together the results shown in Table 5 indicate that sampling issues are not expected to play a relevant role in the decreased enrichment observed.

Table 5. Statistics of pose reproduction of the SB2012 dataset using DOCK6 and LiBELA using a pure FF energy model and FF+SV energy model.


Here we evaluated a variation of the widespread Stouten solvation model in the context of ligand docking. Despite the good correlation of the solvation energies computed by the SV model and experimentally determined solvation energies, the SV model resulted in decreased enrichments when used in retrospective docking simulation of gold standard benchmarks such as DUD and CM-DUD. This apparent paradox raises important questions. First, is the model inconsistent? And is the solvation treatment useful in the context of molecular docking?

The first question is somewhat tricky. How can a solvation model be correlated to experimental data and still lead to worse results? Looking at other empirical solvation models already proposed, we found similar results. For example, Mysinger and Shoichet observed that 20 out of the 40 DUD targets had better enrichments when no desolvation penalties were computed than when their solvent-excluded volume model (SEV) model was applied [11]. The average logAUC for no desolvation and for the SEV model were 14.3 and 15.0 with average AUC values of 68.8 and 68.7%, respectively, indicating again that the introduction of an empirical solvation function into their docking scoring function resulted in minimal improvement of the enrichment results, if any. Worth of note, TK and TRYPSIN were two targets with better enrichments using no desolvation, similarly to what was observed in this work with the SV model. The results shown for our SV model and for SEV suggest that empirical models for solvation treatment are still in their infancy and may represent an open road for future developments, necessary for the achievement of a more precise model for molecular interactions in the context of ligand docking.

Additionally, Coleman and coworkers revised the results obtained in the SAMPL4 challenge for solvation energies computed for 47 compounds using AMSOL, according to the default ZINC processing pipeline [27] and used by DOCK3.7. They found that the solvation energies as computed with this protocol were the worst predictions submitted to the challenge [28]. This conclusion highlights the current need for better fast empirical models applicable in the docking context and also provides an explanation for the lack of actual improvement upon the introduction of the solvation penalty in retrospective tests with DOCK. It should be mentioned that, in this context, our SV model showed a good correlation with experimental data, even when simple and empirical atomic charges were used for parametrization of the molecules.

One important source of errors in our model may be related to the evaluation of the electrostatic term in the binding energy. As already pointed out, the effect of long range interactions is best taken into account by continuum electrostatic models, where the dielectric constant and ionic strength can very through space [29]. The approach used to evaluate the polar interactions used in LiBELa is certainly limited by assuming a homogeneous and isotropic media. This is in part compensated by assuming a distance-dependent dielectric ‘constant’, i.e., ε = rij, but certainly results in accumulation of errors. The same analysis holds true for the solvation energies, which are primarily based on a penalty for polar interactions with a short- to medium range limit given by the Gaussian envelope function used. The consequences of these limitations can be observed in the performance of the SV model when compared to continuum methods such as GB and PB for a set of complexes of the SB2012 dataset. Continuum models such as GB and PB are very effective in reproducing the overall profile of solvation energies. Calculations done to the molecules of the FreeSolv database show a linear correlation of experimental solvation with energies computed with these continuum models (Fig 5A). The same comparison between experimental solvation energies and SV computed energies (Figs 1 and 5B) still shows that SV can predict the overall solvation profile for the organic molecules in the database but with lower correlation than continuum models, as expected. We also evaluated the calculations in a scenario of desolvation due to binding with the complexes of SB2012 dataset, where experimental desolvation energies are no available, but the empirical SV model can be compared to continuum models. When 999 molecules are compared for their solvation energies (Fig 5C), some correlation can be observed, although for some molecules, a random dispersion is observed. Interestingly, when the dataset is filtered to preserve neutral molecules only (Fig 5D, N = 506) the correlation between solvation energies computed with PB and SV increases substantially, revealing the weakness of the model to handle highly charged molecules. We therefore foresee opened opportunities for the improvement of solvation models by the adoption of more robust overall treatment of the electrostatics for molecular interactions.

Fig 5. Correlation between experimental solvation data and solvation energies from multiple models.

(A) Correlation between experimental solvation data for the FreeSolv dataset (N = 643) and solvation energies computed with GB (black squares) and PB (red squares). The Pearson correlation coefficient is r = 0.86, Spearman, s = 0.87 and Kendall coefficient is k = 0.70 for GB and r = 0.88, s = 0.98 and k = 0.72 for PB. (B) Correlation between experimental solvation energies for the FreeSolv dataset and SV computed solvation energies. r = 0.67, s = 0.75 and k = 0.57. (C) Correlation between desolvation energies for 999 receptor-ligand complexes of the SB2012 dataset computed with PB (horizontal axis) and SV (vertical axis). r = 0.23, s = 0.40 and k = 0.29. (D) Correlation between desolvation energies for complexes of the SB2012 with neutral ligands (N = 506). r = 0.21, s = 0.57 and k = 0.43.

Many results have been reported in the literature comparing the efficiency of a simpler interaction model with more robust models such as MM-GBSA or MM-PBSA. For example, Zhang and coworkers recently showed a massive parallel pipeline for virtual screening based on ligand docking with Vina and pose rescoring using the MM-GBSA as implemented in Amber [30]. Using the higher theory level for rescoring, a better enrichment of the DUDE benchmark was reached when compared to the results of docking alone. The authors reported an average AUC of 66.4% (median 68.3%) when using the Vina energy model and 71.1% (median 70.3%) after the rescoring. This 7% in increase of the average area under the curve of ROC plots was obtained at the price of 5 hours of massive parallel computation on 15,000 CPUs, spent only on rescoring. These results suggest that the improvement of the current interaction models used for ligand docking are not just necessary but may also be sufficient to make pointless the rescoring of docking poses using end-point methods.

Graves and coworkers also evaluated the MM-GBSA model for rescoring docking poses generated by DOCK3.5 [8]. From 33 tested ligands, MM-GBSA recovered 23 actual binders that were considered false negatives in docking. On the other hand, 10 true negatives according to the docking model were introduced as false positives after rescoring. This example shows that even a more robust solvation model such as GB can result in important errors during the ranking of a ligand-receptor complex, suggesting the modeling of such interactions can be challenging even for the higher theory levels.

The results briefly reviewed above reveal that even when corrected for solvation energies using robust solvation models, the inaccuracies in the interaction model seem to equate the inaccuracies in the solvation model resulting in results that are marginally better but very expensive in terms of computational time.

The Stouten model could be a choice for a fast solvation model used in molecular interaction modeling. When originally proposed, this fast method was shown effective to reproduce correct conformations of BPTI, comparable with conformations observed in explicit solvent simulations [12]. Variations of this model are already used for molecular docking in the program AutoDock, for example [10]. Evaluating the performance of the variation of the Stouten model proposed by Verkhivker and coworkers, we found that the introduction of the solvent effect into the scoring function does not improve the enrichment of known ligands against decoys on DUD or CM-DUD and results in modest improvement in DUDE targets.

The question still remains: is the solvation correction useful in the context of molecular docking? It definitely is! We tested the performance of our SV model in a scenario closer to a real application, in a virtual screening campaign using the nuclear receptor PPARγ as the target. This receptor has a buried and bulky active site with three ‘arms’ [31]. One of these arms, is very polar and the other two arms are mainly hydrophobic. The natural ligands of this receptor are fatty acids and several synthetic ligands have already been described in the literature. Worth of note, the thiazolidinediones (TZDs), such as pioglitazone and rosiglitazone, are high affinity ligands of the receptor PPARγ. After screening the dataset ChEMBL Drugstore [32], as available in ZINC [27], we analyzed the net charge of the top one-hundred molecules, ranked by binding energy using a pure FF model and also using FF+SV interaction model. As shown in Fig 6, the introduction of the solvation correction term, reduced the excess electrostatic energy term, reducing the fraction of molecules with charge -2 and -3 among the top scored molecules. Since the receptor is expected to typically bind neutral or -1 charged molecules, the solvent correction is shown to be useful to penalize the polar interactions and avoid the overvaluation of the electrostatic term in the binding energy.

Fig 6. Distribution of net charge for the top scored molecules in VS.

Distribution of net charge for the top scored molecules found in a virtual screening campaign using PPARγ as the target. For the SV model, the parameter α was set to 0.2 kcal.mol-1.e-2.

In conclusion, we found that after the adjustment of the parameter alpha, the SV solvation model shows a good correlation with experimental solvation energies. However, the introduction of this model into a docking scoring function does not improve the enrichment of actual binders against decoys. In contrary, the enrichment was reduced about 20%. These results, in line with recent findings for empirical solvation functions, highlight the need for better interaction models that could be useful in the context of molecular docking. In addition, the introduction of the correction term clearly showed an ability to correctly penalize molecules with net charges -2 and -3 e making them less attractive in the context of virtual screening [33].


The Stouten-Verkhivker solvation model

The original Stouten model for solvation free energy is based on the atomic exposal to solvent, i.e., the occupancy of protein atoms around a ligand atom, with a Gaussian envelope function. This occupancy term is then multiplied by an atomic solvation term that depends on the atom type [12]: (5)

Verkhivker and coworkers gave a new view of the intrinsic solvation parameter by proposing that, phenomenologically, each atom has an ‘affinity’ by the solvent that can be given by a function of the square of the atomic charge [19]. Although this proposal was used in a different context, with a softcore potential, a final empirical solvation model can be defined as: (6) where f is the solvent volume displaced by an atom and S is a function of the square of the atomic charge, . This model, hereinafter Stouten-Verkhivker, or SV model, was evaluated in this work with α=0.2 kcal.mol-1.e-2, β=-0.005 kcal.mol-1 and σ=3.5 Å, unless otherwise stated.

Assessment of the SV model as an additional term in docking scoring function

The test set SB2012 [24] was used to assess the performance of the docking scoring function. This dataset is composed by 1,043 receptor-ligand complexes provided with crystallographic ligands already placed in their receptors binding site, so that the binding energy can be calculated taking the conformational search out of equation. Receptor and ligands were used as provided, i.e., without any additional preparation. For this evaluation, the program DOCK6.7 [24] was employed for the sake of comparison, using the Grid Score. The same receptor-ligand complexes were scored in our algorithm, LiBELa [20], where a pure Amber Force Field energy evaluation is combined with the SV solvation model as described above: (7)

The energy evaluation in pre-computed grids was also employed in LiBELa for both energy models. From the 1,043 receptor-ligand complexes, 1,031 were actually docked using LiBELa and 953 were docked using DOCK6.

Assessment of the SV model in enrichment tests

The SV solvation model was also evaluated in a typical virtual screening scenario, where a large number of compounds are computationally scrutinized looking for actual binders. This assessment consisted of docking simulations of known ligands and decoys against 37 targets of the directory of useful decoys (DUD) [25]. The Adjusted logAUC (logAUC) was used as a metric for enrichment together with the area under the curve (AUC). Here, the term enrichment describes the ability of the algorithm to populate the top of list of docked molecules sorted by docking energies with known binders against decoys. The Adjusted logAUC measures the area under the curve of a semi-log representation of the false positive against true positive rates and subtracts the area expected for a random disposition of ligands and decoys in the ranked results. By using this log scale, the logAUC gives the same weight to the enrichment in the very beginning of the dataset (0.1% to 1% of decoys) to the mid-early enrichment (1–10% of decoys) and to the late enrichment (10–100% of decoys) [11]. Since the distribution of charged and neutral molecules among the decoys in the original DUD dataset was found to be different from the distribution in the ligands, a new version of the DUD was proposed by its developers, the charge-matched DUD or CM-DUD [11]. Soon after, the same developers also proposed the DUD-enhanced, or DUDE [34]. The CM-DUD and a subset of DUDE were also used for the assessment of the SV solvation model in retrospective studies.

The protocol for receptor and ligand preparation was the same for these three benchmarks. The receptors were prepared with UCSF Chimera [35], where missing hydrogen atoms were added, and atom types and charges were attributed according to Amber FF9SB force field [36]. Ligands and decoys were used as provided in SYBYL mol2 files, according to the default ligand preparation of the database [27,37]. In cases where molecules were not distributed in SYBYL mol2 files (e.g., CM-DUD), the 3D structures were generated by BABEL [23] using MMFF94 charge model. When applicable, the program pymdpbsa, part of the AMBER package, was used for estimating solvation energies using GB and/or PB models.


Docking of ligands and decoys were performed against their own target using LiBELa [20]. LiBELa (Ligand Binding Energy Landscape) is a hybrid algorithm based on the superposition of a search ligand on a reference ligand already placed in the receptor binding site. This superposition is achieved by matching the molecular volume and charge distribution as the target function using the MOLSHACS algorithm [38]. Afterwards, the pose of this initially placed ligand is optimized once again in the Cartesian space (as a rigid body) using the binding energy as the objective function. The ligand flexibility is treated by an on-the-fly generation of an ensemble of conformers through a stochastic search of rotatable bonds with the genetic algorithm implemented in the OpenBabel API [23]. Each conformer is overlaid on the reference ligand and the best conformer is then optimized inside the active site as a rigid body in order to generate a final and low-energy binding pose. The augmented Lagrangian [3941] algorithm was used for optimization of the molecular overlay followed by the optimization of the binding pose using the dividing rectangles algorithm [42]. Both methods were used as implemented in the NLOPT library [43] with a 10-6 relative tolerance and a timeout of 30 seconds. The ligands were allowed to translate 12 Å in each direction during the Cartesian search and to perform a full rotation around Euler’s angles. Twenty conformers were generated for each ligand and the best two conformers, as judged by the initial binding energy, were used in the second Cartesian search. The best scored conformer was used for ranking purposes. A cubic grid box with 30 x 30 x 30 Å was used for pre-computation of the potential energies and docking simulations [13]. The grid points were equally spaced by 0.3 Å in all directions. For electrostatic potential energies, the dielectric ‘constant’ was set to the interatomic distance, as previously proposed [29].

Docking simulations using the same benchmarks were also performed with DOCK6.7 using Grid Score with parameters similar to those used in LiBELa, whenever possible. The cluster of spheres generated by SPHGEN within a 10 Å radius from the crystallographic (reference) ligand were selected as shape descriptors. Interaction grids were then computed in a box encompassing the selected spheres with a 5 Å buffer. The grid was computed using a 0.3 Å spacing using Amber FF99 atomic parameters. For docking, a maximal number of 500 orientations was used allowing a chemical matching between receptor and ligand atoms. Ligand flexibility was treated with the anchor-and-grow method with a further minimization of poses in a 100 simplex cycles.

Supporting information

S1 File. ROC curves for all DUD targets.

The brown line shows the enrichment expected to for a random distribution of ligands and decoys. The enrichments obtained with FF (black line), SV (α = 0.1 kcal.mol-1.e-2, red), SV (α = 0.3 kcal.mol-1.e-2, blue) and DOCK6 (yellow) are shown.


S2 File. ROC curves for all CM-DUD targets.

The brown line shows the enrichment expected to for a random distribution of ligands and decoys. The enrichments obtained with FF (black line), SV (α = 0.1 kcal.mol-1.e-2, red), SV (α = 0.3 kcal.mol-1.e-2, blue) and DOCK6 (yellow) are shown.


S3 File. ROC curves for all DUDE targets.

The brown line shows the enrichment expected to for a random distribution of ligands and decoys. The enrichments obtained with FF (black line), SV (α = 0.1 kcal.mol-1.e-2, red), SV(α = 0.3 kcal.mol-1.e-2, blue) and DOCK6 (yellow) are shown.


S4 File. Enrichment data for DUD targets for different sets of α values in SV model.



The authors thank to FAPESP and CNPq for financial support through grants 2010/15376-8 and 2014/06565-2(FAPESP) and 485950/2013-8 (CNPq). HSM thanks CNPq for her PhD stipend.

Author Contributions

  1. Conceptualization: ASN.
  2. Data curation: HSM ASN.
  3. Formal analysis: HSM.
  4. Funding acquisition: ASN.
  5. Investigation: HSM.
  6. Methodology: HSM ASN.
  7. Project administration: ASN.
  8. Resources: ASN.
  9. Software: HSM ASN.
  10. Supervision: ASN.
  11. Validation: HSM.
  12. Visualization: HSM ASN.
  13. Writing – original draft: HSM ASN.
  14. Writing – review & editing: HSM ASN.


  1. 1. Gao M, Skolnick J. A Comprehensive Survey of Small-Molecule Binding Pockets in Proteins. Plos Comput Biol. 2013;9.
  2. 2. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, et al. The Protein Data Bank. Acta Crystallogr Sect D-Biological Crystallogr. 2002;58: 899–907.
  3. 3. Anand P, Nagarajan D, Mukherjee S, Chandra N. PLIC: protein-ligand interaction clusters. Database (Oxford). 2014
  4. 4. Schumann M, Armen RS. Identification of Distant Drug Off-Targets by Direct Superposition of Binding Pocket Surfaces. PLoS One. 2013;8.
  5. 5. Wereszczynski J, McCammon JA. Statistical mechanics and molecular dynamics in evaluating thermodynamic properties of biomolecular recognition. Q Rev Biophys. 2012;45: 1–25. pmid:22082669
  6. 6. Kollman P. Free-Energy Calculations—Applications to Chemical and Biochemical Phenomena. Chem Rev. 1993;93: 2395–2417.
  7. 7. Ponder JW, Case DA. Force Fields for Protein Simulations. Adv Protein Chem. 2003;66: 27–85. pmid:14631816
  8. 8. Graves AP, Shivakumar DM, Boyce SE, Jacobson MP, Case DA, Shoichet BK. Rescoring docking hit lists for model cavity sites: Predictions and experimental testing. J Mol Biol. 2008;377: 914–934. pmid:18280498
  9. 9. Chang CA, Chen W, Gilson MK. Ligand configurational entropy and protein binding. Proc Natl Acad Sci U S A. 2007;104: 1534–1539. pmid:17242351
  10. 10. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem. 1998;19: 1639–1662.
  11. 11. Mysinger MM, Shoichet BK. Rapid Context-Dependent Ligand Desolvation in Molecular Docking. J Chem Inf Model. 2010;50: 1561–1573. pmid:20735049
  12. 12. Stouten PFW, Frommel C, Nakamura H, Sander C. An Effective Solvation Term Based on Atomic Occupancies For Use In Protein Simulations. Mol Simul. 1993;10: 97-.
  13. 13. Luty B a., Wasserman ZR, Stouten PFW, Hodge CN, Zacharias M, McCammon JA. A molecular mechanics/grid method for evaluation of ligand-receptor interactions. J Comput Chem. 1995;16: 454–464.
  14. 14. Meng EC, Shoichet BK, Kuntz ID. Automated Docking with Grid-Based Energy Evaluation. J Comput Chem. 1992;13: 505–524.
  15. 15. Mongan J, Simmerling C, McCammon JA, Case DA, Onufriev A. Generalized Born model with a simple, robust molecular volume correction. J Chem Theory Comput. 2007;3: 156–169. pmid:21072141
  16. 16. Still WC, Tempczyk A, Hawley RC, Hendrickson T. Semianalytical Treatment of Solvation For Molecular Mechanics and Dynamics. J Am Chem Soc. 1990;112: 6127–6129.
  17. 17. Bashford D, Case DA. Generalized Born Models of Macromolecular Solvation Effects. Annu Rev Phys Chem. 2000;51: 129–152. pmid:11031278
  18. 18. Shoichet BK. Virtual screening of chemical libraries. Nature. 2004;432: 862–865. pmid:15602552
  19. 19. Verkhivker GM, Rejto PA, Bouzida D, Arthurs S, Colson AB, Freer ST, et al. Towards understanding the mechanisms of molecular recognition by computer simulations of ligand-protein interactions. J Mol Recognit. 1999;12.
  20. 20. dos Santos Muniz H, Nascimento AS. Ligand- and receptor-based docking with LiBELa. J Comput Aided Mol Des. 2015;29: 713–723. pmid:26141308
  21. 21. Mobley DL. s, Guthrie JP. FreeSolv: A database of experimental and calculated hydration free energies, with input files. J Comput Aided Mol Des. 2014;28: 711–720. pmid:24928188
  22. 22. Wang J, Wang W, Kollman PA, Case DA. Automatic atom type and bond type perception in molecular mechanical calculations. J Mol Graph Model. 2006;25: 247–260. pmid:16458552
  23. 23. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminform. 2011;3: 33. pmid:21982300
  24. 24. Brozell SR, Mukherjee S, Balius TE, Roe DR, Case DA, Rizzo RC. Evaluation of DOCK 6 as a pose generation and database enrichment tool. J Comput Aided Mol Des. 2012;26: 749–773. pmid:22569593
  25. 25. Huang N, Shoichet BK, Irwin JJ. Benchmarking sets for molecular docking. J Med Chem. 2006;49: 6789–6801. pmid:17154509
  26. 26. Gilson MK, Sharp KA, Honig BH. Calculating the electrostatic potential of molecules in solution: Method and error assessment. J Comput Chem. 1988;9: 327–335.
  27. 27. Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG. ZINC: A Free Tool to Discover Chemistry for Biology. J Chem Inf Model. 2012;52: 1757–1768. pmid:22587354
  28. 28. Coleman RG, Sterling T, Weiss DR. SAMPL4 & DOCK3.7: lessons for automated docking procedures. J Comput Aided Mol Des. 2014;28: 201–209. pmid:24515818
  29. 29. Gilson MK, Honig BH. Energetics of Charge Charge Interactions In Proteins. Proteins-Structure Funct Genet. 1988;3: 32–52.
  30. 30. Zhang XH, Wong SE, Lightstone FC. Toward Fully Automated High Performance Computing Drug Discovery: A Massively Parallel Virtual Screening Pipeline for Docking and Molecular Mechanics/Generalized Born Surface Area Rescoring to Improve Enrichment. J Chem Inf Model. 2014;54: 324–337. pmid:24358939
  31. 31. Nascimento AS. Structural requirement for PPARgamma binding revealed by a meta analysis of holo-crystal structures. Biochimie. 2010;92: 499–506. pmid:20138109
  32. 32. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40: D1100–7. pmid:21948594
  33. 33. Shoichet BK, Leach AR, Kuntz ID. Ligand solvation in molecular docking. Proteins-Structure Funct Genet. 1999;34: 4–16.
  34. 34. Mysinger MM, Carchia M, Irwin JJ, Shoichet BK. Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking. J Med Chem 2012;55: 6582–6594. pmid:22716043
  35. 35. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF chimera—A visualization system for exploratory research and analysis. J Comput Chem. 2004;25: 1605–1612. pmid:15264254
  36. 36. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C. Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins-Structure Funct Bioinforma. 2006;65: 712–725.
  37. 37. Irwin JJ, Shoichet BK. ZINC—A free database of commercially available compounds for virtual screening. J Chem Inf Model. 2005; 177–182. pmid:15667143
  38. 38. Vaz de Lima LAC, Nascimento AS. MolShaCS: A free and open source tool for ligand similarity identification based on Gaussian descriptors. Eur J Med Chem. 2013;59: 296–303. pmid:23247048
  39. 39. Andreani R, Birgin EG, Martinez JM, Schuverdt ML. On Augmented Lagrangian Methods with General Lower-Level Constraints. Siam J Optim. 2007;18: 1286–1309.
  40. 40. Andreani R, Birgin EG, Martinez JM, Schuverdt ML. Augmented Lagrangian methods under the constant positive linear dependence constraint qualification. Math Program. 2008;111: 5–32.
  41. 41. Birgin EG, Martinez JM. Improving ultimate convergence of an augmented Lagrangian method. Optim Methods Softw. 2008;23: 177–195.
  42. 42. Jones DR, Perttunen CD, Stuckman BE. Lipschitzian optimization without the Lipschitz constant. J Optim Theory Appl. 1993;79: 157–181.
  43. 43. Jhonson SG. The NLopt nonlinear-optimization package.