Effect of set up protocols on the accuracy of alchemical free energy calculation over a set of ACK1 inhibitors

Hit-to-lead virtual screening frequently relies on a cascade of computational methods that starts with rapid calculations applied to a large number of compounds and ends with more expensive computations restricted to a subset of compounds that passed initial filters. This work focuses on set up protocols for alchemical free energy (AFE) scoring in the context of a Docking–MM/PBSA–AFE cascade. A dataset of 15 congeneric inhibitors of the ACK1 protein was used to evaluate the performance of AFE set up protocols that varied in the steps taken to prepare input files (using previously docked and best scored poses, manual selection of poses, manual placement of binding site water molecules). The main finding is that use of knowledge derived from X-ray structures to model binding modes, together with the manual placement of a bridging water molecule, improves the R2 from 0.45 ± 0.06 to 0.76 ± 0.02 and decreases the mean unsigned error from 2.11 ± 0.08 to 1.24 ± 0.04 kcal mol-1. By contrast a brute force automated protocol that increased the sampling time ten-fold lead to little improvements in accuracy. Besides, it is shown that for the present dataset hysteresis can be used to flag poses that need further attention even without prior knowledge of experimental binding affinities.


Introduction
There is continuous interest in computational methods to decrease time and costs of hit-tolead and lead optimization efforts in preclinical drug discovery [1]. A recurring topic in computational chemistry is the use of virtual in silico screens to find ligands for proteins [2,3]. Typically, the goal is to filter via a cascade of computational methods a large library to focus experimental efforts on a small number of molecules. Usually inexpensive methodologies are applied first to eliminate a large number of poorly suited molecules, with more expensive calculations reserved to a subset of promising ligands. This approach may be applied in the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 context of hit discovery where the goal is to identify a few weak binders from a library of existing molecules; or for hit-to-lead efforts where the goal is to identify analogues of a hit structure that could be prioritized for synthesis and assays. In both cases the main steps frequently involve library screening, docking, initial scoring, and re-scoring with diverse molecular simulation methods such as Molecular Mechanics Poisson Boltzmann (Generalized Born) Surface Area (MM/PBSA) [4], Linear Interaction Energy (LIE) [5] or Free energy Perturbation (FEP) [6] methods [7].
In a previous study a multistep docking and scoring protocol was benchmarked in the context of re-scoring with the MM/PB(GB)SA method [8]. The set of ligands analysed in that study belonged to the same scaffold and it was assumed that the core binding mode of the conserved scaffold would not deviate from that of the experimentally X-ray resolved one. The present study investigates the suitability of alchemical free energy (AFE) methods for improving on this multistep docking and scoring protocol by means of a further re-scoring of ligands. AFE methods are increasingly used for predictions of free energies of binding in blinded competitions such as SAMPL (Statistic Assessment of Modelling of Proteins and Ligands) and D3R grand challenges [9][10][11][12][13][14][15]. Some AFE protocols have even achieved predictions of binding energies with root mean square deviations (RMSD) under 1.5 kcal mol −1 , and Pearson Correlation coefficients (R) of around 0.7 or better [16][17][18][19][20][21][22][23]. Nevertheless, the performance varies significantly between different AFE protocols and targets [24][25][26] and it is important to explore further the robustness of these methodologies.
Specifically, this study aimed to explore the extent to which a setup protocol motivated by previous domain knowledge may influence the accuracy of AFE calculations, and whether issues such as binding poses selection or binding site water placement can be overcome via an increase of the simulation time. This was investigated using a dataset of 15 congeneric inhibitors of the protein activated Cdc42-associated kinase (ACK1) [27], a potential cancer target [28,29]. The compounds span a large range of activity (K i values ranging from more than 10 μM to 0.0002 μM), as seen in Fig 1, and are typical of the structural modifications performed in hit-to-lead programs. The 15 ligands were first docked into the ACK1 ATP-binding site, and a set of docked poses obtained for each ligand was re-scored with a 4-step minimization protocol followed by a single-snapshot MM/PBSA re-scoring. The best scored pose was alchemically studied and the relative binding energy was compared to the experimental one. The alchemical calculations were also repeated with a 10-fold increase in sampling time. The role of a possible bridging water molecule in the binding pocket was also considered. Finally, thermodynamic cycle closures were analyzed as a way to detect incorrectly predicted poses without knowledge of the experimental relative binding energies.

Dataset
The dataset consists of 15 ACK1 competitive inhibitors for which inhibition constants (K i ) have been reported. The structure of only one protein-ligand complex (compound 35) was determined by X-ray crystallography [27] (Fig 2). This dataset was further divided into two subsets: batch 1 (6 ligands with K i values ranging from >10 μM to 0.006 μM), and batch 2 (9 ligands with K i values ranging from 0.013 to 0.0002 μM).

Protein and ligand setup
The ACK1 kinase domain structure was taken from the Protein Data Bank, code 4EWH [27], using chain B of the crystal structure, which was protonated with MOE v2009.1 [30]. The structure has no missing residues; Tyrosine 284 was dephosphorylated with MOE following Lougheed et al. observation that inhibitor binding is not expected to be sensitive to the phosphorylation state of this residue [31]. The protonation state of each ligand was predicted using the SDwash program in MOE v2009.1. Batch 1 ligands were predicted to be neutral, whereas batch 2 ligands were predicted to be positively charged.

Docking
Docking was performed with MOE v2009.1 [30]. The full docking process was done in three steps. The first one was an exhaustive conformational search of the ligands using the Systematic option of MOE together with the option Enforce chair conformations on. All other parameters were set to the standard options. A maximum of 100 conformations by compound were selected for the Placement step. In the second step the receptor was defined as those atoms within 9.0 Å from the ligand. The Rotate Bonds option was activated and the Affinity dG function employed together with the Triangle Matcher method for placement. A maximum of 30 poses for each ligand were retained. Finally, the 500 best structures were submitted to the Refinement step with the Force Field function and allowing the lateral chains of the pocket residues to move during the optimization without restriction. All other parameters were set to the standard options. The five best structures obtained for each ligand, according to their predicted binding energies, were retained for minimization and re-scoring with MM/PBSA.

MM/PBSA
A four-step minimization protocol followed by a single snapshot MM/PBSA re-scoring was performed with Amber 14 [32]. Ligands were prepared with Antechamber using the GAFF force field [33] with AM1-BCC partial charges [34,35], while the ff99SB [36] force field was used for the protein. All systems were solvated in a rectangular box of TIP3P water molecules [37]. Counterions were added as necessary to neutralize the systems [38]. Energy minimization was performed under periodic boundary conditions using the particle-mesh-Ewald method for the treatment of the long-range electrostatic interactions [39]. A cut-off distance of 10 Å was chosen to compute non-bonded interactions. The four-step minimization procedure was as follows: 1) 5000 steepest descent (SD) steps applied to water molecule coordinates only; 2) 5000 SD steps applied also to protein atoms, with positional harmonic restraints (5 kcal mol -1 Å -2 ) applied to backbone atoms only; 3) 5000 SD steps as done previously with backbone atom restraints set to 1 kcal mol -1 Å -2 and 4) 5000 SD steps with no restraints.
For each of the energy minimized structures, a binding free energy was estimated following the MM/PBSA method using the MM/PBSA.py program [40]. No entropic contributions were taken into account, while the variables cavity_surften and cavity_offset were assigned the values of 0.00542 kcal mol -2 Å -2 and -1.008, respectively, using the defaults for all remaining variables.

Alchemical free energy calculations
Relative binding free energies were calculated using a single topology molecular dynamics alchemical free energy approach [41]. Alchemical free energy calculations avoid direct computation of the free energy change associated with the reversible binding of a ligand to a protein through an artificial morphing of a ligand X into another ligand Y by using a parameter λ which defines the change from X to Y. Thus, the relative free energy of binding (ΔΔG X!Y ) was given by Eq 1 as: Where DG free X!Y is the free energy change for transforming ligand X into ligand Y in solution whereas DG complex X!Y is the free energy change for the same transformation in the protein binding site. A relative free energy perturbation network for both batch 1 and batch 2 was designed (S5 Fig and S14 Fig). The top-scored MM/PBSA pose for each ACK1 ligand was used as input for the subsequent alchemical free energy preparation protocol using the FESetup software package [42]. The protocol used by FESetup for the automated preparation of ligands, protein and complexes was as follows: Ligands. Atomic charges were assigned by using the Antechamber module in Amber-Tools 14 [32], selecting the AM1-BCC method [34,35], and the GAFF2 force field [33]. Ligands were solvated with TIP3P water molecules [37], with counterions added as necessary to neutralize the system [38]. Each system was energy minimized for 100 SD cycles and equilibrated at 300 K and 1 atm pressure for 10 5 molecular dynamics (MD) steps with a 2 fs timestep using the module Sander [32], with a positional harmonic restraint (10 kcal mol -1 Å -2 ) applied to ligand atoms. Bonds involving hydrogen atoms were constrained.
Protein. The protein was parametrized using the Amber ff14SB force field [43]. Complexes. Each ligand was combined back with the ACK1 protein model and the complex was solvated with TIP3P water molecules [37]. . Counterions were also added to neutralize the solution [38]. The system was afterwards equilibrated following the procedure already described for ligands, using now 5000 MD steps.
All alchemical free energy calculations used 11 equidistant λ windows. For each λ value MD trajectories were computed in the NPT ensemble with a pressure of 1 atm and temperature of 300 K using the software SOMD 2016.1.0 [25,44]. SOMD has been used in several recent studies to model the binding energetics of enzyme inhibitors [26], carbohydrate ligands [24], and host-guest systems [13]. Each λ window was sampled for 4 ns. Pressure was regulated using a Monte Carlo barostat [45,46] with an update frequency of 25 MD steps. Temperature was kept constant using the Andersen thermostat [47], with a collision frequency of 10 ps -1 . A 2 fs time step was used with the leapfrog-Verlet integrator. All bonds involving hydrogens were constrained to their equilibrium distances. Non-bonded interactions were evaluated setting a cut-off distance of 12 Å. Long-range electrostatic interactions were calculated using the shifted atom-based Barker-Watts reaction field [48], with the medium dielectric constant set to 82.0. In order to avoid steric clashes at the beginning of each MD run due to modifications of the ligand parameters associated with changes in λ, each structure was energy minimized for 1000 steps prior to MD simulation.
Each simulation was repeated at least twice using different initial assignments of velocities, and both ΔΔG X!Y and ΔΔG Y!X were calculated from independent simulations. In some cases, when poor agreement was observed between duplicates a third run was performed.
Ligand 38 was tested as a racemic mixture for consistency with the experimental conditions. Calculations were carried out for each enantiomer and the binding energies relative to this ligand were given with Eq 2: Cycle closures were evaluated using free energy changes from both the forward (X ! Y) and reverse (Y ! X) perturbations. The metrics used to evaluate the datasets were the determination coefficient R 2 , linear regression slope and the mean unsigned error (MUE). Experimental binding affinities were calculated from the corresponding inhibition constants [27] As no uncertainties have been reported for the K i values, an uncertainty of 0.4 kcal mol -1 was assumed [20,49].
Relative free energies were estimated using the multistate Bennett's acceptance ratio (MBAR) method [50], as included in the software analyse_freenrg from the Sire software suite. Relative free energies for complete datasets were evaluated using version 0.3.5 of the freenrgworkflows python module [https://github.com/michellab/freenrgworkflows], using ligand 3 as a reference, for which a K i = 10 μM is used.
For more details, see Mey et al. [51]. All analysis scripts are available online at https:// github.com/michellab/ACK1_Data. Alchemical free energy protocols. Five different alchemical free energy protocols were followed. Protocol A uses for each ligand the best scored pose according to MM/PBSA. This leads to a pose that differs from the X-ray crystallographic pose of 35 for several ligands (2, 4, 7, 8, 16, 44 and 45). Protocol B needs user intervention to select the pose most similar to the experimental binding mode. Protocols C and D explore the effect of manually modelling a water molecule inside the ACK1 ATP-binding site (see Fig 3). This reflects user knowledge that in other high-resolution structures of ACK1 (e.g. the 1.31 Å resolution 4HZR structure [52]) one additional binding site water molecule between the protein and ligand is apparent. Protocol C uses the same ligand poses as Protocol A, while Protocol D uses the same poses as Protocol B.
Finally, Protocol E is simply Protocol A with the per λ simulation time increased ten-fold. This was done to evaluate whether the different binding mode and ATP-binding site water rearrangements seen in Protocols A-D can be sampled with longer MD simulation protocols. Protocol E is computationally expensive and was applied to batch 1 only (ca. 10 μs of simulation time).
The stability of the ligand poses and protein structure for all protocols was assessed by plotting the distribution of RMSD values across the dataset for the ligand and the protein backbone atoms (S1 and S2 Figs). The results indicate that the poses are generally stable (RMSD < 2 Å for almost all poses), the protein structure is generally stable (mean RMSD ca.

Batch 1
Protocol A renders (Table 1) Table 1, and S3, S6 and S9 Figs. This protocol gives  Effect of protocols on alchemical free energy calculations clearly better results, although the underestimation (slope 0.4) of relative binding free energies remains high, and ligands 2, 4 and 7 are still ranked poorly. An analysis of the relative binding energies calculated with protocols A and B (S3, S5 and S6 Figs), for ligands 2, 4 and 7, reveals that these ligands appear in the perturbations that show the highest deviations between the experimental and calculated relative binding energies. Thus, for protocol A deviations of more than 3.0 kcal mol -1 are observed for 2 ! 3, 7 ! 4, 7 ! 6, 7 ! 3 and 3 ! 7, while for protocol B these deviations appear for perturbations 2 ! 3, 4 ! 7, 7 ! 4, 7 ! 3 and 3 ! 7. An analysis of the docked structures of ligands 6 and 7 suggested that a possible explanation for the inability of the protocol to reproduce the experimental relative binding affinities is due to interactions of the extra nitrogen atom in the pyrimidine ring of ligand 7 that is missing in the pyridine ring of ligand 6 (see Fig 5). The extra N atom in the pyrimidine ring could establish a hydrogen bond with THR205 (see Fig 3) if a bridging water was present. Indeed, several water molecules are present inside the ATP-binding pocket of 4HZR [52]. That possibility was explored in protocols C and D, where a water molecule was manually placed inside the binding pocket between the nitrogen in position D of ligand 7 (see Fig 1) and THR205. The final position of the water molecule is obtained after 100 steps of SD minimization fixing all other atoms. Results for protocol C are shown in Table 1 and S10 Fig, while those for protocol D appear in Table 1 and Figs 4B and 5. Protocol D clearly surpass all others, with a R 2 of 0.84±0.03 and an improvement in the underestimation of relative binding energies (slope = 0.5). A comparison of the calculated relative binding energies for ligands 3 and 4 allows to conclude that using a different pose for ligand 4 does not seem to affect the results (both protocols A and B for example, give an average ΔΔG 3!4 of 1.3 kcal mol -1 ). Inspection of the calculated trajectories show that ligand 4 rapidly converts from its initial docked pose (protocols A and C) to one similar to that used as input for protocols B and D. The computed trajectories were visualized to assess the stability of the active site water molecule. The water molecule showed little drift from its initial position in most cases, with the exception of perturbations involving compound 6 in protocol C, where the water frequently diffused away from the binding site.
The possibility of resolving ambiguities in binding poses and binding site water content without user intervention was next tested by increasing the simulation sampling time to 40 ns for each λ window. The expectation was this would allow the ligand to find the correct pose and to allow water molecules diffuse in the ATP-binding site (see Fig 3). Results are shown in Table 1 and S3 and S11 Figs. The increased simulation time does not translate into any improvement of the results. The R 2 , slope and MUE values are as poor or poorer as those for protocol A, while the outliers remain the same. The MD trajectories show that, even with the increased simulation time, ligand 7 is not able to change its docking pose, while ligand 4 needs under 4 ns to adopt a pose that resembles the X-ray pose of 35. Besides, a water molecule enters and remains in the ATP-binding site in 7 out of 22 MD trajectories only.

Analysis of the complete dataset
The robustness of the results obtained for batch 1 was tested by processing batch 2 and re-analyzing the full dataset. Ligands in batch 2 are positively charged in the assay conditions, whereas batch 1 ligands are neutral. Relative free energy calculations that involve a net charge change are still technically challenging for simulations carried out with a reaction-field cut-off. Effect of protocols on alchemical free energy calculations Thus, the perturbations between ligands 8 and 15 were carried out assuming 15 is neutral. This is of course a significant simplification. Results for individual perturbations in batch 2 are shown in S4 and S14 to S18 Figs.
Protocol A, as expected given the results obtained for batch 1, gives modest results, as can be seen in Table 2    Interestingly, the dihedral angle defining the relative orientation of the NH group that links the pyrimidine and the cyclopentanol rings changes values rather quickly during the simulation. Fig 7A shows an example for the first repeat of the perturbation 44!45 at λ = 0. For the simulations involving ligand 44 an intramolecular H-bond between its aniline NH group and its cyclopentyl hydroxyl group is established (see Fig 7C). That conformation is precisely the second-best MM/PBSA docked one (see Fig 2C), which features that intramolecular hydrogen bond. Thus, batch 2 protocol B includes the second-best scored MM/PBSA poses for ligands 8, 16 and 44. In the case of ligands 8 and 16, this implies using a pose that resembles the most the X-ray binding mode, while for ligand 44 the second-best scored MM/PBSA pose differs from the best-scored one in the aniline NH dihedral angle (see Fig 2C). The improvement, as shown in Table 2 and S12 Fig, for protocol B as compared with protocol A, is quite modest. Results are clearly better for the 16!45 and 45!16 perturbations, with the disagreement between experimental and calculated relative binding energy decreasing from 3.5 to 0.3 kcal mol -1 (compare S14 and S15 Figs), but ligand 44 is still an outlier. Although the experimental relative binding energy for the 45 ! 44 perturbation is just -0.1 kcal mol -1 , ligand 45 is predicted to bind much more strongly to ACK1 (calculated ΔΔG 45!44 are -2.0/-1.9 and -1.2/-1.6 kcal mol -1 for protocols A and B, respectively) than 44. This suggests possible deficiencies in the force field used for 44 in this study.
Protocols C and D, follow the same trends already explained for batch 1, pointing to an improvement in the results when a water molecule is included in the ATP-binding pocket ( Table 2). An encouraging R 2 of 0.76 ± 0.02 and an improvement in the underestimation of relative binding energies (slope 0.8) is obtained, though there is still room for improvements for affinity predictions for 44 and 16.

Thermodynamic cycle closures analysis
Hysteresis, being defined as the difference in binding energy between the forward and reverse perturbation [44,57,58], has been proposed as useful metric to identify problematic perturbations [59,60]. Cycle closures for both batch 1 and batch 2 were computed to determine whether incorrectly predicted binding poses could be detected in the absence of experimental binding affinity data. Results are shown in Table 3.
As could be expected, similar conclusions can be obtained when analyzing ring cycle closures or comparing forward and reverse perturbations, although there are some cases with high deviations between experimental and calculated relative binding energies, while exhibiting almost null hysteresis for the forward and reverse perturbations (i.e. the perturbations between ligands 2 and 5 in batch 1 and those between ligands 44 and 45 in batch 2).
Overall it appears that a threshold of ± 0.8 kcal mol -1 for cycle closure errors is useful to flag poses that need further attention even without prior knowledge of the experimental binding affinities. Thus, for protocol A, 3-4-7-6, 3-4-6, 3-4-7, 4-6-7, 2-6-5 and 45-16-44 thermodynamic cycle closures are indicative of problematic ligands. According to this metric, a significant improvement when using protocol B (only one thermodynamic cycle closure above the threshold) is seen, while a comparison between protocols A (6 cycles with hysteresis above the threshold) and C (4 cycles) suggest a modest improvement. Results for batch 2 clearly indicate

Discussion
This work has explored the viability of using alchemical free energy methods as a final filter in a cascade of computational methods for hit-to-lead virtual screens in the context of a dataset of ACK1 inhibitors. The two major limitations of AFE methods are the quality of the potential energy function used, and the extent to which the configurational sampling performed has captured relevant protein-ligand conformations [59,60] In principle sufficient long simulations will relax a protein-ligand complex to the ligand pose and protein conformation preferred by the force field used. However, because computing time is limited in practical scenarios, AFE simulations typically afford only a few ns per window, which can make the calculated binding affinities sensitive to the selection of the starting conformations. This work indicates that use of experimental data to bias the selection of poses and setup of binding site water content could lead to significant performance improvements. While the dataset studied here is small, the lessons from this study are likely applicable to other binding sites. Of course, as illustrated with ligand 4, even in cases where the MD simulations relax a previously modelled binding pose to one that closely resembles a pose inferred from X-ray data, the free energy calculations may still fail to reproduce the experimental binding affinities.
Careful analysis of literature structural data [52,61,62] was key to identify a conserved hydration site that was not modelled in the prior docking calculations. This knowledge was important to realize upon inspection of putative poses for some ligands in batch 1 the feasibility of a hydrogen bonding interaction via a bridging water molecule. Gratifyingly modelling of this hydration site leads to significant accuracy improvements for several perturbations.  Table 3. Calculated thermodynamic cycle closures. Cycle closures that exceed or equal a threshold of 0.8 kcal mol -1 are highlighted in bold.
Cycle closure (kcal mol -1 ) In principle, assuming an accurate potential energy function, these sampling issues could be dealt with by simply increasing the sampling time of the MD simulations. For the present dataset, we find that a one order of magnitude increase in sampling time was insufficient to bring about improvements in binding poses accuracy and binding site water content. Thus, at present it seems wise to pay attention to the starting conditions of the free energy calculations to maximize cost effectiveness. Where experimental data is lacking, a number of molecular modelling protocols have been proposed to determine location and energetics of important binding site water molecules [63][64][65][66][67][68][69][70][71][72].  1 and batch 2 with protocols A, B, C and D. The calculated values correspond to independent repeats. (TIF) S1 Table. R 2 , MUE (kcal mol -1 ) and slope obtained from the comparison of experimental a and predicted relative free energies of binding of batch 2.