Accurate Reproduction of 161 Small-Molecule Complex Crystal Structures using the EUDOC Program: Expanding the Use of EUDOC to Supramolecular Chemistry

EUDOC is a docking program that has successfully predicted small-molecule-bound protein complexes and identified drug leads from chemical databases. To expand the application of the EUDOC program to supramolecular chemistry, we tested its ability to reproduce crystal structures of small-molecule complexes. Of 161 selected crystal structures of small-molecule guest-host complexes, EUDOC reproduced all these crystal structures with guest structure mass-weighted root mean square deviations (mwRMSDs) of <1.0 Å relative to the corresponding crystal structures. In addition, the average interaction energy of these 161 guest-host complexes (−50.1 kcal/mol) was found to be nearly half of that of 153 previously tested small-molecule-bound protein complexes (−108.5 kcal/mol), according to the interaction energies calculated by EUDOC. 31 of the 161 complexes could not be reproduced with mwRMSDs of <1.0 Å if neighboring hosts in the crystal structure of a guest-host complex were not included as part of the multimeric host system, whereas two of the 161 complexes could not be reproduced with mwRMSDs of <1.0 Å if water molecules were excluded from the host system. These results demonstrate the significant influence of crystal packing on small molecule complexation and suggest that EUDOC is able to predict small-molecule complexes and that it is useful for the design of new materials, molecular sensors, and multimeric inhibitors of protein-protein interactions.


INTRODUCTION
In 1990, a computer was used to screen 10,000 chemicals in the Cambridge Structural Database (CSD) [1], leading to the identification of a haloperidol analog capable of inhibiting HIV-1 and HIV-2 proteases with a Ki of approximately 100 µM [2]. The screening was accomplished using a computer docking program, DOCK, that docked each chemical of the database into the active sites of the enzymes and evaluated the shape complementarity of the docked compound relative to the active sites. Inspired by this seminal work, the EUDOC program was devised to search for the specific conformations, positions, and orientations of two three-dimensional (3D) structures that permit the strongest nonbonded intermolecular interactions between the two. The EUDOC program uses docking algorithms that differ from those of DOCK [3]. It addresses molecular flexibility by using conformation selection and conformation substitution mechanisms that enable massively parallel computing [3]. EUDOC was devised to run on a cluster of more than 300 loosely connected processors [3] and has recently been ported to the IBM Blue Gene/L supercomputer [4,5]. This program has successfully predicted small-molecule-bound protein complexes and identified drug leads from chemical databases [6-12].
The EUDOC program is also efficient. In a computational screen of 23,426 chemicals (at a resolution of 1.0 Å translation and 10° of arc rotation) for inhibitors of a chymotrypsin-like cysteine protease of the severe acute respiratory syndrome-associated coronavirus, the EUDOC program reduced the wall-clock time of the screen from 242 minutes using 396 Xeon processors (2.2 GHz) on a Beowulf cluster to 13 and 7 minutes using 2048 and 4096 PowerPC-440 processors (700 MHz) on Blue Gene/L, respectively [4,5]. Because a large database can be divided into subsets, a sustained petaflops capability would be able to screen 23 million chemicals in about 10 minutes or to screen 20065000 billion chemicals for one drug target in a year [4]. This capability offers the possibility of identifying inhibitors that are effective enough for in vivo testing, eliminating the need for medicinal chemistry to improve the efficiency of inhibitor leads identified by terascale computers [4]. In the context of this promise, we seek to extend the application of the EUDOC program to supramolecular chemistry.
Supramolecular chemistry deals with the creation of a large molecular assembly through noncovalent bonding among small molecular units, in contrast to organic synthesis, which involves breaking and making covalent bonds to create a new molecule [13]. Such noncovalent bonding is reversible and comprises hydrogen bonding, metal coordination, hydrophobic force, van der Waals force, π-π interaction, cation-π interaction, and/or long-range electrostatic interaction to assemble small molecules into a multimolecular complex. Supramolecular chemistry principles have been used to develop new materials, molecular sensors, and multimolecular complexes designed to disrupt protein-protein interactions.
To expand the application of the EUDOC program to supramolecular chemistry, we tested its ability to reproduce the crystal structures of small-molecule guest-host complexes. Previously we had tested the ability of the program to reproduce crystal structures of proteins in complex with small molecules and found that EUDOC reproduced 97% of 154 crystal structures using the bound conformations of both proteins and their small-molecule partners [3]. This success may not transfer to small-molecule guest-host complexes such as a crown ether in complex with 4-nitrobenzene-1,2-diamine, however, because the binding pocket or cavity in a small-molecule host is not as well formed as that in a protein.
Herein we report the results of our docking studies with 161 selected crystal structures of small-molecule guest-host complexes using the EUDOC program. These results show that the program is able to reproduce all 161 crystal structures and that the average interaction energy of these small-molecule complexes (−50.1 kcal/mol) is nearly half of that of the 153 small-molecule-bound protein complexes we studied in previous tests (−108.5 kcal/mol). The results also demonstrate the significant influence of crystal packing on small-molecule complex crystal structures and suggest that the EUDOC program is able to predict 3D structures of small-molecule guest-host complexes with reasonable reliability.

Docking without consideration of the influence of crystal packing or structural waters
The ability of EUDOC to reproduce crystal structures of small-molecule guest-host complexes was evaluated with the following procedure. The guest and host molecules in the complex crystal structure were separated, and the guest structure was then docked back into the host structure by the EUDOC program. This docking process used translational and rotational increments of 1.0 Å and 10° of arc, respectively, and a docking box that was defined to enclose the guest structure in the guest-host complex crystal structure. Of the many EUDOC-generated guest-host complexes, only the complex with the strongest interaction energy was compared to the corresponding crystal structure of the complex. In this comparison, the host portion of the EUDOC-generated complex was superimposed onto the host portion of the crystal structure, and the mass-weighted root mean square deviation (mwRMSD) of the guest portion between the two superimposed complexes was calculated. If the mwRMSD was <2.0 or <1.0 Å, the crystal structure of the complex was considered reproduced or accurately reproduced, respectively, by the EUDOC program [3]. Because the uncertainty in calculating the interaction energy using the EUDOC program was estimated to be 0.7 kcal/mol [3], occasionally several EUDOC-generated complexes whose interaction energies differed from the strongest interaction energy by ≤0.7 kcal/mol were all considered to have the strongest interaction energy and were compared to the crystal structure, resulting in multiple mwRMSDs. In that case, as long as one of the mwRMSDs was <2.0 or <1.0 Å, the crystal structure of the binary complex was considered reproduced or accurately reproduced, respectively, by the EUDOC program.
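The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria (the 0.7 kcal/mol energy tolerance and the 2.0 and 1.0 Å mwRMSD thresholds); the names `Pose`, `top_poses`, and `classify` are hypothetical and are not part of the EUDOC program.

```python
from dataclasses import dataclass

ENERGY_UNCERTAINTY = 0.7  # kcal/mol, estimated uncertainty quoted in the text
REPRODUCED_CUTOFF = 2.0   # Å, "reproduced"
ACCURATE_CUTOFF = 1.0     # Å, "accurately reproduced"


@dataclass
class Pose:
    energy: float  # interaction energy in kcal/mol (more negative = stronger)
    mwrmsd: float  # guest mwRMSD to the crystal structure in Å


def top_poses(poses, tol=ENERGY_UNCERTAINTY):
    """Poses whose energy is within `tol` of the strongest interaction energy."""
    best = min(p.energy for p in poses)
    return [p for p in poses if p.energy - best <= tol]


def classify(poses):
    """Apply the mwRMSD thresholds to the energetically indistinguishable poses."""
    best_rmsd = min(p.mwrmsd for p in top_poses(poses))
    if best_rmsd < ACCURATE_CUTOFF:
        return "accurately reproduced"
    if best_rmsd < REPRODUCED_CUTOFF:
        return "reproduced"
    return "not reproduced"
```

Note that a pose with a small mwRMSD does not count unless its energy lies within the tolerance of the strongest pose, which is what makes the test a check of the scoring and not only of the sampling.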
A total of 161 crystal structures of small-molecule guest-host complexes were obtained from CSD for this study [1]. The selection criteria were as follows: (1) no covalent bond between a host and a guest; (2) an R factor of <15 to ensure good crystallographic quality; (3) a guest-to-host ratio of 1 in a unit cell; (4) no structures containing Ni²⁺, Ag⁺, Pd²⁺, Pt²⁺, Au⁺, or Ru²⁺ because force field parameters for these ions were unavailable in the EUDOC program. The results of the docking studies with the 161 guest-host complexes using the procedure described above are listed in Table 1. As is apparent from the mwRMSD distribution listed in Table 2, the EUDOC program reproduced 93% and accurately reproduced 81% of the 161 complexes. The deviations between the EUDOC-generated and crystal complexes at different mwRMSD values are depicted in Figure 1.
Docking with consideration of the influence of crystal packing
Figure 2 shows the difference (mwRMSD of 3.52 Å) between the EUDOC-generated and crystal structures (CSD code: XAGMAT). Complex XAGMAT is one of the 12 complexes that the EUDOC program failed to reproduce. Despite a favorable π-π interaction between the guest and host structures predicted by the EUDOC program, in the corresponding crystal structure the guest structure surprisingly docks at a region at which it only partly interacts with the host via a π-π interaction (see Fig. 2). This discrepancy suggests that the guest might partly interact with host(s) and/or guest(s) in neighboring unit cells of the crystal structure. To confirm this, the docking study with complex XAGMAT was repeated with consideration of the influence of crystal packing; namely, the guest was docked into a multimeric host system that included neighboring host(s) and/or guest(s). These neighboring structures were generated by applying the symmetry of the space group of the crystal structure. The host(s) and/or guest(s) in neighboring unit cells were excluded if these structures were >4.0 Å away from the guest to be docked. Interestingly, when the influence of crystal packing was taken into account, the EUDOC program accurately reproduced complex XAGMAT with an mwRMSD of 0.07 Å, instead of the 3.52 Å obtained without consideration of crystal packing. This result prompted a new docking study that considered the influence of crystal packing.
The results of the docking studies with consideration of the influence of crystal packing are listed in Table 3. The 12 complexes (CSD codes: AJUXUY, ATUKEF, BAXZAB, BIFKIK, CRAMCC10, FANJAG, GUGGUK, KOLMAZ, LAYMAZ, NEBQOA, QAJKAN, and RALQAW01) that were not reproduced previously by the EUDOC program were accurately reproduced after the influence of crystal packing was taken into account. With consideration of the influence of crystal packing, the EUDOC program reproduced all 161 complexes and accurately reproduced 99% of them (see Table 2).

Docking with consideration of the influences of crystal packing and structural waters
Two crystal structures (CSD codes: XAQJAA and XAQJEE) could not be accurately reproduced by the EUDOC program even after consideration of the influence of crystal packing (XAQJAA: mwRMSD = 1.65 Å; XAQJEE: mwRMSD = 1.89 Å). Visual inspection of these structures revealed that the binding between the guest and host structures was mediated by crystallographically determined water molecules. This mediation suggested that, similar to the crystal packing, water molecules might also play an important role in guest-host complexation, and it might be necessary to include them in the multimeric host system for docking. Accordingly, the docking studies with the 161 complexes were repeated with consideration of the influences of both crystal packing and structural waters. The results are listed in Table 4. Indeed, the EUDOC program accurately reproduced complexes XAQJAA and XAQJEE with mwRMSDs of 0.03 and 0.27 Å, respectively. Taking into account the influences of both crystal packing and structural waters, the EUDOC program accurately reproduced all 161 complexes (see Table 2).

DISCUSSION
The influences of crystal packing and structural water on docking
This study shows that 31 (19%) of the 161 complexes could not be accurately reproduced with mwRMSDs of <1.0 Å by the EUDOC program if neighboring host(s) and/or guest(s) in the crystal structure were not included as part of the multimeric host system, whereas only 2 (1%) of these complexes could not be accurately reproduced with mwRMSDs of <1.0 Å if neighboring structures were included but water molecules were excluded from the host system. These results show that the influence of crystal packing or crystal environment on crystal structures of guest-host complexes is significant, which is consistent with the reported influence of crystal packing on protein structures [14]. These results also show that the influence of structural waters on guest-host complex crystal structures is insignificant, which is consistent with our reported finding that complexation between a small molecule and a protein is not commonly mediated by water molecules [3]. This study therefore suggests that crystal packing should be taken into account when reproducing crystal structures of small-molecule guest-host complexes through docking studies, whereas water molecules, counterions, or other accompanying molecules such as ethanol can be excluded from the host system. This study also suggests that, ideally, to perform prospective and accurate docking of a small molecule into another small molecule, repetitive docking of a guest or host into a guest-host complex that is generated by the previous docking is preferred, because a host is sometimes too small to prevent a guest from interacting with nearby guest(s) and/or host(s). It is worth noting, however, that the success rate of docking a small molecule into another small molecule is still about 93% if the influence of crystal packing is ignored.
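As a quick consistency check, the percentages quoted in this study follow from the complex counts given in the text (161 complexes in total; 12 not reproduced and 31 not accurately reproduced without crystal packing; 2 not accurately reproduced with packing but without waters). The helper name `percent` is hypothetical.

```python
TOTAL = 161                     # guest-host complexes in the test set
NOT_REPRODUCED_NO_PACKING = 12  # mwRMSD >= 2.0 Å without crystal packing
NOT_ACCURATE_NO_PACKING = 31    # mwRMSD >= 1.0 Å without crystal packing
NOT_ACCURATE_NO_WATERS = 2      # mwRMSD >= 1.0 Å with packing, without waters


def percent(count, total=TOTAL):
    """Share of the test set, rounded to the nearest whole percent."""
    return round(100 * count / total)


reproduced_no_packing = percent(TOTAL - NOT_REPRODUCED_NO_PACKING)  # 149/161
accurate_no_packing = percent(TOTAL - NOT_ACCURATE_NO_PACKING)      # 130/161
accurate_with_packing = percent(TOTAL - NOT_ACCURATE_NO_WATERS)     # 159/161
```

These evaluate to 93, 81, and 99, matching the rates reported for docking without crystal packing and for docking with crystal packing but without structural waters.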

Demonstration of the accuracy of the nonbonded parameters of the AMBER force field
In the crystal structure of Rebek's acridine diacid in complex with quinoxaline (CSD code: YAWJIP), there are two crystallographically independent forms of the complex in the asymmetric unit [15]. This structure was used to benchmark the all-atom AMBER/OPLS force field [16]. In this study, complex YAWJIP was accurately reproduced by the EUDOC program using the nonbonded force field parameters of the second-generation AMBER force field (parm99.dat) [3,17]; the mwRMSDs of the guest position between the EUDOC-generated and crystal complexes for forms A and B are both 0.16 Å. The accurate reproduction of the remaining crystal structures (see Table 2) further demonstrates the accuracy of the nonbonded force field parameters of the second-generation AMBER force field (parm99.dat) for reproducing crystal structures [3,17].

Application to supramolecular chemistry
The above results suggest that the EUDOC program can predict small-molecule guest-host complexes with a reasonable success rate (93%) without consideration of the mechanism of small-molecule complex aggregation, namely, without being given the multimeric host system. To demonstrate this ability herein, the crystal structure of complex YAWJIP is used as a model system. Based on the NMR spectroscopic data of complex YAWJIP, the guest structure quinoxaline was proposed to have face-to-face π-stacking with the acridine portion of Rebek's acridine diacid in an early report of the complex [18]. However, the face-to-face π-stacking was found in neither the crystal structure of complex YAWJIP [15] nor the Monte Carlo statistical mechanics calculations of the complex [16].
To perform a prospective docking study, the two-dimensional structures of quinoxaline and Rebek's acridine diacid were [...], IN), respectively. Both 3D structures were refined with energy minimization monitored with a normal-mode (NMODE) analysis to ensure that the energy minimization stopped when the minimized conformation reached a local potential energy minimum. The energy minimization was performed by using the SANDER module of the AMBER5 program [19] with the second-generation AMBER force field [17], and the NMODE analysis was carried out using the NMODE module of the AMBER8 program [19]. Given the refined 3D structures of quinoxaline and Rebek's acridine diacid, the EUDOC program generated a complex nearly identical to the crystal structure of form A (mwRMSD: 0.7 Å) but not the proposed complex with nearly face-to-face π-stacking. This prospective docking result suggests that the EUDOC program is a useful tool for predicting 3D models of guest-host complexes to aid the design of new molecular entities according to the principles of supramolecular chemistry.
Visual inspection of 154 reported crystal structures of proteins in complex with small molecules [3] and the 161 crystal structures of guest-host complexes reported herein suggested that the noncovalent interactions of the guest-host complexes are in general weaker than those of the protein complexes. The average interaction energy of the guest-host complexes (−50.1 kcal/mol) and that of the protein complexes listed in Table VI (excluding 1PHA) of reference 3 (−108.5 kcal/mol) quantitatively confirm the relatively weak noncovalent interactions of the guest-host complexes. This confirmation suggests that, to design high-affinity guest-host complexes, it is advantageous to incorporate entropic contributions into the binding, because the number of functional groups that can be introduced onto the guest and host structures to confer the nonbonded interactions is limited by the size of the two partners. This "saturation" problem is more apparent for small-molecule complexes than for protein complexes. It is therefore conceivable that the EUDOC program is also a useful tool for estimating the interaction energies of guest-host complexes to aid the design of new molecular entities according to the principles of supramolecular chemistry.

Preparation of the host and guest structures
The guest and host structures were taken from the crystal structures of their corresponding complexes obtained from CSD [1]. Water molecules, counterions, and solvent molecules such as ethanol were removed from the guest or host structure. Hydrogen atoms were added by using the QUANTA97 program (Accelrys Software, Inc, San Diego, California), followed by energy minimization of the hydrogen atoms using the SANDER module of the AMBER5 program [19] with the second-generation AMBER force field (parm99.dat) [17] and a positional constraint on all non-hydrogen atoms. The protonation states of the guest and host structures shown in Figures S1-S16 of Supporting Information were determined according to the pKa values of functional groups at a pH of 7.0. The atomic charges of the guest and host structures listed in Table S1 of Supporting Information were generated according to the RESP procedure [20] with ab initio calculations at the HF/6-31G* level using the Gaussian03 program [21]. The AMBER atom types of the guest and host structures listed in Table S1 of Supporting Information were assigned by the ANTECHAMBER module of AMBER7 [19].

Docking studies using the EUDOC program
The algorithm of the EUDOC program has been reported elsewhere [3]. Briefly, it uses a systematic search protocol, translating and rotating a guest in a putative binding pocket of a host to search for energetically favorable orientations and positions of the guest relative to the host. A docking box is defined within the binding pocket to confine the translation of the ligand. The intermolecular interaction energy is the potential energy of the guest-host complex relative to the potential energies of the two partners in their free state. This energy was calculated according to equations 1 and 2 using the second-generation AMBER force field [17].
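Equations 1 and 2 are not reproduced in this text. As a hedged reminder only, the nonbonded terms of the AMBER force field have the standard 12-6 Lennard-Jones and Coulomb forms shown below for rigid partners; that EUDOC's equations match these term by term is an assumption, and the symbols ($D$ for the dielectric constant, $C$ for the unit-conversion factor, $R^{*}$ and $\varepsilon$ for the van der Waals parameters) are chosen here for illustration.

```latex
% Interaction energy as defined in the text: complex relative to free partners
E_{\mathrm{inter}} = E_{\mathrm{complex}} - \left( E_{\mathrm{host}} + E_{\mathrm{guest}} \right)

% For rigid partners only host-guest atom pairs (i in host, j in guest) contribute
E_{\mathrm{vdW}} = \sum_{i \in \mathrm{host}} \sum_{j \in \mathrm{guest}}
    \varepsilon_{ij} \left[ \left( \frac{R^{*}_{ij}}{r_{ij}} \right)^{12}
    - 2 \left( \frac{R^{*}_{ij}}{r_{ij}} \right)^{6} \right]

E_{\mathrm{ele}} = \sum_{i \in \mathrm{host}} \sum_{j \in \mathrm{guest}}
    \frac{C \, q_i q_j}{D \, r_{ij}}

% AMBER combining rules for the pair parameters
R^{*}_{ij} = R^{*}_{i} + R^{*}_{j}, \qquad
\varepsilon_{ij} = \sqrt{\varepsilon_{i} \, \varepsilon_{j}}
```

In this form the van der Waals term reaches its minimum of $-\varepsilon_{ij}$ at $r_{ij} = R^{*}_{ij}$, and the two sums correspond to the van der Waals and electrostatic components of the intermolecular interaction energy.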
In calculating the intermolecular interaction energy, the dielectric constant was set to 1.0, and the distance cutoffs for steric and electrostatic interactions were set to 10⁹ Å. A docking box was defined to enclose the guest structure in the crystal structure of the guest-host complex. The size of the docking box and the cutoff for the interaction energy used by the EUDOC program are listed in Table S2 of Supporting Information. The complex-prediction module of the EUDOC program (Version 41, executable available from YPP) was used to translate and rotate the guest around the host at increments of 1.0 Å and 10° of arc, respectively, unless noted otherwise in Table 1.
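The systematic search can be illustrated with a minimal Python sketch. It is not the EUDOC implementation: it assumes rigid partners, a standard 12-6 Lennard-Jones plus Coulomb energy with a dielectric constant of 1.0 and no distance cutoff, and, for brevity, rotation about the z axis only. `Molecule`, `interaction_energy`, `rot_z`, and `systematic_search` are hypothetical names, and the Coulomb conversion factor is approximate.

```python
import itertools
from typing import NamedTuple

import numpy as np

COULOMB = 332.06  # kcal*Å/(mol*e^2), approximate unit-conversion factor


class Molecule(NamedTuple):
    xyz: np.ndarray    # (n, 3) coordinates in Å
    q: np.ndarray      # (n,) partial charges in e
    rstar: np.ndarray  # (n,) van der Waals radii R* in Å
    eps: np.ndarray    # (n,) well depths in kcal/mol


def interaction_energy(host, guest, dielectric=1.0):
    """Return (E_vdw, E_ele) summed over all host-guest atom pairs.

    No cutoff is applied; overlapping atoms (r = 0) are not handled.
    """
    r = np.linalg.norm(host.xyz[:, None, :] - guest.xyz[None, :, :], axis=-1)
    rstar = host.rstar[:, None] + guest.rstar[None, :]
    eps = np.sqrt(host.eps[:, None] * guest.eps[None, :])
    ratio6 = (rstar / r) ** 6
    e_vdw = float(np.sum(eps * (ratio6 ** 2 - 2.0 * ratio6)))
    e_ele = float(np.sum(COULOMB * host.q[:, None] * guest.q[None, :]
                         / (dielectric * r)))
    return e_vdw, e_ele


def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def systematic_search(host, guest, box_min, box_max, dt=1.0, dr=10.0):
    """Scan guest positions on a grid inside the docking box and z rotations.

    Returns (energy, angle, center) of the strongest interaction found.
    """
    centered = guest.xyz - guest.xyz.mean(axis=0)
    axes = [np.arange(lo, hi + 1e-9, dt) for lo, hi in zip(box_min, box_max)]
    best = None
    for angle in np.arange(0.0, 360.0, dr):
        rotated = centered @ rot_z(angle).T
        for center in itertools.product(*axes):
            pose = guest._replace(xyz=rotated + np.array(center))
            energy = sum(interaction_energy(host, pose))
            if best is None or energy < best[0]:
                best = (energy, angle, center)
    return best
```

A full search would sample all three rotational degrees of freedom and keep every pose within the energy uncertainty of the best one; the sketch keeps only the single strongest pose.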
To consider the influence of crystal packing, the PyMOL program (DeLano Scientific LLC, South San Francisco, California) was used to generate a multimeric host system by applying the symmetry of the space group of the crystal structure. A host or guest structure in a neighboring unit cell was excluded from the multimeric host system if the shortest distance between a heavy atom of the guest structure to be docked and a heavy atom of that neighboring structure was >4.0 Å.
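The 4.0 Å selection rule can be sketched as follows, assuming the symmetry-related neighbors have already been generated (for example, in PyMOL) and exported as coordinates with element symbols. Each molecule is represented here as a hypothetical `(xyz, elements)` pair, and the function names are illustrative only.

```python
import numpy as np


def heavy_atoms(xyz, elements):
    """Coordinates of the non-hydrogen atoms of one molecule."""
    mask = np.array([element != "H" for element in elements])
    return np.asarray(xyz, dtype=float)[mask]


def min_heavy_distance(mol_a, mol_b):
    """Shortest heavy-atom to heavy-atom distance between two molecules (Å).

    Molecules without heavy atoms are not handled.
    """
    a = heavy_atoms(*mol_a)
    b = heavy_atoms(*mol_b)
    return float(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).min())


def multimeric_host(host, guest, neighbors, cutoff=4.0):
    """Host plus every neighbor within `cutoff` Å of the guest to be docked."""
    kept = [nb for nb in neighbors if min_heavy_distance(guest, nb) <= cutoff]
    return [host] + kept
```

Hydrogen atoms are ignored in the distance test, so a neighbor whose only close contact to the guest involves a hydrogen atom is left out of the host system.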

Mass-weighted root mean square deviations
The mwRMSDs were calculated by superimposing the host portion of the EUDOC-generated complex over the corresponding host portion of the crystal structure followed by a calculation for the mwRMSD of all atoms of the guest portion in the two superimposed complexes using the PTRAJ module of AMBER8 [19].
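For readers without access to PTRAJ, the calculation can be sketched with NumPy. The sketch assumes the usual definition of a mass-weighted RMSD, sqrt(Σ m_i d_i² / Σ m_i), and an unweighted least-squares (Kabsch) superposition of the host atoms; whether PTRAJ weights the fit in the same way is not stated in the text. The function names are hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and translation t that best map `mobile` onto `target`.

    Both arrays are (n, 3) with atoms in the same order; a mapped point is
    R @ p + t.
    """
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def mw_rmsd(docked_host, docked_guest, xtal_host, xtal_guest, guest_masses):
    """Mass-weighted RMSD of the guest after superimposing the hosts (Å)."""
    rotation, translation = kabsch(docked_host, xtal_host)
    moved_guest = docked_guest @ rotation.T + translation
    sq_dev = np.sum((moved_guest - xtal_guest) ** 2, axis=1)
    masses = np.asarray(guest_masses, dtype=float)
    return float(np.sqrt(np.sum(masses * sq_dev) / np.sum(masses)))
```

Because only the host atoms enter the superposition, any displacement of the guest relative to the host contributes fully to the reported deviation.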