Challenges Predicting Ligand-Receptor Interactions of Promiscuous Proteins: The Nuclear Receptor PXR

Transcriptional regulation of some genes involved in xenobiotic detoxification and apoptosis is performed via the human pregnane X receptor (PXR) which in turn is activated by structurally diverse agonists including steroid hormones. Activation of PXR has the potential to initiate adverse effects, altering drug pharmacokinetics or perturbing physiological processes. Reliable computational prediction of PXR agonists would be valuable for pharmaceutical and toxicological research. There has been limited success with structure-based modeling approaches to predict human PXR activators. Slightly better success has been achieved with ligand-based modeling methods including quantitative structure-activity relationship (QSAR) analysis, pharmacophore modeling and machine learning. In this study, we present a comprehensive analysis focused on prediction of 115 steroids for ligand binding activity towards human PXR. Six crystal structures were used as templates for docking and ligand-based modeling approaches (two-, three-, four- and five-dimensional analyses). The best success at external prediction was achieved with 5D-QSAR. Bayesian models with FCFP_6 descriptors were validated after leaving a large percentage of the dataset out and using an external test set. Docking of ligands to the PXR structure co-crystallized with hyperforin had the best statistics for this method. Sulfated steroids (which are activators) were consistently predicted as non-activators while, poorly predicted steroids were docked in a reverse mode compared to 5α-androstan-3β-ol. Modeling of human PXR represents a complex challenge by virtue of the large, flexible ligand-binding cavity. This study emphasizes this aspect, illustrating modest success using the largest quantitative data set to date and multiple modeling approaches.

well as the large number of compounds being analyzed, the best alignment option was not immediately apparent to us. Common substructure alignment with an inertial grid orientation was attempted for the training sets using different template molecules. The final alignments were picked based on the quality and plausibility of the actual alignment as well as the statistical quality of the QSAR model derived from it. The best alignments of the master training set (N = 95), the subsets of pregnanes (N = 23) and bile acids /salts (N = 41) were achieved using the conformation of pregnanedione (compound # 27,Supplemental  For CoMFA, all the molecules were placed in a 3D lattice with regular grid points separated by 2 Å. The van der Waals potentials and the Coulombic term representing the steric and electrostatic fields were calculated using the standard Tripos force field for CoMFA. A C sp3 atom with a formal charge of +1 and a van der Waals radius of 1.52 Å served as probe atom to generate steric (Lennard-Jones 6-12 potential) and electrostatic (Coulombic potential) field energies, which were obtained by summing the individual interaction energies between each atom of the molecule and the probe atom at every grid point. A distance-dependent dielectric constant was used. The steric and electrostatic fields were truncated at ± 30.00 kcal/mol. A similar approach was used for CoMSIA as the aligned molecules were placed in a 3D lattice with regular grid points separated by 2 Å. The five physicochemical properties for CoMSIA (steric, electrostatic, hydrophobic, hydrogen-bond donor and acceptor) were evaluated using a common probe atom with 1 Å radius, +1.0 charge, and hydrophobic and hydrogen-bond property values of +1. The attenuation factor α, which determines the steepness of the Gaussian function, was assigned a default value of 0.3 (39). The PLS technique was employed to generate a linear relationship that correlates changes in the various computed potential fields with changes in the corresponding experimental values of activities (-log IC 50 ) for the data set. Employing the CoMFA and CoMSIA potential energy fields for each molecule as the independent variable and the corresponding activity values as the dependent variable, PLS converts these descriptors to the so-called latent variables or principal components (PCs) consisting of linear combinations of the original independent variables.
To assess the internal predictive ability of the CoMFA and CoMSIA models, the 'leave-one-out' (LOO) cross-validation procedures was employed. Cross-validation determines the optimum number of PCs, corresponding to the smallest error of prediction and the highest q 2 . PLS analysis was repeated without validation using the optimum number of PCs to generate final CoMFA and CoMSIA models from which the conventional correlation coefficient r 2 was derived. The utility of the 3D-QSAR models were determined by predicting the activities of the test set compounds that were not included in the training sets after aligning in the same way as those in the training set.

In silico methodology: 3D-QSAR -Catalyst
The pharmacophore modeling studies were carried out using Catalyst in Discovery Studio version 1.7 (Accelrys, San Diego, CA) running on a Sony Vaio laptop computer (Intel Pentium M processor). This methodology has been previously described [2]. Molecules were imported as an sdf file and the 3-D molecular structures were produced using up to 255 conformers with the best conformer generation method, allowing a maximum energy difference of 20 kcal/mol. Ten hypotheses were generated using these conformers for each of the molecules and the EC 50 values, after selection of the following features: hydrophobic, hydrogen bond acceptor, hydrogen bond donor and ring aromatic features. In addition, hypotheses were generated with up to 2 excluded volumes, variable weight and tolerances and a combination of excluded volumes and variable weight and tolerances. In all cases, after assessing all ten generated hypotheses, the one with lowest energy cost was selected for further analysis as this usually possessed features representative of all the hypotheses. The quality of the structure activity correlation between the estimated and observed activity values was estimated by means of an r value.
As Catalyst is commonly used with relatively small training sets (greater or equal to 16 molecules) we generated individual models for the different types of steroids only.

In silico methodology: 4D-QSAR
The 4D-QSAR methodology has been presented previously in detail [3]. Briefly, the commercial version (V3.0) of the 4D-QSAR package was employed in this study (4D-QSAR, Version 3.0; The Chem21 Group, Inc., Lake Forest, IL). This study uses a receptor-independent (RI-4D-QSAR) analysis. The first step in the analysis is to generate a reference grid cell lattice in which to place the 3D structure of each training set compound. This grid cell lattice is composed of a set of one angstrom cubes. The 3D structures of the training set compounds were then constructed and optimized in Hyperchem (Release 7.51 for Windows; Hypercube, Inc. Gainesville, FL) The preferred compound geometry was determined via molecular mechanics with an MM+ force field, and the partial charges were assigned using a semiempirical AM1 method implemented in the Hyperchem program [3].
The interaction pharmacophore elements, or IPEs were assigned to the intramolecular energy minimized 3D structure of each compound and the conformational ensemble profile, or CEP, was then generated for each training set compound. The seven IPEs used in 4D-QSAR analyses represent any/all atoms, non-polar atoms, polar positive atoms, polar negative atoms, hydrogen-bond acceptor atoms, hydrogen-bond donor atoms, aromatic atoms and non-hydrogen atoms. A molecular dynamics simulation (MDS) was used to create the CEP. The MOLSIM V3.0 (C. Doherty and The Chem21 Group, Inc., Lake Forest, IL) software package with the extended MM2 force field was utilized to perform the MDS. The molecular dielectric was set to 3.5, and the simulation temperature was fixed at 300 K. A sampling time of 100 ps was employed, over which a total of 1000 conformations of each compound were recorded. The CEP was created by recording the atomic coordinates and conformational energy every 0.1 ps throughout the simulation, resulting in 1000 "snapshots" of each compound as it traverses through the set of thermodynamically available conformer states.
Following generation of the CEP of each compound, the molecular alignments were chosen for the training set. Three-ordered atom alignment rules were used in this study. In general, the alignments are chosen to span the common framework (core) of the molecules in the training set so that information relating to the substituent properties of the compounds is obtained from the resulting models. This alignment strategy is reflected in those which were chosen and listed along with the steroidal core structure in Supplemental Table 7.
All conformations from the CEP of every compound are placed in the grid cell lattice space according to a selected trial alignment. The occupancy of the grid cells by each IPE type is recorded over the CEP which then forms the set of grid cell occupancy descriptors, or GCODs which are utilized as the pool of trial descriptors in the model building and optimization process. The genetic function approximation (GFA) is used to optimize the 4D-QSAR models [4].
Since GFA typically generates a family of possible models, the best models in the 4D-QSAR study were chosen based on a number of different criteria. In addition to the leave-one-out cross-validated correlation coefficient, or q 2 , other statistical measures such as r 2 , standard error (SE), and lack-of-fit (LOF) were considered as indicators of model fitness [4]. The optimal number of descriptor terms to include in the best model was determined by plotting the number of model terms versus the cross-validated correlation coefficient (data not shown). The point of the plot at which the q 2 did not significantly increase with the addition of an additional model term was chosen as the optimal number of model terms. Test sets not included in the training sets were also used to evaluate the predictive power of the 4D-QSAR models.
The active conformation of each of the compounds in the training sets was postulated relative to the best 4D-QSAR model for the respective set. This was accomplished by first determining the conformations of the CEP that are within a threshold energy limit of 5 kcal, i.e., only thermodynamically accessible conformations are considered, and then determining which of these possible conformations has the highest activity as predicted by the model.