Spherical Harmonics Coefficients for Ligand-Based Virtual Screening of Cyclooxygenase Inhibitors

Background Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. Methodology/Principal Findings We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. Conclusions/Significance 12 compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representation of molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort.


Introduction
Ligand-based virtual screening [1,2], quantitative structureproperty and structure-activity relationships [3,4], and other concepts in computational medicinal chemistry are based on the similarity principle [5], which states that (structurally) similar compounds generally exhibit similar properties. Such methods require quantitative representations of molecules, usually in the form of chemical descriptors, i. e., computable numerical attributes in vector form [6].
Spherical harmonics have been used in cheminformatics as a global feature-based parametrization method of molecular shape [28][29][30][31][32][33][34][35][36][37][38]. Their attractive properties with regard to rotations make them an intuitive and convenient choice as basis functions when searching in a rotational space [31]. A review article by Venkatraman et al. [38] highlights applications of spherical harmonics to protein structure comparison, ligand binding site similarity, protein-protein docking, and virtual screening. Jakobi et al. [37] use spherical harmonics in their ParaFrag approach to derive 3D pharmacophores of molecular fragments. Recently, Ritchie and co-workers have applied the ParaSurf and ParaFit methodologies [32,33] (Cepos InSilico Ltd., Erlangen, Germany) in a virtual screening study on the directory of useful decoys (DUD) data set [39], which motivates 3D shape-property combinations specifically for flexible ligands [40]. The DUD data set was also used in a comparative analysis of the performance of various shape descriptors alone and in combination with property and pharmacophore features [41]. See the section on related methods for further discussion of spherical harmonics approaches.
In this work, we introduce a partially rotation-invariant descriptor of molecular shape based on spherical harmonics decomposition coefficients. The idea is to decompose the molecular surface using spherical harmonics and to use the norm of the decomposition coefficients as a description of molecular shape. In this, we take advantage of the fact that the norm of the coefficients does not change under rotation around the z-axis, which we align to the primary axis of the molecule. We retrospectively evaluate our descriptor, and prospectively apply it to screen for novel inhibitors of the enzymes cyclooxygenase-1 (COX-1) and cyclooxygenase-2 (COX-2). Particular focus is on the practical application of the virtual screening technique as an evaluation of its actual suitability for early-phase drug discovery.

Spherical harmonics
Let l[f0,1,2, . . .g, let jmjƒl, let (h,w) indicate spherical coordinates, and let P m l denote the Legendre polynomials [42]. The spherical harmonics [43] Y m l of order l (frequency, angular quantum number) and degree m (azimuthal quantum number), form an orthonormal (with respect to integration over the unit sphere) and complete set of basis functions ( Fig. 1). They are solutions to Laplace's differential equation + 2 Y~0 in spherical coordinates [44]. Any square-integrable spherical function f (h,w) can be decomposed as with complex coefficients c m l . The spherical harmonics decomposition can be viewed as a generalization of the Fourier decomposition to three dimensions [45].
The coefficients c m l of an harmonic expansion can be found using the orthonormality property. Multiplying each side of Eq. 2 by the complex conjugate Y m l (h,w) and integrating over the sphere yields Small values of l correspond to low frequencies, and describe the overall low-resolution shape; higher values of l add finer, high-frequency detail. The coefficients are unique, and can therefore be used as feature vectors for shape description. Rotation of a molecule (its shape function f ) changes the coefficients. A conventional solution is to define a canonical orientation of the molecule. For the purpose of shape comparisons, this implies an alignment of the compared molecules, with all associated problems and computational requirements. As an alternative, we use a partial orientation in conjunction with certain rotational invariance properties of the coefficients.

Descriptor definition
Let X~(x x 1 , . . . ,x x n ) T [R n|3 denote the Cartesian coordinates of pointsx x 1 , . . . ,x x n [R 3 sampled from a molecular surface. We assume that the surface is ''star-like'' (single-valued) in the sense that rays radiating outward from the molecule's origin intersect the surface only once (this is more of an issue for proteins; as argued elsewhere [35], small molecules are little, if at all, affected). Let Lemma. The p-norm of the rows of C does not change under rotation around the z-axis (polar axis, change in w).
After a rotation around the z-axis (a change in w), the same holds for the rotated pointx x 0~( x',y',z') and its coefficientc c 0 , i.e., jjc c 0 jj p~j dj {1 jjx x 0 jj p . Since the rotation matrix is unitary, jjx xjj p~j jx x 0 jj p , and it follows that jjc c jj p~j jc c 0 jj p . Our spherical harmonics descriptor is the k-component vector of the norms of the coefficients. It is a description of molecular shape that is invariant to rotations around the z-axis.
Before spherical harmonics decomposition, we place molecules into a common frame of reference by translating their center of gravity to the coordinate system origin and by aligning their first principle component (the direction of maximum variance as given by principle component analysis [47]) with the z-axis. In other words, we align molecules according to their longest spatial extent, and then apply our descriptor which is invariant to rotations around the z-axis.

Descriptor computation
Gaussian contact surfaces [48] of all compounds were computed using MOE (Molecular Operating Environment, version 2009.10, Chemical Computing Group Inc., Montreal, Canada, www. chemcomp.com ). Spherical harmonics decomposition was then carried out on the vertices of these surfaces, giving approximate coefficients [49]. To limit computational expense, we truncated spherical harmonics expansions after order L~9. The resulting k~(9z1) 2~1 00 decomposition coefficients were sufficient to represent fine molecular detail and approximately reconstruct the original molecular surfaces (Fig. 2). The partial rotational invariance of the coefficient norms jjc m l jj is demonstrated in Fig. 3. Computation was done in Matlab (The MathWorks, version R2007a, www.mathworks.com ), partly based on code by Dr. Andrew Hanna (University of East Anglia, United Kingdom, www.cmp.uea.ac.uk/,aih ). Average computing time was v3 seconds per compound, which is acceptable for medium-sized libraries but will require speed-up for high-throughput virtual screening.

Related methods
Spherical harmonics have been widely used in cheminformatics as a global feature-based parametrization method of molecular shape [28][29][30][31][32][33][34][35][36][37][38]. Most current approaches, including ours, use the center of gravity as the center of the spherical harmonics decomposition. Molecular surface sampling can be done by sampling iso-probability surfaces of molecular property densities. One aspect in which methods differ is the way they deal with rotations in 3D space.
Ritchie and Kemp [31] apply the rotational property of spherical harmonics (a rotation of the surface can be simulated by rotating the expansion coefficients) to maximize the pairwise superposition of two molecules. The software ParaSurf superposes molecules using a brute-force rotational search over the three Euler rotation angles [50]. In a recent publication, Cai et al. [36] use a similar approach to obtain the minimal root-mean-square distance between a ligand molecule and a target protein. In these related studies, molecular surfaces were rotated by transforming their expansion coefficients.
Standard orientation of compounds prior to spherical harmonics decomposition was proposed by Morris et al. [34]. Their work registered molecules and binding pockets in a standard frame by translating their center of mass to the coordinate origin and aligning their variance-covariance matrix to the axes of the coordinate system. They then use the coefficients of a real spherical harmonics expansion to describe and compare the molecular shape of binding pockets and ligands. This approach aligns molecules to minimize rotation-dependent differences in the coefficients.
Rotation-invariant spherical harmonics descriptors were applied by Kazhdan et al. [28] and Mavridis et al. [35,51], using the fact that expansion coefficients of the same order l transform among themselves to construct rotationally invariant spherical harmonics coefficients jj P l m~{l c m l jj 2 . In their approach, coefficients of the same order l are binned together, thereby losing information contained in the individual degrees m, but gaining complete rotational invariance.
In this work, we combine partial orientation of the molecules with the magnitude of the expansion coefficients as a partially rotation-invariant shape descriptor. Our proposed descriptor retains more information than the spherical harmonics descriptors by Kazhdan et al. [28] and Mavridis et al. [35,51] in the sense that coefficients within the same order are not summed up, but kept. Compared with standard orientation methods, our descriptor is potentially less susceptible to problems in the orientation step than most others because only the first (and most stable) principle component is used for orientation.

Retrospective evaluation
For retrospective validation, we ranked the compounds in a database according to their similarity to a reference compound, as measured by Euclidean distance and our descriptor. Two conceptually different collections of reference data were used, the DUD data set (release 2, from http://dud.docking.org/r2 , unmodified data) [39], and the COBRA data set (version 10.3, 11 244 compounds annotated with activity on a total of 677 individual macromolecular targets) [52]. COBRA 10.3 contains 168 COX-2 inhibitors.
We used the selective COX-2 inhibitor SC-558 and the nonselective inhibitor indomethacin as queries for ligand-based similarity searching, with the conformations extracted from the crystal structure (protein data bank [53] identifiers (PDB ID) 6cox [54] and 4cox [54]). Enrichment factors [55], receiver operating characteristic curves (ROC curves [56]), and the area under these curves (ROC AUC) were used as performance measures.

Prospective virtual screening
We screened the ChemBridge compound pool (457 226 compounds, ChemBridge Corp., San Diego, USA, www.chembridge.com) for potential COX ligands using a single CORINA conformer query as in the retrospective screening. The database was preprocessed using the ''washing'' procedure in MOE (protonation of strong bases and de-protonation of strong acids; all other parameters were kept at their default values).
To reduce computational effort and allow for pharmacophore feature-based compound ranking, the screening compound pool was pre-filtered using a self-organizing map (SOM [57]) trained on the ChemBridge collection and 275 COX-1 and COX-2 inhibitors from the COBRA database. SOM topology was toroidal with 20|20 neurons (1 200 molecules per neuron on average); compounds were represented using the 150-dimensional CATS2D topological pharmacophore descriptor [58,59] and compared using the Manhattan distance. The initial width of the Gaussian neighborhood function was set to 5; training was terminated after 5 : 10 6 steps (using each compound 10 times on average). We used the MOLMAP software tool for SOM generation [60].
After pre-filtering, 21 950 compounds of the ChemBridge database that were similar to the COX inhibitors from the COBRA database were retained for virtual screening using our spherical harmonics shape descriptor. Two potent COX inhibitors served as reference molecules (queries; Fig. 4). All parameters were set to the values used in retrospective virtual screening. The spherical harmonics descriptor was calculated for the 21 950 retained molecules and for the two reference molecules.

Enzyme inhibition assay
Inhibition of COX-1 (ovine) and COX-2 (human recombinant) activity was measured using a COX inhibitor screening assay kit (Cayman Chemicals, Ann Arbor, MI, USA, www.caymanchem. com ), according to the manufacturer's protocol. SC-560, a selective COX-1 inhibitor, and celecoxib, a selective COX-2 inhibitor, served as positive controls. The COX inhibitor screening assay directly measures the amount of prostaglandins PGE 2 , PGD 2 and PGF 2a produced by SnCl 2 reduction of COXderived PGH 2 . In addition to this protocol, the amounts of prostaglandins were quantified by LC-MS/MS analysis as described previously [61].
Whole blood assay COX-1 whole blood assay. One-milliliter heparinized human blood samples were incubated with 4 ml test substance (in DMSO) or 4 ml DMSO (control) for 10 min at 37 0 C . After this, thrombocyte aggregation was stimulated by addition of calcium ionophore A23187 (50 mM) for 30 min at 37 0 C. Plasma was separated by centrifugation for 20 min at 2000rpm, 4 0 C and kept at {80 0 C until assayed for TXB 2 by LC-MSMS (see below).
COX-2 whole blood assay. For the determination of COX-2 activity, 1ml of heparinized human blood was incubated at 37 0 C with 10 ml of acetylsalicylic acid (1mgml {1 in PBS), 4 ml DMSO or 4 ml inhibitor (in DMSO) for 10 min. After this, 2ml of LPS (5mgml {1 in DMSO) was added and incubated for 24 hr at 37 0 C. The reaction was terminated by quickly chilling on ice. Plasma was separated by centrifuging (20 min, 2 000rpm, 4 0 C),

Results and Discussion
We validated our spherical harmonics (SpH) descriptor in a retrospective setting (statistical validation on known data), and in a prospective study to obtain biochemical confirmation of our model.

Retrospective evaluation
As a first analysis, we used the DUD compound collection for a preliminary comparison of selected shape-and structure-based virtual screening methods. ROC AUC [62] values were computed  for each of the methods compared. ROC AUC values lie in the interval ½0,1, with values closer to 1 indicating higher enrichment of actives in a ranked list of compounds. The analysis was limited to the original COX-2 data from DUD (426 actives, 13 289 decoys). We did not perform exhaustive comparative analyses of virtual screening performance or focus on 'early recognition' of actives [63,64], as the primary purpose of this study was to determine whether our SpH descriptor might be a useful shapebased filtering criterion for COX inhibitors. Retrospective screening was restricted to COX-2, our original target. Table 1 summarizes the results obtained for CATS2D (topological pharmacophore descriptor [58]), LIQUID (threedimensional pharmacophore descriptor using Gaussian feature points; v1: hydrogen-bond donors, hydrogen-bond acceptors, lipophilic [65]; v2: additional aromatic, positive and negative charge features (manuscript in preparation)), PRPS (Gaussian pseudoreceptor model [66,67]), ShaEP (field-based subgraph matching [68]), and ROCS (Gaussian shape model [15,69]). For the DUD COX-2 data, ROC AUC values indicate better than random performance for all methods. SpH yielded an average of 0.86, which compares to Ritchie's ParaFit spherical harmonics descriptor (note that the ParaFit ROC AUC value is not given in the original publication; we estimated it from graphical material provided in the article's supplementary material [41]). Among the tested methods, SpH performed best for the selective COX-2 inhibitor SC-558 (Fig. 4) yielding a ROC AUC = 0.91. Notably, high values were also obtained for indomethacin (Fig. 4), a nonselective COX inhibitor (COX-1 IC 50 = 18 nM; COX-2 IC 50 = 26 nM) [70]. Apparently, only the PRPS pseudoreceptor model distinguished between the selective (ROC AUC = 0.83) and the non-selective (ROC AUC = 0.15) query.
In contrast to DUD (unmodified data), the COBRA database contains only druglike bioactive compounds. Ranking of the COBRA database with SC-558 as query resulted in an enrichment factor (computed for the first percentile) of 23. We compared this result to those obtained by the shapelets [71] method from our group, using the same version of the COBRA database and the same reference structure. The shapelets shape-only virtual screening method achieved a comparable enrichment factor of 24. ROC curves are presented in Fig. 5 (numbers for shapelets refer to COBRA version 8.4 containing 8 311 compounds including 136 COX-2 inhibitors).  In summary, our spherical harmonics coefficients-based approach SpH achieves notable enrichment of actives and seems suitable for COX-2 inhibitor retrieval. This outcome is in agreement with the study of shape-based virtual screening approaches by Ritchie et al. [41], who report high hit rates for COX-2 using shape descriptors. We conclude that spherical harmonics-based decomposition of molecular shape captures structural features that are relevant for virtual screening. Due to the limited number of published prospective applications [72], it seems premature to render any conclusion regarding certain implementation preferences or 'best-in-class' spherical harmonics methods. To further assess our SpH approach, we performed a prospective study using SpH in a virtual screening cascade with the aim to identify new COX inhibitors.
In the second virtual screening step, SpH was used for shapebased filtering. Two reference molecules (SC-558 and indomethacin; Fig. 4) resulted in two ranked lists of the pre-filtered ChemBridge compounds. 10 duplicates were found among the 50 top-ranking compounds from the two lists (20% overlap). In total, 12 compounds were selected by visual inspection, preferring potentially new scaffolds ('cherry-picking', Fig. 7), and submitted for activity determination in a direct enzyme inhibition and a whole blood assay.

Assay results
We determined the COX-inhibitory activity of 12 compounds by performing a commercially available competitive COXinhibition assay using purified COX-1 (ovine) and COX-2 (human recombinant) enzymes. Compounds 5 and 9 inhibit COX-1 in a concentration dependent manner ( Fig. 8 and Table 2). At 100mM compounds 5 and 9 inhibit COX-1 activity to 72 % and 89 %, respectively. Both compounds have only marginal effects on COX-2-activity at concentrations up to 100mM. All other substances have no effect on COX-1 or COX-2 activity in this in vitro assay. While this outcome supports our general virtual screening approach, we failed to retrieve COX-2 inhibitors. This might be a consequence of using the selective COX-2 inhibitor SC-558 in combination with the non-selective COX inhibitor indomethacin as queries for the spherical harmonics shape filter. Apparently, the COX activity island on the SOM and SpH consensus filtering eliminated COX-2 specific features. It is also possible that there were no hitherto unidentified COX-2 ligands in the compound pool.
In the whole blood assay (Fig. 9, Table 3), compounds 5 and 9 are less effective, with maximum COX-1 inhibition of about 30 % and no COX-2 inhibitory efficacy. Interestingly, in this assay, compounds 6, 10, 2 and 8 inhibit TXB 2 production in a concentration dependent manner up to 72 %, 72 % and 61 % at 100mM, respectively. Compounds 6 and 10 have only marginal inhibitory potency on PGE 2 production, which points to selective COX-1 inhibitors in vivo. Compound 2 also inhibits PGE 2 production comparable to TXB 2 , indicating that this compound is a COX-unselective inhibitor. In contrast, substance 8 increases the PGE 2 amount in a concentration dependent manner, which argues for an activator of PGE 2 production in the cellular context. All other compounds show only very weak or no effect on PGE 2 production.
The inhibitory data obtained from the whole blood assay might be meaningful for further hit optimization. Compounds that are active in this assay are not snatched away by binding to serum albumin, but cross the cell membrane and overcome possible interactions with cellular substances or enzymes. This could explain why compounds 5 and 9 are active in the enzyme assay, but inactive in the whole blood assay. In contrast, compounds 6, 10, 2 and 8, which were more active in the whole blood assay, possibly interact with the arachidonic acid pathway in other ways than direct inhibition of COX-1 or COX-2. Also, these compounds might be metabolized by cellular enzymes to more active derivatives, but this hypothesis needs to be tested by further experiments. Compound 8 is of special interest, as it induces PGE 2 production up to 322 %. This increase could be due to an activation of enzyme activity, possibly by binding to the ''inactive'' monomer of the COX-homodimer complex [73,74], or, due to an enhancement of COX-2 protein, either by transcriptional or posttranscriptional mechanisms.
As a preliminary novelty check, similarity searches were performed using SciFinder Web (2010-10-21) for data retrieval from the CAS database (Chemical Abstracts Service, Columbus, Ohio, USA; www.cas.org). For none of the actives any reference to COX inhibition was found, and only for compound 9 substructure matches (lacking the meta methyl group) were retrieved with regard to bioactivities other than COX inhibition. It is therefore reasonable to conclude that COX inhibition by compounds 5 and 9 represents a novel finding resulting from our study. We did not perform additional analytical investigations of compound integrity and purity other than those provided by the compound supplier. Therefore, we cannot exclude that the activities measured in the assays might be partially owed to decomposition or oxidation products. Analog compound design and testing will be mandatory.

Conclusions
We presented a favorable retrospective evaluation of the SpH approach using COX-2 data from the DUD collection, and in a first prospective application demonstrated the usefulness of the descriptor in combination with a self-organizing map for retrieving bioactive ligands from a large compound pool. Although we did not retrieve a potent COX-2 inhibitor, which is likely owed to the setup of the virtual screening cascade, two novel COX-1 inhibitors were discovered. Future research will have to focus on mathematical descriptions of molecular shape that allow for enzyme subtype-selective ligand screening.
We introduced the magnitude of spherical harmonics coefficients as a partially rotation-invariant descriptor of molecular shape. In retrospective validation on the DUD dataset, the performance (as estimated by ROC AUC) of our shape-only method was comparable to other shape-based similarity searching methods. Results show that the magnitude of spherical harmonics decomposition coefficients can be used to describe molecular shape in a partially rotation-invariant way, resulting in a notable enrichment of active compounds in virtual and real screening studies. The combination of pharmacophore filtering by a selforganizing map and shape-filtering by spherical harmonics descriptors might be a useful two-step virtual screening protocol for hit retrieval from large screening compound collections.