Normal Modes Expose Active Sites in Enzymes

Accurate prediction of active sites is an important tool in bioinformatics. Here we present an improved structure-based technique to expose active sites that is based on large changes of solvent accessibility accompanying normal mode dynamics. The technique, which detects EXPOsure of active SITes through normal modEs, is named EXPOSITE. The technique is trained using a small 133-enzyme dataset and tested using a large 845-enzyme dataset, both with known active site residues. EXPOSITE is also tested in a benchmark protein ligand dataset (PLD) comprising 48 proteins with and without bound ligands. EXPOSITE is shown to successfully locate the active site in most instances, and is found to be more accurate than other structure-based techniques. Interestingly, in several instances, the active site does not correspond to the largest pocket. EXPOSITE is advantageous due to its high precision and paves the way for structure-based prediction of active sites in enzymes.


Introduction
Prediction of functional sites in proteins is essential for a range of bioinformatics applications such as molecular docking and structure-based drug design. Traditional methods for predicting functional sites follow three approaches: 1). The first approach uses sequence homology to find evolutionarily conserved residues with functional activity. 2). The second approach utilizes structural homology with other proteins of known function to locate functional regions. 3). The third and last approach uses geometry and physico-chemical attributes of the protein structure and sequence to identify areas with functional activity.
PLOS Computational Biology | DOI: 10.1371/journal.pcbi.1005293 | December 21, 2016
Over the years, several techniques based on the third approach have been developed. These techniques include LIGSITE [1], POCKET [2], POCKET-FINDER [3], SURFNET [4], CAST [5], PASS [6], Cavity Search [7], VOIDOO [8], APROPOS [9], LigandFit [10], 3DLigandSite [11], MSPocket [12], Fpocket [13], McVol [14], Ghecom [15], PocketDepth [16], PocketPicker [17], and VICE [18], as well as consensus techniques that combine several of them, such as MetaPocket [19]. Other methods analyze the protein surface for pockets [20,21], cavities [22][23][24], and channels [25] using purely geometric characteristics, and do not require any prior knowledge of the ligand or of sequence homology. Other computational techniques use geometric characteristics in combination with physico-chemical traits. Such methods include FOD [26] and the method of Elcock [27], which analyze the hydrophobicity distribution under the assertion that functionally important residues are often in electrostatically unfavorable positions. Similarly, THEMATICS [28] uses geometric characteristics in combination with theoretical microscopic titration analysis, while the methods of Goodford [29] and Rupert et al. [30], and SiteHound [31], identify ligand binding sites based on analyses of the binding energies of probes placed on a grid around the protein. Another purely geometric method, EnSite, uses the proximity of catalytic residues to the molecular centroid to detect the active sites of enzymes with high accuracy [32].
When used in combination with sequence and structure homology, geometric techniques are enhanced and prediction is improved. Some techniques use a wide combination of parameters, including conservation, residue type, accessibility, 2D structure propensity, cleft depth, and B-factors, to predict active site residues. Using such parameters, Gutteridge et al. predicted the location of active sites in enzymes using a neural network and spatial clustering [33]. Similarly, Petrova et al. used a Support Vector Machine with selected protein sequence and structural properties to predict catalytic residues [34]. In both cases, about 90% of the actual catalytic residues were correctly predicted. From these data it is clear that one should rely on sequence and structure homology when possible, and over the past decade, multiple methods to detect binding sites and functional pockets based on geometric, structural, and genetic data were developed [35][36][37][38][39]. Several webservers of ligand binding sites have also been constructed and may be used to infer unknown ligand binding sites based on homology and other attributes, such as Pocketome [40], FunFold [41], scPDB [42], IBIS [43], Multibind [44], fPop [45], and FINDSITE [46]. To date, however, no comprehensive study comparing geometry-based techniques has been performed.
Normal-mode analysis is one of the standard techniques for studying long-time dynamics and, in particular, low-frequency motions. In contrast to molecular dynamics, normal-mode analysis provides a very detailed description of the dynamics around a local energy minimum. Even with its limitations, such as the neglect of the solvent effect, the use of a harmonic approximation of the potential energy function, and the lack of information about energy barriers and crossing events, normal modes have provided much useful insight into protein dynamics. Over the past years, several techniques have been described to calculate large-scale motions using simplified normal-mode analysis [47][48][49][50][51]. Based on these techniques, several executable programs to calculate normal modes have been released, such as ElNemo [52], GROMACS [53], and STAND [49].
Recently, several studies have drawn attention to the allosteric effect of ligand binding on normal mode dynamics [54]. From these studies, a clear correlation between binding in the native site and perturbation of normal modes was identified. The same allosteric effect of ligand binding on molecular dynamics was also pointed out by Bhinge [55] and Ming [56], who proceeded to use molecular dynamics simulations in predicting ligand binding sites. These recent advances made us aware of the capacity of normal modes to predict active sites.
In this paper we present a novel structure-based technique using normal modes to predict the location of active sites in enzymes. The technique exploits the normal mode opening and closing motion of enzymes and the accompanying change of solvent accessibility to highlight residues of the active site. The idea behind the presented technique is that active site pockets become exposed in normal mode dynamics (Fig 1).
The hypothesis that active sites are surrounded by a shell of flexibility is not new and has been proposed in the dynamic lock-and-key model for biomolecular interactions. The shell of flexibility allows the enzyme to adapt to its ligand through an induced fit. The hypothesis was demonstrated in several studies, notably by Weng et al. in a recent study on the flexibility of enzyme active sites [57], and less recently by Babor et al. [58].
The technique, which detects EXPOsure of active SITes through normal modEs, is named EXPOSITE. The technique may also be used in association with other methods to rank geometrically calculated pockets according to their solvent exposure. First, EXPOSITE is trained extensively on a dataset containing 133 enzymes with known active sites from the Catalytic Site Atlas (CSA) database [59]. Then, EXPOSITE is tested on a dataset containing 845 enzymes and found to be more robust than other structure-based techniques. EXPOSITE's high success rate is valuable for structure-based identification of active sites and clearly shows the added value of using normal modes for finding active sites. The technique does not attempt to detract from the importance of using genetic data, and clearly, a combination of both structural and genetic data would be more useful for predicting active sites than either of them on its own.

Dataset assembly
To assemble a training dataset containing 133 enzymes with known active sites, enzymes were selected from the CSA database [59], version 2.2.1. The dataset enzymes were selected according to the following two criteria: 1). The enzyme active site is known from the literature (LIT), and not derived by homology. 2). The biologically active enzyme is composed of a single polypeptide chain and a single oligomerization state.
To assemble a test dataset containing 845 enzymes, enzymes were selected from the CSA database [59], version 2.2.1. The test dataset was compiled by extracting chain A of all LIT entries that were not included in the 133-enzyme training dataset. These two datasets were used for training and testing EXPOSITE's prediction consistency respectively.
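The test-set selection rule above can be sketched in a few lines of Python. This is only an illustration: the record layout (dictionaries with `pdb_id`, `chain`, and `evidence` keys) and the function name `build_test_set` are assumptions made for the example and do not reflect the actual CSA file format or the authors' Perl code.

```python
def build_test_set(records, training_ids):
    """Return sorted PDB IDs of literature-derived (LIT) chain-A entries
    that are not part of the training set.

    records: iterable of dicts with hypothetical keys
             'pdb_id', 'chain', and 'evidence'.
    training_ids: iterable of PDB IDs already used for training.
    """
    training = {pdb_id.lower() for pdb_id in training_ids}
    selected = set()
    for rec in records:
        pdb_id = rec["pdb_id"].lower()
        is_literature = rec["evidence"] == "LIT"
        is_chain_a = rec["chain"] == "A"
        if is_literature and is_chain_a and pdb_id not in training:
            selected.add(pdb_id)
    return sorted(selected)
```

A set is used so that an enzyme with several annotated catalytic residues is counted once.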

Normal mode calculations
To calculate normal modes of the dataset enzymes, two programs, STAND [49] and ElNemo [52], were run locally. For STAND, both real normal modes (REA) and Tirion modes (TIR) were calculated. For speed, the STAND coarse-graining option of 1 point (1 pt) was used, which accelerates the calculations without compromising the results, and default values of deformation amplitude were used. For ElNemo, the default values of DQMIN = -100 and DQMAX = 100 were used. The DQMIN and DQMAX parameters correspond to the deformation amplitude in the direction of a single normal mode. For both STAND and ElNemo, deformation amplitudes were not scaled, so the same amplitude produces a smaller deformation for larger molecules. For both programs, only the 10 lowest-frequency non-trivial modes were calculated. For each of these 10 modes, 40 PDB files were generated by STAND and 10 PDB files were generated by ElNemo, all distorted along that particular mode. The two methods are very different in that STAND (REA) minimizes the structure and then calculates modes in φ and ψ torsion angle space, whereas STAND (TIR) and ElNemo avoid minimization by using Tirion modes [50] and then calculate modes in Cartesian coordinate space. For STAND, the opposite extremes of the harmonic motion were empirically chosen as the 1st and 14th structures out of 40. At these extremes, the structures look fully "distorted" relative to each other. For ElNemo, the opposite extremes of the harmonic motion are the 1st and 10th structures out of 10.
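To make the idea of distorting a structure along a low-frequency mode concrete, the following is a minimal sketch of a generic one-bead-per-residue elastic network model in Cartesian space. It is not the STAND or ElNemo implementation: the function names, the 8 Å contact cutoff, the uniform spring constant, and the amplitude units are all assumptions chosen for illustration.

```python
import numpy as np

def anm_modes(coords, cutoff=8.0, gamma=1.0, n_modes=10):
    """Lowest-frequency non-trivial modes of a simple elastic network.

    coords: (n, 3) array, one bead per residue (e.g. C-alpha positions).
    Returns (eigenvalues, modes) with modes shaped (n_modes, n, 3).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # beads too far apart: no spring
            block = -gamma * np.outer(d, d) / r2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    eigvals, eigvecs = np.linalg.eigh(hessian)  # ascending eigenvalues
    # The first six eigenvalues are ~0 (rigid-body translation and rotation).
    vals = eigvals[6:6 + n_modes]
    modes = eigvecs[:, 6:6 + n_modes].T.reshape(-1, n, 3)
    return vals, modes

def displace(coords, mode, amplitudes):
    """Structures distorted along one mode, one per amplitude value."""
    coords = np.asarray(coords, dtype=float)
    return [coords + q * mode for q in amplitudes]
```

In this sketch, the first and last entries returned by `displace` for a symmetric amplitude range play the role of the two opposite extremes of the harmonic motion.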

Solvent accessible surface calculations
To calculate the solvent accessible surface (SAS) area of amino acids in the generated PDB files, the DSSP program was used [60]. For each mode, the SAS of each residue was calculated in the two structures at opposite extremes of the harmonic motion, and the absolute change of SAS between the extreme mode distortions, |ΔSAS|, was used.
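As a small illustration of the quantity defined above, the sketch below computes |ΔSAS| per residue from two sets of per-residue accessibilities. It assumes the accessibilities have already been extracted from the DSSP output into arrays; the function name `abs_delta_sas` and the array layout are hypothetical.

```python
import numpy as np

def abs_delta_sas(sas_extreme_a, sas_extreme_b):
    """Absolute per-residue change of solvent accessible surface (in Å^2)
    between the two opposite extremes of a normal mode.

    Both inputs must have the same shape, e.g. (n_residues,) for a single
    mode or (n_modes, n_residues) for several modes at once.
    """
    a = np.asarray(sas_extreme_a, dtype=float)
    b = np.asarray(sas_extreme_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both extremes must list the same residues")
    return np.abs(a - b)
```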

Pocket calculation
To calculate pockets, LIGSITE [61] was run locally using default parameters. In each case, the 10 largest pockets were calculated and the pocket center as well as the pocket size were collected.

Prediction of active site
The predicted active site was defined as the geometrical center (centroid) of the Cα coordinates of all residues with a solvent exposure change, |ΔSAS|, in the range 20-40 Å². The observed active site was defined as the geometrical center (centroid) of the Cα coordinates of the active site residues specified in the CSA database [59].
The predicted and observed active sites were represented each by a single coordinate in Cartesian space. The distance between these two coordinates was defined as the distance between the predicted and observed sites.
The success of a prediction was based on the distance between the predicted and observed sites in the training and test datasets. If the distance between the predicted and observed sites was less than 12Å, then a prediction was considered successful. Conversely, if the distance was larger than 12Å, then a prediction was deemed incorrect.
In the special case of the PLD dataset and for easy comparison with other techniques, a prediction was considered successful if any atom coordinate of the ligand was within 4Å of the predicted site. If no atom coordinate of the ligand was within 4Å of the predicted site, then the prediction was considered wrong.
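The definitions in this section translate directly into a short sketch. The code below takes one |ΔSAS| value per residue and leaves open how values from several modes are combined, since that step is not spelled out here; the function names are hypothetical and the 20-40 Å², 12 Å, and 4 Å values are the ones quoted in the text.

```python
import numpy as np

def predict_site(ca_coords, delta_sas, low=20.0, high=40.0):
    """Centroid of the C-alpha atoms whose |dSAS| lies within [low, high].
    Returns None when no residue falls in the range."""
    ca = np.asarray(ca_coords, dtype=float)
    dsas = np.asarray(delta_sas, dtype=float)
    mask = (dsas >= low) & (dsas <= high)
    if not mask.any():
        return None
    return ca[mask].mean(axis=0)

def is_hit_by_centroid(predicted, active_site_ca, cutoff=12.0):
    """Training/test criterion: predicted site closer than `cutoff` (Å)
    to the centroid of the annotated active site residues."""
    observed = np.asarray(active_site_ca, dtype=float).mean(axis=0)
    return bool(np.linalg.norm(np.asarray(predicted) - observed) < cutoff)

def is_hit_by_ligand(predicted, ligand_atoms, cutoff=4.0):
    """PLD criterion: any ligand atom within `cutoff` (Å) of the
    predicted site."""
    atoms = np.asarray(ligand_atoms, dtype=float)
    dists = np.linalg.norm(atoms - np.asarray(predicted), axis=1)
    return bool((dists <= cutoff).any())
```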

Comparison with other techniques
To compare EXPOSITE with other techniques, several programs were run on all datasets, namely the training dataset of 133 enzymes and the test dataset of 845 enzymes, as well as a dataset containing 48 proteins derived from the PLD dataset [62] and engineered by Huang et al. [61]. First, each of the following programs was downloaded: LIGSITE, CAST, PASS, and SURFNET. For EnSite, no software was available, and the script was reconstructed based on the algorithm described in the original paper [32]. Then, each program was run locally on a PC running Windows or Linux. In the case of the training and test datasets (which lack ligands), a prediction was considered successful if the predicted and observed active sites were less than 12 Å apart. In the case of the PLD dataset (which contains ligands), a prediction was considered successful if the predicted active site was less than 4 Å from any ligand atom.

Dataset assembly
To reliably assess the success rate of our technique in a sizeable ensemble, two datasets were assembled from the CSA database [59]. The CSA database contains 23,265 enzymes with known active sites. Of these, only 845 had an active site known from the literature (LIT), and these comprised the test dataset. Only 133 were composed of a single chain that is biologically active as a monomer in a single oligomerization state, and these comprised the training dataset. The PDB IDs of the 133 selected enzymes of the training dataset are listed in S1 Table. The PDB IDs of the 845 enzymes of the test dataset are listed in S2 Table. To test for homology within the datasets, the enzyme commission (EC) numbers were retrieved. Although some homologues were found within a single dataset, no homologues were found between the training and test datasets.

Calculation of pockets and solvent accessible surface
A number of programs were tested for calculating geometric pockets of biomolecular structures, namely POCKET [2], LIGSITE [1], POCKET-FINDER [3], SURFNET [4], CAST [5], and PASS [6]. The program LIGSITEcsc [61] provides a list of pocket centers and sizes in PDB format and was subsequently used in all our calculations.
Surprisingly, there are significant differences between SAS of residues calculated by DSSP and other techniques such as ENCAD, CNS, and Accelrys. These differences arise from the different approaches used in calculating SAS. Nonetheless, when calculating the change of surface areas, ΔSAS, these differences cancel out and all programs produce comparable ΔSAS values.

EXPOSITE training
Biologically relevant modes are not always represented in the lowest frequency modes. Sampling more data, i.e. calculating more modes, could provide better results. Similarly, changing the |ΔSAS| thresholds could also lead to a higher success rate by allowing more exposure data to be included. To test this assertion and optimize the success rate of EXPOSITE, the following parameters were varied: the threshold values of |ΔSAS| and the number of normal modes sampled. The number of modes sampled was varied from 0 to 10, and the |ΔSAS| minimum and maximum thresholds were varied from 0 to 60 Å².
As seen in S4 Table, the optimal minimum and maximum |ΔSAS| thresholds for ElNemo were around 20 and 40 Å² respectively. Below the threshold of 10 Å², normal exposure fluctuations contribute little to EXPOSITE's accuracy. Above the threshold of 40 Å², exposure changes arise from the normal mode tip effect (bond breaking and exaggerated exposure) and contribute little to EXPOSITE's accuracy. For STAND, the optimal |ΔSAS| threshold values were 20 and 40 Å² respectively. Differences in |ΔSAS| thresholds between STAND and ElNemo are due to the fact that STAND uses coarse graining, which inherently reduces the surface area, whereas ElNemo does not: STAND represents each amino acid with a single bead, while ElNemo uses a heavy-atom representation. In both cases, the maximum deformation amplitude was not tuned and default values were used. Also, the maximum deformation amplitude was not scaled in this study.
The success rate reaches a plateau when 8, 9, or 10 modes are sampled, for both STAND and ElNemo (S5 Table). Below this number of sampled modes, important information is lost. Intriguingly, when no threshold is applied to |ΔSAS|, the accuracy of EXPOSITE is consistently 86%, no matter how many modes are sampled.
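The parameter scan described in this section can be sketched as a simple grid search. The code below is illustrative only and rests on stated assumptions: a residue is selected if its |ΔSAS| falls within the thresholds in any of the first k sampled modes (the combination rule is not specified in the text), each enzyme is represented by a dictionary with hypothetical keys `ca`, `dsas`, and `observed`, and a hit is defined by the 12 Å cutoff.

```python
import numpy as np
from itertools import product

def predicted_center(ca, dsas, low, high, n_modes):
    """Centroid of residues whose |dSAS| lies in [low, high] in any of the
    first n_modes modes (the 'any mode' rule is an assumption)."""
    window = np.asarray(dsas, dtype=float)[:n_modes]
    selected = ((window >= low) & (window <= high)).any(axis=0)
    if not selected.any():
        return None
    return np.asarray(ca, dtype=float)[selected].mean(axis=0)

def scan_parameters(enzymes, lows, highs, mode_counts, hit_cutoff=12.0):
    """Success rate for every (low, high, n_modes) combination.

    enzymes: list of dicts with 'ca' (n, 3), 'dsas' (n_modes, n) and
             'observed' (3,), the observed active site centroid.
    """
    rates = {}
    for low, high, k in product(lows, highs, mode_counts):
        if low >= high:
            continue  # not a valid threshold window
        hits = 0
        for enz in enzymes:
            center = predicted_center(enz["ca"], enz["dsas"], low, high, k)
            if center is None:
                continue  # no residue selected counts as a miss
            dist = np.linalg.norm(center - np.asarray(enz["observed"]))
            if dist < hit_cutoff:
                hits += 1
        rates[(low, high, k)] = hits / len(enzymes)
    return rates
```

The combination with the highest rate can then be read off with `max(rates, key=rates.get)`.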

Correlation of predicted and observed active site in 133 enzyme training dataset
EXPOSITE uses solvent accessibility changes in normal modes to predict the location of active sites in enzymes. As seen in Fig 2, residues experiencing large accessibility changes (colored cyan and green) are likely to be found in proximity to active site residues (shown in text). In contrast, residues experiencing little exposure change (colored blue) are less likely to be found in the vicinity of the active site. The proximity between residues experiencing large |ΔSAS| and the experimentally observed active site residues is an indicator of the precision of EXPOSITE's prediction.
On average, the predicted and observed active sites in the training dataset are separated by 7.9 Å, with a standard deviation of 4.4 Å (S1 Fig). The maximum success rate of EXPOSITE in the training dataset consisting of 133 enzymes was 92%. Curiously, in the training dataset, the binding pocket coincides mostly with the largest pocket (82%) but not always (18%). This finding accounts for the pitfall of other techniques, which rely on pocket size only for ranking.
Also interesting is the fact that no active site was found in pockets with a size less than 7 Å³. Such pockets are too small to accommodate ligands, which validates our convention of discarding them as insignificant.
The predicted and observed active sites are separated by an average of 9.2 Å, 11.5 Å, and 14.1 Å for EXPOSITE, EnSite, and LIGSITE respectively (Fig 3). Significantly, if a successful prediction is arbitrarily defined by a distance cutoff of 4 Å, then the number of hits of EXPOSITE (16.1%) is almost double that of EnSite (8.7%). Similarly, if a successful prediction is arbitrarily defined by a distance cutoff of 3 Å, then the number of hits of EXPOSITE (10.4%) is 2.4 times that of EnSite (4.3%).

Correlation of predicted and observed active site in test dataset
To test the robustness of EXPOSITE, we tested its success rate in a dataset containing 845 enzymes (S2 Table). Not surprisingly, the success rate is much lower than in the 133-enzyme dataset. Nevertheless, EXPOSITE is consistently better than EnSite at predicting the active site, by >2%. The sharp decrease of prediction success rate in the 845-enzyme dataset is not surprising, as the dataset does not discriminate between real homomonomeric enzymes with high success rates and homomultimeric enzyme assemblies with low success rates (close to 0). Even if statistically robust, the large 845-enzyme dataset does not reflect the real success rate of prediction techniques, and the smaller 133-enzyme dataset should be regarded as a more representative alternative. The large 845-enzyme dataset is too diverse, and demonstrates the difficulty in assembling representative datasets.
Residues experiencing large accessibility changes (colored green) are likely to be found in proximity to the ligand (colored red), whereas residues experiencing little exposure change (colored blue) are further away. The proximity of residues with large accessibility changes and residues of the observed active site is a success indicator of EXPOSITE's predictions.

Correlation of predicted and observed active site in PLD dataset
On average, the predicted and observed centers in the PLD dataset are separated by 7 Å, with a standard deviation of 3.3 Å. Intriguingly, the separation in the PLD dataset is smaller than that of the CSA dataset by almost 1 Å, which is probably an artifact of the handpicked nature of the PLD dataset.

Comparison to other techniques
To compare EXPOSITE accurately and robustly with other techniques, all other programs were run on both the training dataset of 133 enzymes and the test dataset of 845 enzymes. A prediction was considered accurate if the distance between the predicted and observed sites was less than 12 Å. If the distance was larger than 12 Å, then a prediction was considered inaccurate. The calculated prediction accuracies are listed in Table 1.
When compared to other geometric techniques, EXPOSITE is advantageous due to its high success rate. As seen in Table 1, EXPOSITE is only slightly better than EnSite at predicting active sites, and EnSite remains faster than EXPOSITE because of its ingenious simplicity. Also note that prediction of binding sites in unbound proteins is less successful than in ligand-bound proteins, simply because the ligands occupy and expose the binding site through induced fit, thereby easing its identification.
To compare EXPOSITE accurately and robustly with other techniques, all other programs were also run on the bound and unbound PLD dataset [61]. A prediction was considered accurate if any ligand atom was within 4 Å of the predicted site. If no ligand atom was within 4 Å of the predicted site, then the prediction was considered inaccurate. The calculated prediction accuracies are listed in Table 2. The data for EXPOSITE and EnSite are reported by us; the data for VICE were reported by Tripathi et al. [18], the data for Fpocket by Le Guilloux et al. [13], the data for PocketPicker by Weisel et al. [17], and the data for LIGSITEcs, CAST, PASS, and SURFNET were first reported by Huang et al. [63]. Note that EXPOSITE is not always successful, as in the cases of PDB 1igj, 3gch, 3mth, and 2tmn shown in Fig 5.
The conventionally accepted distance cutoff for binding site prediction is 4 Å, and we used this cutoff in the established PLD dataset when comparing the performance of EXPOSITE, EnSite, VICE, Fpocket, PocketPicker, LIGSITEcs, CAST, PASS, and SURFNET (Table 2). However, in the training and test datasets, which had never been tested before, we relied on a larger cutoff of 12 Å. The training and test datasets contain 20 times more proteins than the handpicked PLD dataset, and when the conventional cutoff of 4 Å was used, the performance of all techniques dropped drastically. To maintain good performance for all techniques in the training and test datasets, the cutoff was therefore raised from 4 Å to 12 Å. Generally speaking, the success rate in the handpicked PLD dataset is higher than in the 845-enzyme test dataset, which was not handpicked. This discrepancy suggests that the PLD dataset was not randomly picked, and that it could artificially increase prediction success rates.

EXPOSITE ranks active site pockets
EXPOSITE's ability to highlight active sites is very useful for ranking pockets. Indeed, the technique is capable of ranking enzyme pockets according to their degree of exposure in normal mode dynamics. This ranking enables EXPOSITE to choose the correct binding pocket from a list of pockets calculated by LIGSITE. The assumption that the active site is usually in the largest pocket [1,4,64] is used by several pocket detection programs, and the top-ranked site is generally the largest one. However, this assumption is not always true, and in several instances the active site corresponds to the second, third, or fourth largest pocket.
Fig 5. The predicted and observed binding sites are indicated by green stars and red ligands respectively. Residues displaying large changes of accessibility in normal modes are shown in orange, cyan, and green, and residues displaying little or no change of exposure are shown in blue. Note that EXPOSITE failed to predict the binding site in these cases because of multiple backbone breaks resulting in unusual modes (3mth, 3gch) or an oddly shaped protein structure (1igj). The figure was prepared using Pymol. doi:10.1371/journal.pcbi.1005293.g005
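The pocket ranking described in this section can be illustrated with a short sketch that orders pocket centers by their distance to the center of maximum exposure. The function name, the input layout, and the optional size filter are assumptions for illustration; the actual ranking may weigh additional parameters such as pocket size.

```python
import numpy as np

def rank_pockets(pocket_centers, exposure_center, pocket_sizes=None, min_size=None):
    """Rank pockets by distance (Å) from the exposure centroid, closest first.

    pocket_centers: (n, 3) array of pocket centers (e.g. from a pocket finder).
    exposure_center: (3,) centroid of residues with large |dSAS|.
    pocket_sizes, min_size: optional; pockets smaller than min_size are
    dropped before ranking.
    Returns a list of (pocket_index, distance) tuples.
    """
    centers = np.asarray(pocket_centers, dtype=float)
    dists = np.linalg.norm(centers - np.asarray(exposure_center, dtype=float), axis=1)
    keep = np.ones(len(centers), dtype=bool)
    if pocket_sizes is not None and min_size is not None:
        keep = np.asarray(pocket_sizes, dtype=float) >= min_size
    indices = [i for i in np.argsort(dists, kind="stable") if keep[i]]
    return [(int(i), float(dists[i])) for i in indices]
```

With this ordering, the top-ranked pocket is the one closest to the exposure centroid, whether or not it is the largest pocket.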

EXPOSITE rationale
The rationale behind the success rate of EXPOSITE is fairly simple. For proper enzyme activity, protection from the surrounding water is often necessary, as shown by the normal mode closure of the active site. Proteins in general, and enzymes in particular, often act as environment protectors. They envelop substrates to catalyze chemical reactions that would otherwise not take place in aqueous solution. They conceal prosthetic groups to coordinate binding, thus increasing an affinity that is negligible in water. They act as small shielding cases displaying alternating motions of opening and closing to allow ligand entrance and protection respectively. Throughout this motion, protein residues located at various distances from the active site are exposed to the solvent to different degrees. Residues in proximity to the active site are exposed more than those farther away. This idea lays the foundation for EXPOSITE, suggesting that the pocket closest to the center of maximum exposure is the active site. The change in solvent accessibility between the X-ray structure and the largest deformation of either of the normal mode extremes could also have been used. However, the maximum effect of motion is observed between the two extremes, which vibrate around the X-ray structure corresponding to a local minimum.

EXPOSITE parameters
EXPOSITE takes into account several parameters such as accessibility change in normal modes, centroid distance from pockets, and pocket size. Normal modes, by their very nature, take into account further parameters such as the contact network and distances. Together, these parameters resemble those used in neural network techniques [33,34], where they are analogous to accessibility, cleft depth, B-factors, etc. As different as these techniques may seem, the analogy between their parameters is striking.

Coarse graining does not decrease EXPOSITE success rate
The success rate is not affected by the choice of normal mode technique, STAND or ElNemo. The success rate remains unchanged even when STAND and ElNemo are used in different combinations with accessibility calculators (e.g. ElNemo with the ENCAD accessibility calculator [65]). The success rate does not originate from the difference in the atomic representation used by ElNemo and STAND. In fact, when running STAND in full-atom representation, the success rate remains unchanged. These data indicate that coarse graining, which ignores the amino acid type and accessible surface, does not influence the success rate of EXPOSITE. In fact, adding heavy atoms to the PDB files generated by STAND also does not decrease the success rate of EXPOSITE. We conclude that coarse graining and accessibility calculation methods do not affect the success rate of EXPOSITE.

Caveats of EXPOSITE
Care should be taken when using our technique on structures composed of several domains. Practical interpretation of normal modes of multi-domain structures tends to be problematic in the sense that bending and twisting of one domain relative to another tend to overshadow modes with biological meaning. One way to circumvent this problem is to run normal modes on a single domain to predict its active site. We excluded multi-chain enzymes which are biologically active in oligomeric states from our CSA dataset. Similarly, care should be taken when using EXPOSITE on structures with elongated termini or exceedingly flexible loops. Such structures often present odd normal modes around these areas, which tend to overshadow modes with biological meaning. Two strongly recommended ways to circumvent the problem of exaggerated motion are to clip out (or edit out) the flexible stretches and rerun the normal mode computation, or to set an upper value for the |ΔSAS| cutoff of 75 Å² when calculating modes with ElNemo (40 Å² for STAND). The cutoff should minimize the effect of loose and flexible termini with exaggerated exposure change. A complete list of successes and failures is provided in S6 and S7 Tables.

Binding site vs. active site
A distinction should be made between the concepts "binding site" and "active site". Usually, an active site is found in a single copy in an enzyme, while binding sites may be present in multiple copies in proteins. Thus, prediction of active sites and prediction of ligand binding sites are very different: whereas only one prediction is correct for enzymes, several predictions may be correct for proteins. To complicate things further, some enzymes are composed of multiple chains, each equipped with a distinct active site, so care should be taken not to overinterpret a prediction. As a rule of thumb, structure-based predictions (EXPOSITE, EnSite, etc.) are more accurate in single-chain polypeptide enzymes.

Absence of correlation between pocket size, substrate size, number of residues with high accessibility change, and number of active site residues
In an attempt to correlate pocket size with the active site, the following parameters of the active site were calculated in the PLD dataset: 1). The number of Cα atoms of the active site was derived from the CSA database. 2). The number of heavy atoms in the substrate was calculated from the PLD database. 3). The number of residues with high accessibility change was calculated from EXPOSITE. 4). The size of the predicted pocket in Å³ was taken from LIGSITE. These parameters all reflect the size of the active site, yet there is no obvious correlation among them. There was no correlation (R = 0.12) between pocket size and the number of active site residues. This is partially due to fractionation of active sites into adjacent pockets (POK), which decreases the "real" active site size. This fractionation of active sites is a problem often encountered in pocket calculating programs. Summing the sizes of neighboring pockets did not improve the correlation significantly.
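A correlation coefficient such as the R value quoted above can be computed as in the sketch below. It assumes that R denotes the Pearson correlation coefficient, which is not stated explicitly in the text, and the function name is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences,
    e.g. pocket sizes and numbers of active site residues."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D sequences of equal length")
    return float(np.corrcoef(x, y)[0, 1])
```

Values close to 0 indicate no linear relationship, while values close to 1 or -1 indicate a strong one.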

Conclusion
Over the past years, normal modes have enjoyed a revival. In this article, the biological relevance of normal modes is illustrated in a new technique. The presented technique exposes active sites of enzymes with high success rates. As pocket detection and normal mode techniques improve, so will our technique. In the future, EXPOSITE is expected to become publicly available as a basic tool (website and/or program) for predicting active sites of enzymes. The Perl code used in this study is freely available in the supplementary data. Note that DSSP, LIGSITE, ElNemo, and/or STAND must be obtained from third parties, and that the time bottleneck of the method is the normal mode calculation.
Shown are four additional EXPOSITE predictions for the enzymes (A) 2pk4, (B) 1ulb, (C) 1stp, and (D) 1apu of the PLD dataset. The predicted and observed binding sites are indicated by green stars and red ligands respectively, and LIGSITE pockets are displayed as white spheres. Residues displaying large changes of accessibility in normal modes are shown in cyan and green, and residues displaying little or no change of exposure are shown in blue. Note that the ligand (in red) is within 4 Å of the predicted site (green star). The figure was prepared using Pymol. (TIF)
S1