Identification of Ligand Binding Sites of Proteins Using the Gaussian Network Model

The nonlocal nature of the protein-ligand binding problem is investigated via the Gaussian Network Model with which the residues lying along interaction pathways in a protein and the residues at the binding site are predicted. The predictions of the binding site residues are verified by using several benchmark systems where the topology of the unbound protein and the bound protein-ligand complex are known. Predictions are made on the unbound protein. Agreement of results with the bound complexes indicates that the information for binding resides in the unbound protein. Cliques that consist of three or more residues that are far apart along the primary structure but are in contact in the folded structure are shown to be important determinants of the binding problem. Comparison with known structures shows that the predictive capability of the method is significant.


Introduction
Ligand binding is generally known as a local process where the binding molecule finds a suitable location on the protein that has the right shape and the favorable energetic interaction [1]. However, observation of both short and long range conformational changes upon binding led to the suggestion that the full topology of the protein should be taking part in the ligand binding process [2]. According to this hypothesis, binding should depend not on the local structure, but rather on an interaction pathway on the protein that takes part in the collective reorganization of the residues to accommodate for the best and most favorable conformation of the protein-ligand complex. Numerous experimental observations are in support of this hypothesis. The changes in conformation in calcium binding proteins is cited in the first comprehensive review of this phenomenon [3]. All experimental evidence points out to the fact that the full topology of the protein should take part in such rearrangements. Thus, the information needed for determining the interaction pathway should somehow be hidden in the topology. In the simplest case, a coarse grained picture of the protein is satisfactory. The topology of the protein in this case is represented by the connectivity matrix, or the contact map, of the three dimensional structure, where the ij'th element of the matrix is unity if the ith and jth residues are in contact, and zero otherwise. Several successful models of proteins exist at this level of the topology, i e the residue based coarse grained topology. One of them is the Gaussian Network Model [4] which uses the connectivity matrix as its force constants matrix. In several recent papers [5,6,7,8], using the GNM, we proposed a statistical thermodynamics argument that leads to the determination of the interaction path of the ligand binding problem. The method, which we term as the 'maximum eigenvalue method' [9] is based on determining the residues that exchange energy with their neighbors and the surrounding medium. In the present paper, we give several examples where we show that these residues which are closely associated with binding are located on paths of spatially contiguous residues. The concept of interaction pathways or networks in relation to ligand binding has been addressed from different perspectives. Lockless and Ranganathan [10] suggested that correlations between two residues resulting in energy transfer among them lead to interaction paths and are evolutionarily conserved. Nelson et al proposed a relation between long range perturbations and the interaction path [11]. Pan et al [12] and Amitai et al introduced the topological closeness measure as a determinant of interaction paths [13]. Our approach is an addition to this series of papers that emphasize the significance of topology in binding. The prediction of binding sites based on GNM is simple and easy to apply as demonstrated in the following examples, using thirty benchmark systems, presented in Table 1 and in the Supplementary data. A new additional concept that we introduce here is the 'clique', defined as a subset of three or more pairs of vertices, with each pair being connected by an edge, i.e. contacting (or interacting with) each other [14]. Cliques are expected to have great significance in protein-protein or proteinligand interactions, as they are stiff regions, therefore likely to be conserved throughout evolution. In our data set, cliques made up of residue triads are identified since triads are frequently observed as spatial forms in the active sites of the proteins. We show the significance of cliques in relation to ligand binding.

Human Heme-Oxygenase-1
The first system that we analyze is an oxireductase, Heme oxygenase (HO) which is responsible for the degradation of heme to biliverdin. In the heme bound state, Human heme-oxygenase-1 (HO-1) arranges its helical shape with the help of highly conserved, distal helix residues, so that it supplies flexibility to accommodate substrate binding and product release [15]. Human HO-1 has a dynamic active-site pocket, which is enlarged in the apo state as distal and proximal helices surrounding the heme plane move farther apart. In the holo form, the active site residues Thr21, Val24, Thr23, Thr26, Ala28 and Glu29, which reside on the proximal helix, and Tyr-134, Thr-135, Leu-138, Gly-139, Ser-142, and Gly-143, which reside on the distal helix, are important as they interact with heme [16,17]. According to the given crystal structure (PDB Code: 1N3U), the binding site for heme in the B chain contains the residues, Lys18, His25, Glu29, Gln38, Tyr134, Thr135, Gly139, Lys179, Phe207, Asn210 and Phe214. Phe207, Asn210 and Phe214 also lie on the proximal side of the active-site pocket. Below, we show that these specific features can be identified by applying the GNM to the apo form of the protein, i.e. 1NI6.pdb. Figure 1a shows the total correlation, C T , of a given residue, presented as the residue index along the abscissa, obtained by using 1NI6.pdb. Figure 1b is the contour plot of the distance fluctuations where the residues that exchange energy with the surroundings are identified with a darker hue. The heavy vertical strip shows that the residues 118-124 interact with all the residues of the protein.
In Figure 2a, the ligand and the residues on the interaction path, i.e. the set of residues with non-zero values of C T , are shown in yellow and green, respectively. Figure 2b is an enlarged version of figure 2a. Residues between 17 and 29 constituting the active site residues exhibit non-zero values of C T . The path that connects the surface to the heme starts with Leu17 and Glu23 at the surface and ends at His25 that neighbors the heme. The path is colored in red and the mentioned residues are labeled in Figure 2b. Residues 53-66 lie on helix H4 that contains the catalytic site Tyr58. The appearance of this region in Figure 1a is mostly due to its stability, resulting from hydrogen-bonded and electrostatic pair interactions with neighboring helix and loop structures such as Tyr58-Asp140, Glu62-Arg86, and Glu66-Tyr78 [18,19]. Similar to the Leu17-His25 path, the residues between Pro109 and Thr135 form a path, one end of which is at the surface of the protein and the other with Tyr134 and Thr135 terminates at the heme.
The path that is lined by residues Pro109-Asp140 is colored in blue in Figure 2b. Finally, the largest peak corresponding to Ala203, which we define as the hub residue, and the second largest peak corresponding to Phe207, seen in Figure 1a, identifies the two residues neighboring the heme. The group of residues between Val199 and Gln212 are represented as the green path, most of which neighbor the heme molecule. All of the residues observed in Figure 1a are obtained from the apo structure, indicating that the information for binding is already present in the unbound structure.
The residues with non-zero total correlation values and that are in contact with the ligand, are presented in Table 2. The interaction path residues that are identified in Figure 2 are also   Table 3. Tabulating all the residues that lie on the pathways would lead to excessive detail. Therefore only residue pairs on the pathway separated by less than 7.2 Å are shown. Because of this, some of the residues cited in the text may not appear in Table 3, which we are presenting to supplement the information given here. The hub residues are identified in Table 3 with yellow highlight. As will be seen from Figure 2 and the following ones, the interaction paths do not consist of a single well defined line of contiguous residues, but rather of several bifurcating paths. Therefore, it is not possible to uniquely identify two extremities to an interaction path. The residues at the multiple extremities of the paths are defined as the gate residues. Cliques of size three are shown in pink in Figure 2b. These are obtained at cut-off 6.2 Å , as 33Phe-214-Phe-218Gln and Gly144-Lys148-Phe167. The first triad is located on the proximal side while the latter lies on the distal side of the Heme molecule referring to the proximal and distal helices that sandwiches the Heme molecule upon binding [15]. Phe214 is a binding site residue while Gly144 is a highly conserved, catalytic residue [20]. Cliques obtained by a cutoff distance of 6.2 Å account for more than 50 percent of the highest conserved cliques of the proteins studied. In Table 4, residues with highest conservation for the six proteins that we present here are shown. These are obtained from the residue conservation data in PDBsum [21]. Among these, the highlighted residues are those belong to cliques of size three obtained by the 6.2 Å cut-off.

Human glutathione S-transferase
Glutathione S-transferases (GSTs) are involved in the catalysis of xenobiotics, carcinogens and conjugations with endogenous ligands. In addition, they can perform a variety of functions in metabolic pathways, which are not related with detoxification, such as the intracellular storage or transport of a variety of other hydrophobic, non-substrate compounds including hormones, metabolites, and drugs. Besides, due to the elevation of GST levels in tumor cells, they have been the focus of significant interest with regard to drug resistance [22,23,24,25,26]. In threedimensional structures, a tyrosine or a serine has been shown to be central in catalysis [22,23,26,27]. In addition, the side chain of Arg15 is thought to be involved in the inner coordination sphere of the sulfur [27]. Figure 3a shows the total correlation, C T of residues. Figure 3b is the corresponding distance fluctuation correlation contour plot. The chain A of unbound crystal structure, 1K3O.pdb was used for calculations. Both chains are identical in sequence and in three dimensional structures, so are the ligands they bind. According to the given ligandbound crystal structure 1K3Y.pdb, the binding site residues for Shexyl-glutathione (GSH), are Tyr9-Arg45-Gln54-Val55-Pro56-Gln67-Thr68-Val111-Met208-Leu213-Phe220-Phe222 (chain A) and Ap101 -Arg131 (chain B). In this structure, GST binds two glycerol molecules, as well.
In Figure 4a, GSH and the residues on the interaction paths are shown in yellow and green, respectively. Figure 4b shows all of identified residues in detail. In Figure 4b, red colored residues line a path starting with a surface residue, Lys64 and ending with the binding site residues Tyr9 and Pro56, which is the hub residue. Tyr9 is conserved among the majority of known GSTs and it is emphasized as an important catalytic residue in literature [22,23,27,28]. The three-dimensional structures have shown that   Table 3. List of residue pairs along the interaction paths and the distances between them. the hydroxyl group of Tyr9 stabilizes the thiolate of GSH through hydrogen bonding [27]. Similarly, residues Leu50-Pro56 (also shown in red) form a shorter path, which has one end at the surface and the other end at the binding site. Three highly conserved residues Gln54, Val55 and Pro56, which also interact with the ligand via hydrogen bonds, play significant roles in the stability and function of the protein [29,30]. Arg15 and Met16 are also interacting with the residues on the Tyr9 path. Arg15 is mentioned as an important active site residue in literature as well [31]. The residues between Gln67 and Tyr74 form another path, which is represented as the green path in Figure 4b, begins at the surface and ends where Gln67 and Thr68 are positioned to participate in hydrogen bonds with the amino group and cglutamyl carboxyl group of glutathione, respectively [32]. It involves five conserved residues Gln67, Thr68, Ala70, Ile71 and Tyr74 [33]. A member of this path, Arg69 makes three hydrogen bonds with the second glycerol molecule. The remaining binding site residues are situated on helix 9, which is known to be highly dynamic. Since, the region is assumed to become structured and localized upon ligand binding [28,34,35], its electron density is unresolved for apo human GST A1-1 [36,37]. Therefore, the binding site residues between Glu210-Phe222 do not appear in Figure 3a.
The residues Ala24 and Val194 display relatively high total correlations. They belong to two different secondary structures and are in contact with each other. Yet, in literature there is no comment on their contribution to the structure and function of the protein. These two residues are not shown in Figure 4b.
Cliques of size three, at cut-off 6.2 Å , are found as Ile92, Ile96 and Ala156 which are located near the interface of chain A and chain B. These three residues are shown in pink in Figure 4b.
Unlike Ile92 and Ile96, Ala156 is a highly-conserved residue [33]. Ile96 is at the glycerol binding site and hydrogen-bonded to the first glycerol. (Figure 4b).
The protein tyrosine phosphatases (PTPs) work complementarily with protein tyrosine kinases in regulating signal transduction pathways which control many physiological processes, such as cell growth or cell differentiation [38,39]. Protein tyrosine phosphatases display a great diversity both in structure and mechanism and they are recognized by the motif HCX5R at their active sites, with an essential cysteine residue (Cys 215 in PTP1B) [40,41].
As seen from Figure 5a, the residues in between His214-Ser222 exhibit the highest total correlation, C T , where His214 is the hub residue. This group of residues is also observed in Figure 5b, the distance fluctuation matrix contour plot, to form a dark strip, implying that they are correlated with rest of the residues. The catalytic domain of PTP1B is composed of a single a/b domain, structured around a highly twisted b-sheet which spans the entire molecule. A-well known catalytic residue Cys215 is located on the loop that stays at the edge of this b-sheet. The His214-Ser222 region, which appear in total correlation plot (Figure 5a), indeed corresponds to the catalytic region of the protein [42]. In PTP1B, the residues His214, Cys215 and Ser 216 have central roles in the activation of the active-site [43]. Cys215 is emphasized as an important catalytic residue in literature [40,41,43]. In the inset of Figure 5a, the small peaks around the residues Arg45, Pro51, Tyr66-Asn68, Leu83-Gln85, Met109, Lys120, Thr154-Arg156, His175 and Gln262 can be seen. In addition to His214-Ser222 region, these mentioned residues draw an interaction path, which is shown in green in Figure 6a, around the ligand, which is shown  in yellow in the same figure. In Figure 6b, the ligand and the interaction path is depicted in more detail and all residues with non-zero total correlation, C T , are labeled. All these residues are mentioned in literature. To start with, Arg45 sits in the loop where phospho-Tyr recognition occurs, with Pro51, a clique residue (shown in pink in Figure 6b). Being located in the binding site of PTP1B, Arg45 is also responsible from the electrostatic attraction of the ligand. Asn68 makes a hydrogen bond with Asn44 and it is located near a highly conserved residue, Arg257. Leu83 packs or surrounds the PTP loop (residues 213-223) where Gln85 makes a hydrogen bond with a highly buried water molecule. Residues Ile82-Pro87 (not shown in Figure 6b) form the core structure that surround the PTP loop. Residues around Met109 form the hydrophobic core structure and they are less conserved compared to the Ile82-Pro87 motif. Lys 120 is another binding site residue, which H-bonds to Ser216 and interacts with Asp181 (not shown in Figure 6b), known as a general acid catalyst among the vertebrate PTPs. Arg156 is conserved more than %80 among all vertebrate PTP domains. His175 is found in the surface exposed WPD loop (residues His 175-Val 184), where a major conformational change takes place upon binding of phosphopeptides to the PTP loop. The PTP loop then, moves several angstroms to close the active site pocket and trap the bound phosphotyrosine [44]. The WPD loop is also not shown in Figure 6b. Gln262 is also actively involved in ligand-binding process [43]. Cliques of size three are found as Pro51-Ser70-Arg257 and Gly86-Cys121-Ser216 at cut-off 6.1 Å , all of which are highly conserved (pink residues in Figure 6b). The first triad is located around the active site; Pro51 is on the phosho-Tyr recognition loop and Arg257 is on the loop Leu250-Leu267 that spans the active site [45]. Arg257 makes a hydrogen bond with the PTP loop and also believed to be involved in stabilization of the nucleophilic nature of the active site cysteine, Cys215 [36]. Cys121, another clique residue is interacting with Cys215, as well [36]. It has been previously reported that Cys121 in PTP1B is a highly nucleophilic group accessible and ready for covalent attachment of 1,2-NQ, which is a known inhibitor of PTP1B. It causes considerable reduction in dephosphorylation activity of PTP1B. Moreover, Cys121 was reported as a non-active site cysteine residue, but it sits on an allestoric site, where it can inhibit the enzyme activity  through specific mechanisms [35,45]. There are a number of PTPs in which Cys121 (90%) is highly conserved [44]. Ser216 lies on the active site and functions in the activation of Cys215 [43].
All of the residues observed in Figure 5 and Figure 6 are obtained from the apo structure, 2HNP.pdb.

Biotin Carboxylase Domain of Acetyl-CoA Carboxylase 2
Acetyl-CoA Carboxylase (ACC) is responsible from the biotindependent synthesis of malonyl-CoA, through its catalytic domains, biotin carboxylase (residues Val259-761Ala) and carboxyltransferase (residues Leu1809-Gly2305). [46] Since it has a crucial role in fatty acid metabolism, ACC has become a target for therapeutic intervention against the treatment of diseases such as type II diabetes, cardiovascular diseases and atherosclerosis, metabolic syndrome in general, and in the control of obesity [47,48,49,50]. Acetyl-CoA Carboxylase 2 (ACC2) in mammals is expressed in the heart and skeletal muscle cells where it regulates the fatty acid oxidation via its malonyl-CoA product [50,51,52,53,54]. Therefore, the inhibitors of ACC2 may be used as novel anti-obesity drugs or therapeutic agents against the metabolic syndrome [50,51,52,53,54,55]. Among currently known small potent inhibitors of mammalian ACCs, only Soraphen A binds to an allosteric site which is about 25 Å distant from the active site of the biotin carboxylase (BC) domain [56,57].Soraphen A interacts extensively with the BC domain where it is in contact with highly conserved residues [50].
In its crystal structure, (PDB code: 3GID), where Sarophen A is bound to the human ACC 2, the binding site residues are given as Lys274-Ser278-Arg277-Glu593-Met594-Asn599-Asn679-Trp681-Phe704-Trp706. Figure 7a shows total correlation, C T as a function of residue index and the residues between Phe704-Ser715, exhibit non-zero values of C T , obtained by using 3GLK.pdb. Figure 7b is the contour plot of the distance fluctuations. Residues that exchange energy with the surroundings  are shown in black. The intense vertical strips indicate that the residues around Gln504, Glu533, Ile649 and Ala713, which is the hub residue, are able to communicate with the rest of the protein. Figure 8a shows the ligand, Soraphen A, in yellow and the interaction path in green. Figure 8b, is a more detailed version of Figure 8a where all identified residues are labeled. As it can be seen from Figure 8b, Glu711, a surface-exposed residue, sits where the green path starts. This green path terminates at Phe704 and Val648, which is also a clique residue (shown in pink in Figure 8b). Phe704, with Ser705 and Trp706, surrounds the ligand. Ile649 and Asn679 are located around the Phe704-Ser715 path. Ile649 appears with the second highest C T value, according to Figure 7a. The blue path (Figure 8b), which starts with Arg710, involves Glu533, Arg519 and ends with Gln504. Glu533 is a wellconserved residue [33]. There are small peaks around the 500 th and 520 th residues, which may correspond to the residues Gln504 and Arg519, lying in blue path. These two residues are known as catalytic residues in ACC2 [33,58].
In this paper, we present no more than the fastest mode results for total coupling of residues. Yet, we checked the results for the second and the third fastest modes and identified new paths of same kind which extend from surface to the ligand binding (active site) pocket. For instance, residues around Ser278 show the highest total correlation values in the fastest third mode. Lys274, Ser278 and Arg277 indeed stabilize the ligand via hydrogen bond formation [33]. Results for the second mode are presented in the inset of Figure 7a. Yet, these residues are not shown in Figure 8b. We will present the contributions from higher modes in detail in our future work.
Cliques of size three, at cut-off 6.1 Å , are found as Ala534-Cys591-Val648 and Val648-Ser705-Ala713. Clique residues which are shown in pink in Figure 8b, reside either in close proximity or within the active site pocket, most of which fall on the interaction paths. All clique residues are highly conserved residues [33].

Human Carbonic Anhydrase II
Carbonic anhydrases are found almost in all organisms, and they are used as catalysts in reversible hydration of carbon dioxides. Zn +2 ions are essential for their catalytic activity which  can bind four or more ligands in carbonic anhydrases. Three coordination sites are occupied by the imidazole rings of the His residues and the forth coordination site is occupied by a water molecule or a hydroxide ion [59]. Carbonic anhydrase II, which is a major element of red blood cells, is one of the most active carbonic anhydrases and has been the most widely studied. It has evolved as a proton shuttle with the primary component His 64 [59]. The catalysis of carbon dioxide hydration by carbonic anhydrase, so the reaction rate, depends heavily on pH. The enzyme is more active in high pH values [59]. In its crystal structure (PDB code: 1A42), human carbonic anhydrase II is complexed with the drug used for glaucoma therapy, the sulfonamide inhibitor brinzolamide. The given binding site residues are His64-Gln92-His94-His96-His119-Val121-Phe131-Val135-Leu198-Thr199-Thr200.
Residues between His96-His107, Tyr114-His119, Phe147-Lys149, Ser217-Val218 and Asn244-Arg246 exhibit non-zero values of total correlation according to Figure 9a. These residues interact with all residues of the protein, referring to the contour plot of the distance fluctuations given in Figure 9b. These two plots are obtained using the unbound structure of the protein. (PDB code: 2CBE).
In Figure 10a, the ligand and the residues on the interaction paths are shown in yellow and green, respectively. Figure 10b depicts all of identified residues in detail. The first path, which is colored in green in Figure 10b, has one end at Ser217-Val218 and Lys149, and the other end at His119. The blue path starts with Ala115 and ends where the two paths are merged by the H-bonds Glu106 and His107 make with Glu117. Through the path Ala115 also interacts with Gly104 via hydrogen bonding. Ser105, which is the hub residue, links Gly104 with Glu106 and His107. The purple path has surface exposed Ser99 at one end and terminates at His96, which interacts with the Zn +2 ion that is directly bound to the ligand (Figure 10b). Indeed, the active site cleft is characterized by this Zn +2 ion which is tetrahedrally coordinated by N atoms of three histidine residues His94(not shown in Figure 10b), His96 and His119 and a water/hydroxide molecule [60]. Ser105 and Glu117 are within the 10 residues that are completely invariant among the whole family of a-CAs and a-CA-related proteins. Ser105 is involved in stabilizing the protein structure, while Glu117 function as an indirect ligand in the active enzyme [61]. Asn244 and Arg246 are two conserved residues, (colored in purple in Figure 10b) which also neighbor the ligand [33].  Cliques of size three are calculated at cut-off 6.1 Å . The residue triads His96-Gly104-Ala116 and Gly63-Lys170-Phe231 appear around the catalytic site of the protein (pink residues in Figure 10b). His96 is an important residue which interacts with the Zn +2 ion during the catalysis. Gly104 and Ala116 are located in a conserved region, which involves Ser105 and Glu117 [61]. Gly63 is next to His64 which acts as a protein shuttle during catalysis [59]. The side chain of Lys170, the closest of all other residues to the pathway for protein transfer with His64 in the outward orientation [62]. It is believed that one function of Lys170 is to maintain an environment of His64 that maximizes protein transfer and catalysis of the hydration of CO 2 and dehydration of bicarbonate, by keeping it in its outward orientation [63]. In the outward conformation, the imidazole ring of His64 heads out of the active site cavity and the hydrophobic residue Phe231 is located near that cavity.
6. S100A6 S100 proteins are small dimeric proteins which belong to the EF-hand family of calcium-binding proteins. They are characterized by a pair of calcium-binding sites each having the helix-loophelix structural motif. Upon calcium binding, the conformation of the protein changes through a hand-type motion, which renders the angle between the helices of EF2 from negative to positive [64].
The expression of S100 proteins is cell and tissue-specific. Most S100 genes are localized within human chromosome 1q21 [65], a region which is susceptible to changes during tumor progression in transformed cells. [66] The expression of the S100A6 gene, is particularly increased in leukemia cells [67] and during the G1 phase of the cell cycle [68], which implies its role in cell cycle progression. Experiments at the protein level also show that S100A6 may be involved in cell growth, cell differentiation and motility [69,70,71,72].
In the crystal structure of human S100A6 (PDB code: 1K9K), binding sites for Ca +2 ions are given as, Ser20-Glu23-Asp25-Thr28-Glu33 and Asp61-Asn63-Asp65-Glu67-Glu72. In Figure 11a, it is observed that residues between Thr28-Lys35, which contains the hub residue Lys31, and Asp61-Glu67 exhibit non-zero values of total correlation, C T . Figure 11b shows the contour plot of the distance fluctuations where the residues that exchange energy with the surroundings, are identified with a darker hue. The heavy vertical strip shows that especially the residues 28-33 interact with the rest protein.
In Figure 12a, the ligand and the residues lining the interaction paths are shown in yellow and green, respectively. Figure 12b is an enlarged view of the ligand and the interaction paths through the protein. In human S100A6, secondary structure elements are arranged into two calcium binding motifs, which compromise Ca +2 binding site I and site II. For site II (S100-hand motif), the most noticeable difference, upon Ca +2 binding is the movement of Glu33. In contrast, the coordination of the Ca +2 in site I (EF-hand motif), is largely mediated by main chain carbonyl of Glu67 and the side chains of Asp61, Asn63, Asp65 and Glu72. [73] As shown in Figure 12b, residues between Thr28-Lys35 form a path beginning with hydrogen bonded residues Lys35 and Lys31, that terminates with two binding site residues Thr28 and Glu33. The path that surrounds site I is shorter and involves Asp61, Asn63 and Glu67, which indeed begins with Leu60, a well-conserved surfaceexposed residue [33].
According to our results, obtained by using the unbound structure, 1K9P.pdb, the residue pairs with the highest total correlation appear around the residues Lys31 and Leu60. Interaction path residues are mostly the binding site residues. Other residues line a network through the protein between the two Ca +2 binding sites. (Figure 12a) Cliques are calculated at cut-off 6.1 Å and shown in pink in Figure 12b. The triad Lys 31-Leu 60-Lys 64 also appears around the catalytic site of the protein.
Results for the remaining twenty four systems are provided in the Supporting Information S1.

Discussion
Based on the GNM, structural and thermodynamic features of the bound state are predicted by using the unbound structures. This shows that the binding information is already present in the unbound structure. This was also observed by us in a recent work [6].
We have presented a collection of computational techniques to study the relationship between the 3-dimensional structure and the dynamics of protein. These two methods relate protein structure with protein function and protein dynamics in terms of ligand binding. Contact map of a protein can be investigated by the tools of graph theory and provides information about the stiff and We conducted our study in a diverse set composed of 30 proteins each having a distinct function. Among those we obtained successful results in 29 systems. Residues with non-zero total correlation (C T ) values appear along a path with one end located at the surface and the other end exposed to the ligand binding pocket (site). These residue interaction networks indicate the existence of the interaction path which is directly related with ligand binding and highly dependent on the topology of the protein. In this paper, we present no more than the fastest mode results for total coupling of residues. Yet, we checked the results for the second fastest mode and identified new pathways of same kind which extend from different energy gate residues (to the ligand binding pocket).
In a limited set of six proteins, presented in Table 4, several cliques made up of residue triads, obtained by a cutoff distance of 6.2 Å , appear as conserved residues. For other proteins, presented in Supporting Information S1 we saw that cutoff distances around but not exactly equal to 6.2 Å were needed for favorable agreement of the predictions with experimental observation. Thus, a clear-cut specification of a clique-cutoff distance is not available, at least within the level of approximation of the present model. However, the shortcoming due to lack of a single cutoff value notwithstanding, we can say from the data we analyzed that several of the catalytic residues which are emphasized in the literature are predicted by the present Gaussian model.
Our approach exhibits a high predictive capability. Table 1 involves the data set and the summary of results for the remaining proteins is presented in Supporting Information S1. We have shown this approach to be successful in the identification of interaction pathways and conserved regions in a diverse set of protein-ligand systems.

Methods
A coarse grained GNM analysis based on C a atoms of residues and a harmonic potential is used. The position of the ith C a is denoted by R i . The C matrix of GNM is defined as i=j and R ij ƒr cutoff 0 i~j and R ij wr cutoff { P k cÃ i~j=k Here, R ij~Rj {R i is the distance between residue i and j, r cutoff is the distance that defines the neighborhood condition generally taken between 6.5-7.5 Å . cÃ is a positive scaling parameter. The correlation of fluctuations follows from the harmonic assumption as where, DR i is the fluctuation vector of the ith C a , DR T j is the transpose of the fluctuation vector of the jth C a , k is the Boltzmann constant and T is the temperature. The correlation matrix may be expressed in modal form as [74] where, l k is the kth eigenvalue of the C matrix, e k is the corresponding eigenvector, and ½ ij is the ijth element of the enclosed matrix. In our recent work [8] we considered only the largest eigenvalue component of the C matrix for a comparative study of various HLA proteins. The mean square fluctuations of the distance between residue i and j is then written as The correlation of residue fluctuations with an energy exchange DU of the protein is Equation 5 now shows the correlations of energy fluctuations with the squared fluctuations of the distance between residues i and j [8]. Summing both sides of Eq. 5 over the jth index leads to the total coupling C T,i of residue i to its surroundings The last term in Eq. 6 acknowledges the role of energy exchange of residue i with its surroundings that consist of the neighboring residues and the surroundings of the protein. Our exploratory calculations showed that there is a small dependence on the cutoff value, usually taken as 7 Å as the radius of the first coordination shell for C a atoms. In the present study, in order to eliminate, or at least minimize, this dependence, we averaged the C T,i values over the interval 6:9ƒr cutoff ƒ7:1 measured in Angstorms. The lower and upper values are selected by trial and error. If r cutoff ƒ6:9, then some relevant interactions are not taken into account. If, on the other hand, r cutoff §7:1, then too many residues all of which do not lie on the same path result that are not of interest to the binding problem are included.
In the largest eigenvalue formalism, the set of residues with nonzero values of C T,i constitute the interaction pathway. As has been shown before [8], and as will also be shown below, these residues are in contact with each other, in general, and constitute a path, the ends of which are exposed to the surroundings of the protein, which we termed as energy gates. Along this path lies a residue that is highly interactive with a large number of residues of the protein, and hence is referred to as the hub.
By its structural nature, a clique constitutes a stiff region of the protein. Considering the contact matrix A of the protein, cliques of size three are obtained according to the following recipe where i, j and k are residue indices, c is the residue distance (number of residues) between contacting residues, and n is number of residues for each protein.
We studied several different systems, six of which we selected in the present study. These are given in Table 1. The last two columns of Table 1 give the pdb codes of the ligand free and ligand bound structures. In all our calculations, we perform the predictions on the ligand-free structure and compared the results using the ligand-bound structure.
The cutoff distances for the C matrix were chosen as follows: Using 4810 non-redundant PDB structures obtained from Reference [75], we counted the frequency of observation of residue contacts for different values of R cutoff , which was varied in the interval 5-15 Å . The results are shown in the first figure of Supporting Information S2, where the filled circles are the results of calculations. The straight line drawn to the linear part of the curve therein shows the scaling region. In this region, changing the R cutoff value by a factor changes the number of observations proportionately, and this relates simply to the size effect. Below the scaling region, effects other than size effects are accounted for as R cutoff is increased. An R cutoff at the boundary of the non-scaling and scaling regions reflects all the effects that are of interest. The arrow in the first figure of Supporting Information S2 corresponds to an R cutoff value of 7.2 Å . In order to include effects that would come from smaller R cutoff values, we took five equally spaced stations between 6.9-7.2 Å , and averaged the reported total correlation values over these five stations.
The cutoff distances for the cliques were chosen with a similar analysis described in the preceding paragraph for the contacting residue pair's analysis. The results are shown in the second figure of Supporting Information S2, where the filled circles are the results of calculations. The straight line drawn to the linear part of the curve shows the scaling region. An R cutoff has to be chosen below the scaling region. The arrow in the figure corresponds to an R cutoff value of 6.2 Å . In the calculations, we tried R cutoff values of 6.0, 6.1, 6.2, 6.3 and 6.4 Å , and accepted the value of R cutoff that led to the most consistent comparison of the model with observations. The binding site residues using the bound complexes are defined as follows: In the complex, if the distance between an atom of a residue and an atom of the ligand were less than 3.5 Å , and if this residue had a non-zero total correlation calculated by using the unbound PDB file, then the residue was defined as a contacting residue. The list of contacting residues for the six systems analyzed in this study is given in Table 2.

Supporting Information
Supporting Information S1 Total correlation C T of residues as a function of residue indices and the corresponding three dimensional structures showing the nteraction paths and the cliques for the 24 benchmark proteins.

Author Contributions
Conceived and designed the experiments: CT BE. Analyzed the data: CT BE. Contributed reagents/materials/analysis tools: CT BE. Wrote the paper: CT BE.