Deciphering the Arginine-Binding Preferences at the Substrate-Binding Groove of Ser/Thr Kinases by Computational Surface Mapping

Protein kinases are key signaling enzymes that catalyze the transfer of γ-phosphate from an ATP molecule to a phospho-accepting residue in the substrate. Unraveling the molecular features that govern the preference of kinases for particular residues flanking the phosphoacceptor is important for understanding kinase specificities toward their substrates and for designing substrate-like peptidic inhibitors. We applied ANCHORSmap, a new fragment-based computational approach for mapping amino acid side chains on protein surfaces, to predict and characterize the preference of kinases toward Arginine binding. We focus on positions P−2 and P−5, commonly occupied by Arginine (Arg) in substrates of basophilic Ser/Thr kinases. The method accurately identified all the P−2/P−5 Arg binding sites previously determined by X-ray crystallography and produced Arg preferences that corresponded to those experimentally found by peptide arrays. The predicted Arg-binding positions and their associated pockets were analyzed in terms of shape, physicochemical properties, amino acid composition, and in-silico mutagenesis, providing structural rationalization for previously unexplained trends in kinase preferences toward Arg moieties. This methodology sheds light on several kinases that were described in the literature as having non-trivial preferences for Arg, and provides some surprising departures from the prevailing views regarding residues that determine kinase specificity toward Arg. In particular, we found that the preference for a P−5 Arg is not necessarily governed by the 170/230 acidic pair, as was previously assumed, but by several different pairs of acidic residues, selected from positions 133, 169, and 230 (PKA numbering). The acidic residue at position 230 serves as a pivotal element in recognizing Arg from both the P−2 and P−5 positions.


Introduction
Protein phosphorylation is one of the most abundant posttranslational modifications. It is catalyzed by protein kinases, a large group of enzymes that account for approximately 2% of the human genome [1]. Phosphorylation involves the regulation of almost every process in the cell, and numerous diseases, such as diabetes, Alzheimer's disease and cancer, are tightly related to abnormal levels of protein phosphorylation. Thus kinases are considered one of the major drug targets of the 21 st century [2], with over a hundred kinase inhibitors in various stages of clinical trials and several drugs already in the clinic [3].
Most kinase inhibitors target the ATP-binding site [4], providing different, but usually low levels of kinase selectivity [5]. In pursuit of additional (non-ATP site) ways of inhibiting kinases, which in some cases may provide kinase-selective inhibition [6], kinase-substrate and other kinase-protein interactions are being actively targeted by various research groups using small molecules [7,8] and peptidomimetics [6,9,10,11,12]. Structural information and computational approaches have greatly contributed to the design of low-molecular-weight kinase-targeting drugs [13]. The need for computational tools for peptide design is on the rise, due to increasing interest in protein-protein interactions and their inhibition in general [11,14,15,16] and for protein kinases in particular [6,9,17], providing part of the motivation for the current work. While peptides are usually considered poor drug candidates because of low cell permeability and high tendency to be rapidly metabolized, recent improvements in synthetic peptide chemistry [18], successful usage of modulations that enable cell-penetration of proteins and peptides [6,12,19,20,21,22] and of different administration routes, open up new avenues in the field of peptidic and peptidomimetic drug discovery [23].
Members of the protein kinase family share a common structure, consisting of a small N-terminal lobe and a larger Cterminal lobe [24]. The ATP-binding site and the main substraterecognition site lie within the major groove formed between the two lobes. In eukaryotes, most kinases transfer the ATP c-phosphate to either serine or threonine residues (Ser/Thr kinases), while others phosphorylate tyrosine residues (Tyr kinases) [25]. The Ser/Thr kinases can be further classified into various families and subfamilies based on sequence similarity, such as ACG, CAMK, etc. [1]. Another common classification of Ser/Thr kinases is into three main groups, basophilic, acidophilic and proline-directed. This classification is based on basic, acidic or proline substrate residues that govern kinase-substrate recognition [26,27], and is assumed to confer global specificity between the three groups of kinases [28].
The above-mentioned residues are part of a set of amino acid residues immediately flanking the substrate phosphorylation site (which is referred to as P0) that play an important role in the tendency of the substrate to be recognized and phosphorylated by a particular kinase. The term substrate consensus sequence (SCS) refers to the essential sequence elements surrounding the phosphorylated site [29]. The flanking residues are referred to as P2n or P+n according to their location along the substrate sequence, n residues N-terminal or C-terminal to the P0 position, respectively.
Early studies of the prototypical basophilic protein kinase A (PKA) showed a pronounced preference for Arginine (Arg) at positions P22 and P23 of the substrate [30]. Later, the strong preference for Arg at P23 was shown to be a general feature of many basophilic kinases. In a recent work that tested the consensus phosphorylation motifs of 61 out of the 122 kinases in Saccharomyces cerevisiae, 57% were detected as basophilic, with 87% of the basophilic kinases showing a primary preference for Arg at the P23 position [31]. In fact, peptide phosphorylation screening approaches often fix Arg at this position, and concentrate on exploring preferences at other positions [32].
Aside from the P23 position, P22 and P25 are the only two positions for which the frequency of Arg is greater than its average occurrence in the human proteome [33] and these are the focus of the current study. While position Glu127 (PKA catalytic domain numbering used throughout the paper) of the kinase, located at the hinge that connects the two lobes, has been shown to be the source for Arg specificity at position P23 [31,34,35,36,37], the identities and roles of kinase residues defining Arg specificities at P22 and P25 are more intricate. Mutational analysis [33,38], as well as the crystal structures of several kinase-peptide complexes, confirm that different basophilic kinases use the same surface site to accommodate Arg at either the P22 or P25 substrate positions [39,40,41]. Positions 129, 133, 169 and a dominant acidic pair (170/230) have been implicated, but are not always sufficient for explaining the experimental P22/P25 Arg preferences [33,40,42,43]. It appears that the P22/P25 Arg specificity and the interaction strength is not conferred by a readily observable sequence or structural feature, but rather by a combination of a few subtle attributes which need to be uncovered by particularly sensitive methods.
Prediction of kinase-specific phosphorylation sites is commonly based on sequence-based computational methods [44], but structure-based approaches have begun to emerge as well [45,46,47]. Notably, the PREDIKIN method combines structural information on specificity-determining residues with sequence information obtained from known kinase substrates [46,47]. The currently available methods are trained on kinase and substrate sequences and rely on analogy to known complex structures. Wellsuited for the detection of conserved specificity determinants between kinase subfamilies, these methods are less sensitive when specificity is dictated by kinase-unique features, and they are not aimed at supplying information on amino acid binding preferences outside the known spatial organization of the substrate/peptide complexes. Yet, such information is valuable for de novo design of protein-protein interactions inhibitors.
Computational mapping methodologies have the potential of addressing the challenge of kinase-unique binding and to specificity analyses further away from the phosphoacceptor binding region. These approaches identify the favorable binding position of a molecular probe using solely the molecular interaction field embedded in the three-dimensional structure of the protein.
Consequently, a sensitive energetic description of independent functional moieties within the investigated binding environment is supplied. A variety of computational mapping methodologies have been developed, including grid based methods [48] combined with fft correlation techniques [49] and methods that employ simultaneous minimization of all probes [50]. Computational surfacemapping have been successfully used as an initial step in fragmentbased drug discovery procedures [51,52], in comparing the binding sites of different related receptors [53], and in classifying protein kinases based on their ATP-binding sites [54]. Nevertheless, since these methods are mostly designed and used for small molecules docking, they are less adequate for detecting binding positions in the context of protein-protein interactions. In the latter case, the amino acid probe is a part of a much larger molecule (protein/peptide) whose presence can modify the local dielectric environment at the probe binding site. This electrostatic shielding effect requires an appropriate treatment in order to obtain reliable scores, specific for protein-protein and protein-peptide interactions.
A specific scoring function for protein-peptide interactions is implemented in the PepSite method, which uses spatial position scoring matrices derived from a large set of protein-peptide complexes and is aimed to identify preferred amino acid binding positions on a given protein surface. By combining the predictions of single amino acid binding sites with the sequence order of the peptide, the method was shown to correctly locate the binding position of many peptides [55]. Yet, the single amino acid predictions were not tested explicitly by the authors.
Here we use ANCHORSmap [56], a recently developed computational mapping procedure specifically designed to identify binding positions of single amino acid side chains, in the context of protein-protein interactions, to study the Arg-binding preferences of representative basophilic and non-basophilic Ser/Thr kinases. ANCHORSmap consists of a specialized scoring function which was calibrated and tested for the ability to accurately position

Author Summary
Protein kinases are key signaling enzymes and major drug targets that catalyze the transfer of phosphate group to a phospho-accepting residue in the substrate. Unraveling molecular features that govern the preference of kinases for particular residues flanking the phosphoacceptor (substrate consensus sequence, SCS) is important for understanding kinase-substrates specificities and for designing peptidic inhibitors. Current methods used to predict this set of essential residues usually rely on linking between experimentally determined SCSs to kinase sequences. As such, these methods are less sensitive when specificity is dictated by subtle or kinase-unique sequence/structural features. In this study, we took a different approach for studying kinases specificities, by applying a new fragment-based method for mapping amino acid side chains on protein surfaces. We predicted and characterized the preference of Ser/Thr kinases toward Arginine binding, using the unbound kinase structures. The method produced high quality predictions and was able to provide novel insights and interesting departures from the prevailing views regarding the specificity-determining elements governing specificity toward Arginine. This work paves the way for studying the kinase binding preferences for other amino acids, for predicting protein-peptide structures, for facilitating the design of novel inhibitors, and for re-engineering of kinase specificities.
residues at protein-protein interface and to reproduce experimental DDG values that were measured for alanine mutations [56].
We show that ANCHORSmap successfully discriminates between basophilic and acidophilic kinases and accurately identifies and top-ranks all P22 and P25 Arg-binding sites previously determined by X-ray crystallography. Furthermore, the Argbinding positions detected for all 10 examined kinases are in line with their SCSs. A detailed examination of the Arg-binding maps produced for the different kinases, together with in-silico mutagenesis, sequence alignments and available crystal structures of kinasepeptide complexes, indicates important roles for several previously unappreciated positions and structural features in kinase catalytic domain that govern the P22/P25 Arg specificity.

Results
The ANCHORSmap algorithm produces detailed binding maps of amino acid side chains on protein surfaces. The predicted binding positions (anchoring spots) are ranked by their calculated DG values, and adjacent anchoring spots can also be clustered into a single position to produce a sparser map of mean anchoring spots without significantly lowering the accuracy of the results [56]. In this work, the mean anchoring spots are reported unless otherwise stated, and in order to imitate a real prediction scenario, all of the calculations were performed on the unbound structures of the proteins.
Previous findings indicate that amino acids that have high propensity to form hot spots, such as Arg, Glu/Asp, Tyr, Trp and His, are also highly selective in binding to the entire protein surface [56]. As a preliminary test for the prediction sensitivity of ANCHORSmap for protein kinase surfaces, we tested the method for its ability to distinguish between basophilic and acidophilic kinases. Using both acidic (Glu) and basic (Arg) probes, the method produced a clear differential binding pattern between few representative basophilic and acidophilic kinases, indicating that it is sensitive enough for categorizing the basophilic/acidophilic nature of a given kinase without prior knowledge of its SCS (See Text S1 and Figure S1).
The Arg-binding positions previously shown by X-ray crystallography to anchor at the 22/5 site are accurately reproduced by ANCHORSmap as top-ranking solutions X-ray crystallography studies have shown that different kinases use the same surface site to bind Arg from either the P22 or P25 substrate positions. We will refer to this surface site as the 22/5 site. To the best of our knowledge, structures of kinase-peptide complexes in which an Arg residue has been shown to anchor at the 22/5 site are currently available for only four different basophilic kinases from three kinase families, defined in [26,57]: PKA [39] and PKB [40](AGC family), PIM1 (CAMK family) and PAK4 (STE family) [58].
Using the unbound structures of the proteins listed in Table 1, we tested the ability of ANCHORSmap to correctly reproduce their Arg-binding positions. Both detailed and mean Arg-binding maps were produced and an example from the top 20 mean Argbinding positions detected on the entire surface of PKB can be seen in Figure 1A.
Remarkably, although the search for Arg-binding positions started from thousands (,7500) of probes initially scattered over the entire surface of each kinase, the computed positions corresponding to the experimental Arg-binding positions were ranked extremely high. For three out of four cases, the rank of the most accurate position in the detailed maps was lower than 4, and the top-ranking mean solutions coincided with the experimental Arg-binding positions in every case ( Table 1). The solutions were also geometrically accurate: the average RMSD from the experimental positions was 1.660.3 Å , and for three out of four kinases, the top ranking mean Arg position was less than 1.7 Å from the experimental bound position (measured between the experimental and computed Arg Cf atoms, which represent the centers of the guanidino groups). The only exception was PIM1, for which a larger distance of 2.6 Å was obtained. Examination of the entire cluster of anchoring spots that contribute to the mean position of PIM1 showed a clear tendency of the lowest-energy binding positions to accumulate in close proximity to the experimental Arg position (Table 1 and Figure 1B).
The acidic residue at position 170 of the kinases has been implicated in imposing a preference for Arg at position P22 or P25 of the substrate [33]. A comparison of the unbound structures of different kinases (listed in Table 2) showed that the acidic 170 residue may adopt different conformations. For the peptide-unbound PKB structure (3D0E), the conformation of Glu170 uniquely and significantly deviated from the peptidecomplex structure (1O6K) and from the consensus unbound conformation observed for the other kinases ( Figure 2). Thus, for PKB, both a bound-like rotamer of Glu170 (reported in Table 1) and the unbound conformation were used in the calculations. The unbound conformation of Glu170 reduced the binding affinity (by 2.3 kcal/mol) and worsened the ranking (from 1 to 8) of the correct mean solution. Nevertheless, the location of the top solution remained the same, in line with Ben-Shimon and Eisenstein's finding that mean anchoring spots are particularly useful for unbound predictions [56]. The RMSD between the ANCHORSmap-identified and the experimental Arg-binding positions was calculated over all heavy atoms of the Arg residue, apart from the Cb atom. Cf distance -distance (Å ) between the identified and the experimental Arg Cf atoms.  These results are particularly striking in view of the challenging computational task. For example, we analyzed the 10 top ranking Arg positions produced by the PepSite server (http://pepsite. russelllab.org/) for each of the four kinases listed in table 1. The best result was achieved for PKB, for which PepSite solution ranked 6 was located in a distance of 4.8 Å (measured between the Arg Cf atoms) from the experimental position. This solution corresponds to the second top ranking solution produced by ANCHORSmap for PKB, see Figure 1. For the rest of the cases tested, no reported solution was found closer than 8 Å from the experimental position.
The Arg-anchoring spot maps correspond to Substrate Consensus Sequences (SCSs) Six additional kinases from the AGC, CAMK and STE kinase families, for which peptide-bound structures have not been determined experimentally but unbound structures as well as experimental SCSs were available, were analyzed next: PASK, CAMK-II (CAMK family), ASK1 (STE family), p70S6K, PDK1 and PKC (AGC family). This completed the test set to 10 kinases.
Substrate-specificity studies for PKC isozymes have resulted in several, sometimes inconsistent SCS definitions [29,38,59,60]. Therefore, the most frequent SCS of all PKC isozymes (RXXS/ TXRX) [61,62,63] was compared to the average results obtained for the four PKC isoforms (alpha, betaII, iota and theta) for which unbound crystal structures are available.
Eight out of ten SCSs in the set contained the robust basophilic signature of Arg at P23, but were diverse in terms of Arg-binding preferences for the 22/5 site: the set included kinases with no clear preference for Arg in either the P22 or P25 positions of the SCS (PKC, ASK1, CAMK-II, PDK), kinases with clear and exclusive P22 (PKA, PAK4) or P25 (PKB, p70S6K, PIM1) Arg  Table 2   preferences, and a kinase with dual P22 and P25 preference (PASK). Our goal was to predict the preference for Arg at the 22/ 5 site in general and to identify potential reasons for the preferences for 22 vs. 25 substrate positions.
The calculated DG values for Arg at the 22/5 site are in line with the existence of Arg in positions P22 and/or P25 of the corresponding SCS. The SCS of 6 out of 10 kinases in our set contains Arg at either the P22 or P25 positions (the ''22/ 5 binders'', highlighted in Table 2). Arg does not appear in these positions in the SCSs of the other four kinases (the ''22/5 nonbinders''). Figure 3 presents the calculated DG values for Arg at the 22/5 site of all 10 kinases. These energies were mapped onto the average energy distribution of Arg-binding positions, as obtained from the entire surface of a random set of 20 soluble unrelated proteins (see materials and methods). The predicted DG values sorted in perfect agreement with the experimental data, namely, the calculated values for all of the binders were lower than the corresponding DGs for the non-binders. For PDK1, no Argbinding position was detected at the 22/5 site, in line with the lack of specific preference N-terminal to the phosphoacceptor (T) in the SCS of PDK1 (TFCGT) [64].
The calculated DG values for the binders were found at the lowest part in the predicted energy distribution, which was occupied by only 2.5% of the cases in the random test set. This result indicated that even in terms of absolute values, the 22/5 sites of the 22/5 binders share a very unique and strong Argbinding environment, which is distinguishable from the group of non-binders.
The strongest Arg-binding position among the 22/5 nonbinder group was detected for PKC. The average calculated DG for the four isomers was 26.260.6 kcal/mol and was separated by only 1.2 kcal/mol from the weakest binding position detected for the 22/5 site binders (PKB, DG = 27.4 kcal/mol). This limited DG gap reflects some of the uncertainty regarding the ability of PKC to bind Arg at position 22 and at additional positions Nterminal to the P0 position (24,25,26) [60]. Interestingly, the strongest average DG value (26.661.5 kcal/mol) for Arg binding on the surface of PKC was detected in the region that accommodates residues C-terminal to the phosphoacceptor. This position was top-ranked for the betaII and theta isomers and ranked second for the iota isomer. For PKC alpha, the corresponding position was somewhat weaker in energy (DG = 25.2 kcal/mol) and it was ranked 11. Nevertheless, for all four isomers, this Arg interaction was similarly mediated by a cluster of acidic residues (positions 82-84) located on helix aC. Using statistics and an evolutionary model, Li and coworkers [42] suggested that the same surface region is adequate for accommodating the P+2 Arg that characterizes the SCS of PKC isomers (RXXS/TXRX) [61,62,63]. Indeed, the top predicted Arg-binding positions were located in very close proximity to the P+2 site ( Figure 4A). Interestingly, while Li et al. pinpoint residues 83 and 84 as the specificity-determining residues, our results suggest that position 82 is also important for Arg binding: in PKC alpha, betaII, and theta, positions 82 and 84 dominate the Arg interactions ( Figure 4B). Actually, it is only in PKC iota that a unique arrangement of the aC helix results in direct involvement of positions 83 and 84 in Arg binding ( Figure 4C). Notably, the interplay between all three acidic residues is probably enabled by the flexibility of the acidic cluster region [65].
Dissecting the 22/5 site into subsites that correspond to SCSs. Is it possible to predict, based on the unbound structure of the kinase, which of the substrate positions-P22, P25 or bothis likely to be occupied by an Arg? Such a high-resolution prediction is first and foremost dependent on the tendency of the two substrate positions to bind at distinct subregions within the 22/5 kinase site. Superposition of the four available crystal structures containing a P22/P25-bound Arg at the 22/5 site showed that the Arg residues exploit two distinct surface regions within the 22/5 site ( Figure 1A). However, in terms of position within the substrate sequence, the separation is less obvious. While the guanidino groups of two P22 Arg residues (PKA, PAK4) cluster into the same subsite (referred to as 22 subsite), and the P25 Arg of PIM1 binds in a distinct subsite within the 22/5 site (referred to as 25 subsite), in PKB, a P25 Arg binds at the 22 subsite ( Figure 1A).
For most kinases tested, ANCHORSmap detected either a single (4 kinases) or double (5 kinases) Arg-binding locations within the 22/5 site. The exception was PDK1, for which no Argbinding position was detected (discussed earlier). Visual inspection revealed that the number of predictions is related to the shape of the 22/5 pocket. Thus, a double prediction can potentially occur in kinases with an elongated pocket, whereas a single prediction is caused by the geometrical restriction of a shorter and tighter pocket, with limited available space. For an example of predictions in elongated vs. short pockets, see PKB and PKA in Figure 1B, respectively. The Arg-binding positions detected within the 22/5 site clustered to either one of the two subsites, with seven occurrences for each subsite. A binding position was ascribed to a subsite based on the shortest distance measured between the predicted position and the subsite centers. The latter were determined by averaging the positions of the three experimentally bound Arg Cf atoms (blue, orange and cyan) which define the 22 subsite and by the position of the single Arg Cf atom (magenta) which defines the 25 subsite, see enlargement in Figure 1A. For PAK4, the top ranking position (perfectly compatible with the experimental Arg position observed in the PAK4-peptide complex) is shifted to be between the two subsites, yet it is closer to the 22 subsite (shown in orange in Figure 1C).
In summary, the crystal structures as well as the predicted Argbinding positions point to the existence of two subsites within the 22/5 site that can potentially accommodate Arg from different positions of the substrate. Thus, the first essential condition for a possible discrimination between kinases with a P22 or P25 Arg preference is satisfied. Can such a preference be predicted?
The SCS of 3 out of 10 tested kinases contained Arg at position P22. For these three kinases, the calculated DG values of the Arg-binding positions detected at the 22 subsite were more favorable than for the corresponding positions detected in the rest of the tested kinases ( Figure 5A). Similarly, the calculated DG values of the positions detected at the 25 subsite were lower for 3 out of 10 kinases for which the SCS contained Arg at position P25 ( Figure 5B). A special case was PKB, for which the SCS contains Arg at P25 but the most preferred Arg-binding position was in fact identified at the 22 subsite, with calculated DG of 27.4 kcal/mol. Fortunately, the crystal structures (1O6K/1O6L) of the PKB-peptide complex show that our calculations are correct, and the P25 Arg indeed reaches the 22 subsite. Notably, the calculated DG at the 25 subsite is higher by only 0.6 kcal, implying that both subsites may be occupied by Arg residues. The predicted arg-binding positions explain both known and novel specificity-determining features at the 22/5 Site By combining sequence alignment of key kinase residues and information extracted from SCSs, studies have suggested different residues at several sequence positions of the kinase catalytic domain (positions 129, 133, 169, 170 and 230, based on PKA numbering) as crucial for controlling the P22/P25 Arg specificity [31,33,40,42]. The most dominant condition for attaining P22/ P25 Arg specificity was identified by Zhu et al. [33], who showed that among the AGC, CAMK and STE kinases, P22/P25 Arg specificity is highly correlated with the existence of a single pair of acidic residues in positions 170 and 230. The importance of this acidic pair is illustrated in the crystal structure of the PKA/PKI complex [66], in which Arg19 (P22) of PKI, which is anchored at the 22 subsite, is hydrogen-bonded to Glu170 and also forms a tight salt bridge with Glu230 ( Figure 6A). However, it was not established whether Arg binding at the 25 subsite would also use the 170/230 acidic pair for binding. Indeed, in the crystal structure of the PIM1-pimtide complex [58], position 170 do not seems to be directly involved in the binding of the P25 Arg. Moreover, for several kinases, the acidic pair is neither a sufficient nor an obligatory condition for attaining a strong Arg-binding preference [33,67], and none of the currently suggested residues is able to explain the P22/P25 Arg preference of all kinases. We propose that this preference is most likely dictated by a delicate balance between the entire residue composition at the 22/5 site and other essential structural elements which cannot be simply detected by sequence-based methods.
To support this idea we combine the information obtained from the predicted Arg-binding positions of selected kinases analyzed in this study (PKA, ASK1, P70S6K, PASK and PIM1) with structural and sequence comparisons of key positions making up the 22/5 site (some previously suggested in the literature and others suggested here for the first time), to examine in detail the Arg-affinity-determining components at the 22/5 site. We also provide an estimation of the binding-energy contribution of selected key residues to Arg binding, by performing in-silico mutagenesis and recalculating the Arg-binding energies with ANCHORSmap. Except for one case in which Phe was replaced by Ser, the rest of the mutations were chosen such that they minimally affect the shape of the 22/5 site pocket, yet enable to determine the contribution of the residue's functional group to Arg binding. This was done by replacing charged/polar residues with hydrophobic residues of comparable size, and selecting the appropriate side-chain rotamer which would optimally mimic the native surface shape of the 22/5 site pocket. Thus Thr, Asp/ Asn and Glu were replaced by Val, Leu and Met, respectively.
The 170/230 acidic pair is insufficient for attaining strong Arg binding at the 22/5 site of ASK1. For ASK1 (MAP3K5), the existence of the acidic pair 170/230 is insufficient for attaining a strong P22/P25 Arg preference [33,67]. A comparison between the 22/5 sites of ASK1 and PKA reveals that although the two kinases belong to different families (STE and AGC, respectively), their sites are very similar, exhibiting only two key residue differences. The first is at position 129, which is occupied by Ser in ASK1 and by Phe in PKA (Figures 6A and 6B). It was suggested [42] that Phe129 is important for PKA P22 affinity for Arg, thus supporting the lack of Arg affinity in the case of ASK1. ANCHORSmap calculation performed on in-silicomutated PKA showed that a Phe-to-Ser replacement at position 129 indeed reduces the predicted Arg-binding energy at the 22 subsite of PKA by 3 kcal/mol. However, the resulting DG value (28.7 kcal/mol) remains very favorable and is comparable to that of other 22 subsite Arg binders (see Figure 5A).
The second difference is observed at position 169, with Pro in PKA and Gly in ASK1. Gly at position 169 frees the amide backbone to form a hydrogen bond with the essential Glu230. Consequently, the 22 subsite of ASK1 is narrower and the restricted conformation of Glu230 impairs the ability to make optimal contact (as seen for PKA) with the anchored Arg. More importantly, the constrained Glu230 cannot form a hydrogen bond with Arg133 as it does in PKA. Consequently, Arg133 of ASK1 is rotated, exposing the 25 subsite pocket, which is blocked in both the bound and unbound active structures of PKA [66,68] (see Figure 6A). Previous ANCHORSmap calculations have shown that such a conformational change may affect the packing and dielectric environment at the 22 subsite, leading to an over 5 kcal/mol reduction in the binding affinity of Arg [56]. It has also been suggested that a non-bulky residue at position 133 dictates an Arg preference in the P25 rather than P22 position [40].
Sequence analysis revealed that while Pro is frequently found in position 169 of the AGC and CAMK families, at 74% and 67%, respectively, in the STE family, Pro occupies only 18% of the cases while the rest are occupied by the small residues Gly, Ala and Ser at approximately equal frequencies. Apart from ASK1, eight additional kinases in the STE family contain the 170/230 acidicpair pattern. Six belong to the PAK1-6 group [69], and the other two are closely related to ASK1 (MAP3K6 and MAP3K7). However, only the three ASK1-related kinases contain the unique sequence combination of Gly169 and Arg133 ( Figure 6C).
Since the significant structural rearrangement observed at the 22/5 site of ASK1 does not allow a reliable prediction of the effect of Pro replacement at position 169 of ASK1 by Gly-to-Pro mutation and energetic evaluation, it would be interesting to experimentally test how such a mutation affects the ability of ASK1 to prefer substrates with Arg at P22.
The crucial effect of Arg133 and its accurate positioning in providing a suitable packing environment for Arg binding at the 22 subsite is also evident from the comparison of the 22/5 sites of PKA and P706SK, both from the AGC family. The two sites are almost identical, differing by only one Arg-to-Glu replacement at position 133. Yet, similar to ASK1, P706SK does not prefer Arg at the 22 subsite, because the positioning of Glu133, similar to Arg133 in ASK1 does not provide the required packing environment for strong Arg binding at the 22 subsite (compare Figures 6B to 6D).
Importantly, ANCHORSmap automatically captured the physicochemical differences observed at the 22 subsites of PKA, ASK1 and P70S6K as it detected a strong Arg-binding position (211.7 kcal/mol) for PKA, as opposed to weak binding positions at the corresponding sites of ASK1 and P70S6K, with DG values of 24.3 and 24.9 kcal/mol, respectively.
Acidic pairs selected from positions 133, 169 and 230, but not from position 170, are important for Arg binding at the 25 subsite. A strong preference for Arg at P25 is less frequently observed than at P22 [31]. In our set, correspondence between a strong Arg-binding position at the 25 subsite and the preference for Arg at P25 was found for three kinases, PIM1, P70S6K and PASK, with calculated DG values of 214.7, 29.7 and 28.3 kcal/mol, respectively.
It has been suggested that Asp at position 169 dictates Arg specificity at P25 [31]. Such specificity characterizes the group of PIM kinases [70], which is the only group in the CAMK family (3 out of 83) that also has an acidic-pair pattern. The crystal structure of the PIM1-pimtide complex indeed shows direct involvement of Asp169 in the extensive network of five hydrogen bonds that mediate the strong peptide P25 Arg interaction at the 25 subsite. Apart from the side chain of Asp169, the anchored Arg forms hydrogen bonds with the side chains of Thr133 and Asp230, and with the backbone carbonyl at positions 129 and 234 ( Figure 6E). ANCHORSmap calculation on in-silico mutated PIM1 showed that among the three side chains, Thr133, Asp169 and Asp230, it is the latter pair of acidic residues that strongly dominates the Arg interaction, each contributing around 5 kcal/mol to the interaction, whereas Thr133 is not crucial for Arg binding (Table 3).
P70S6K also prefers Arg at P25. However, in contrast to PIM1, position 169 of P70S6K is occupied by Pro and not by an acidic residue. A detailed examination of the ANCHORSmappredicted Arg-binding position detected at the 25 subsite of P70S6K revealed that the Arg interaction is mediated by the important acidic residue at position 230 (as in PIM1) combined with Glu133 ( Figure 6D). Our in-silico mutagenesis analysis confirmed that these two acidic residues, Asp230 and Glu133, dominate the Arg interaction at the 25 subsite of P70S6K, with an estimated energetic contribution of 22.6 and 22.9 kcal/mol, respectively (Table 3).
Most interesting is PASK, which contains Thr instead of an acidic residue at the important position 230, but nevertheless uncharacteristically prefers Arg at both P25 and P22 [31]. The acidic-residue content of PASK displays a combination of PIM1 and P706SK: similar to the PIM group, it contains Asp169, and similar to P70S6K, it contains Glu133.
There is currently no complex structure of PASK with a peptide that would enable detailed analysis of the Arg interactions, but ANCHORSmap detected a strong Arg-binding position for both the 22 and 25 subsites of PASK, with calculated DG values of 29.1 and 28.3 kcal/mol, respectively. We could therefore analyze the interaction of the predicted Arg at the 22 subsite of PASK, as well as the estimated binding-energy contribution of each key position using in-silico mutagenesis analysis, to reveal that the lack of acidic residue at position 230 is mostly compensated for by Asp169 as well as Asn236, which are estimated to contribute 24.8 and 21.4 kcal/mol to the Arg interaction, respectively ( Figure 6F and Table 3). These calculations support the novel involvement of Asp169 as well as Asn236 in the Arg preference at P22 proposed herein.
Examination of the predicted Arg interaction at the 25 subsite of PASK showed that the lack of acidic residue at position 230 and the fact that Asp169 is already mostly involved in recognition of the P22 Arg, are mainly compensated for by Glu133. In-silico mutagenesis analyses confirmed this observation. Together, Glu133 and Asp169 contribute approximately 25 kcal/mol to the interaction, with an estimated binding energy of 23 and 21.8 kcal/mol, respectively.
The uncommon acidic-residue content and the distinctive interaction arrangement at the 22/5 site of PASK are probably responsible for the unique dual P22/P25 Arg preference by this kinase.
Analysis of top-ranking Arg-anchoring spots in PIM1, P70S6K and PASK indicated that acidic residues at positions 133, 169 and Table 3. In-silico mutagenesis highlights key positions for Arg binding.  Table 3).
The observed dual involvement of both subsites in Arg binding, as well as the significant energy contribution to Arg binding measured for all kinases tested (except for PASK which contains Thr in position 230), indicate that an acidic residue at position 230 serves as a pivotal element in the recognition of Arg at the 22/5 site (Table 3). An acidic residue in position 170, on the other hand, emerges as 22 subsite-specific, and contributes less to the overall binding of Arg. These results illustrate how ANCHORSmap enables correct automated predictions of pocket preferences and of residues that are important for determining specificity, as supported by in-silico mutagenesis and repeated ANCHORSmap calculation.

Discussion
We used ANCHORSmap, a novel computational mapping approach specifically designed for the detection of favorable binding positions of amino acid probes on the surfaces of proteins, to investigate the Arg preference determining elements in Ser/Thr protein kinase substrate-binding grooves. We focused mainly on the P22/P25 Arg-binding preferences that typically characterize the SCS of a particular surface region, defined by us as the 22/5 site.
Initially, we demonstrated that the ANCHORSmap method produces high-quality predictions by detecting differential binding patterns on the surfaces of representative basophilic and acidophilic kinases. This enabled successful discrimination between the two types of kinases without any prior knowledge of their SCSs. It also suggested that kinases might employ an either/ or strategy in which their substrate-binding groove is optimized for binding either acids or bases, but not both. A kinome-wide analysis is needed to investigate this idea and is planned for further studies.
Importantly, using the unbound kinase structures, the method accurately reproduced and top-ranked the X-ray crystallographydetermined Arg-binding positions at the 22/5 site of four different kinases; it also showed excellent correspondence between the calculated DG values obtained for Arg at the 22/5 site of all 10 kinases tested in this work and the preference for Arg in the experimentally determined SCSs.
In phosphorylatable peptide libraries, the SCSs of basophilic kinases emerge with a dominant signature preference for Arg at P23 [27,31]. However, for the group of basophilic kinases tested in this study, carrying an Arg preference at both the P23 and P22/P25 positions, the Arg positions detected at the 22/5 site were almost exclusively top-ranked, pointing to the 22/5 site as the most preferred binding environment for Arg on the entire kinase surface. ANCHORSmap detected an adequate binding position for accommodation of P23 Arg in all cases, yet both its ranking and calculated energies were significantly weaker (data not shown). However, since the kinase P23 Arg interaction is known to involve, in some cases, the ATP molecule as well [66,71,72], the absence of ATP during the calculations does not allow for accurate DG calculation, making the comparison between the two sites difficult. It must also be taken into account that the experimentally observed dominant preference for Arg at P23 is measured using phosphorylation activity, whereas we estimate the contribution of Arg to binding. Catalytic activity and binding affinity do not necessarily have to be correlated. A series of experiments [73] showed that the intrinsic affinities of several protein substrates to their respective kinases are weak compared to their apparent affinities measured in traditional steady-state kinetic-activity assays. Experimental studies with PKA and the protein kinase inhibitory peptide PKI support the hypothesis that the P23 position may be important for catalysis but less important for the binding itself. It was shown that replacement of Arg19 of the inhibitory peptide (at position P22, experimentally shown to bind at the 22 subsite (PDB code 1ATP)) by Gly reduces the inhibition by 520-fold, while similar Arg replacement at the equivalent P23 position (Arg18) reduces the inhibition by only 90-fold [74]. Yet in substrate peptides of PKA selected for catalytic activity, Arg at position P23 is the most dominant one [31,33]. Moreover, the PKA-PKI complex is similarly formed with [66] (PDB code 1ATP) or without [75] (PDB code 1APM) the ATP molecule, even though in the former case, the guanidino group of the P23 Arg is hydrogen-bonded to the ATP ribose. This indicates that the P23 Arg is not the most crucial residue for inhibitor binding. Note that while the 22/5 site is located on the C-lobe, the P23 Arg interacts with the N-lobe, requiring an optimal geometry of the two lobes for contact formation. Such contact may stabilize the two lobes together, enabling the accurate geometry for an appropriate catalytic activity, but might be less important for binding of an inhibitor.
The binding information obtained by ANCHORSmap is not identical to the information that can be gained from phosphorylation peptide arrays. Thus it is particularly useful in the design of kinase peptide inhibitors, as opposed to the design of optimal kinase substrates. Indeed, ANCHORSmap results were recently used to rationalize the structure-activity relations of a peptidomimetic library of novel PKB kinase inhibitors [76].
The four currently available crystal structures containing a P22/P25-bound Arg at the 22/5 site imply that Arg can potentially bind in two distinct subsites within the 22/5 site. However, the limited number of available complexes has made it difficult to draw a clear conclusion regarding the subsite separation inside the 22/5 site and its relation to the Arg position along the substrate sequence. The search for potential Arg-binding positions within the 22/5 site helped resolve this issue: it supplied a clear dichotomization of the predicted Arg positions between two separate subsites, 22 and 25, each with a different set of predicted Arg positions that usually corresponded to the location of Arg in the P22 and P25 positions of the SCS, respectively.
Finally, we used the predicted Arg-binding positions together with structural, sequence and in-silico mutagenesis analysis of key residues, to explain known and novel structural and sequence specificity-determining features that govern the Arg interaction at the 22/5 site. The analysis showed that in many cases, the interaction strength is underlined by a delicate balance between several attributes of the binding site, architectural and chemical, which cannot be simply obtained from sequence alignment comparisons, but emerge automatically from the ANCHORSmap-predicted positions and accompanying DG values. This is due to the fact that ANCHORSmap utilizes the information embedded in the structure of the protein, capturing subtle changes in the physicochemical properties of the binding site. The predicted positions were then used to trace back the role of individual key residues and structural features that determine preference toward Arg in particular substrate positions. This methodology provided several new hypotheses regarding Arg specificity-determining elements at the 22/5 site, which can be further tested experimentally. We suggest that the inability of ASK1 to strongly bind Arg, in spite of the existence of the 170/ 230 acidic pair, can be explained by a unique sequence composition that affects the architectural arrangement at the 22/5 site of ASK1. We could therefore hypothesize that the preference for a P25 Arg is not governed by the 170/230 acidic pair, as it is for P22 Arg and as was previously assumed, but can be governed by several different pairs of acidic residues, selected from positions 133, 169 and 230, whereas position 170 affects only the binding of Arg at the 22 subsite. Acidic residue at position 230 on the other hand, serves as a pivot element in the recognition of Arg at both subsites.
Computational mapping of amino acid-anchoring spots on kinase surfaces can provide testable hypotheses regarding kinase specificity and peptidomimetics affinity [77] and may be used as input for anchor-driven peptide-docking [14] to predict 3D structures of kinase-peptide complexes. It is therefore expected to promote our understanding of kinase regulation and expand the possibilities for the design of kinase-specific signaling modulators.
A compendium of peptide-anchoring sites obtained in this work is available upon request, providing a basis for the development of novel kinase modulators for biochemical research.

Producing binding maps with ANCHORSmap
The ANCHORSmap method. Briefly, the ANCHORSmap algorithm [56] consists of two stages: (i) a geometry-based step, in which subpockets that can accommodate single amino acid side chains are detected on the surface of the protein and amino acid probes are scattered near them. This produces a non-random yet exhaustive distribution of thousands of probes over the entire protein surface; (ii) an energy-based step in which the positions of probes, initially scattered, are optimized by several cycles of energy minimization and clustering. The minimizations are performed with the Gromacs software, version 3.3.3 [78], employing the united atoms gromos96 43a1 force field [79]. The binding energies of the probes (DG p ) are estimated by an empirical scoring function adjusted for the context of protein-protein interactions: The scoring function free parameters were optimized and statistically validated using calculated versus experimental DDG values of 57 alanine mutations (DDG = DG Ala 2DG p ) of interface residues.
The scoring function consists of a corrected van der Waals (vdW) energy term (V9), a corrected and weighted (l e ) electrostatic term (E9) and a weighted (l s ) desolvation energy term, of the probe (2S p ) and of the anchoring cavity (2S cav ) (eq. 1).
The Corrected vdW (V9) is normalized as suggested by Pan et al [80]. The Gromacs vdW energy (V) is divided by N a , where N is the number of non-hydrogen atoms of the probe and a is a free parameter (eq. 2).
The electrostatic energy computed with Gromacs (E) is corrected (E9) to include the possible changes in the local dielectric environment induced by a hypothetical protein bound to the probe (eq. 3).
Thus, an effective dielectric coefficient, e eff , is calculated which takes values between e (1.5 as in the Gromacs minimizations) and e max (free parameter) and represents the local dielectric binding environment (eq. 4).
e eff~f A pol : p : e max {e ð Þ ze ð4Þ The effective dielectric coefficient is estimated as the fraction of polar surface area that remains accessible in the bound probe (fA pol ) multiplied by the probability that this polar area is buried in a hypothetical protein-protein interface (p). The probability p rises exponentially as the exposed surface area of the bound probe (A) grows, but it is limited to p#1 (eq. 5).
p~c1 : e c2 : A (pƒ1) ð5Þ The fitted values of the C1 and C2 constants are 0.0024 and 0.167, respectively. Hence, e eff for a probe that is deeply buried in a surface pocket (low p) is close to e, whereas a probe that is only partially buried feels higher dielectric shielding. The solvation energy (S p or S cav ) is estimated as the conventional sum over the product of atomic solvation parameter and solvent exposed area.
The above procedure produces a very detailed map of anchoring spots which is usually unnecessary when searching the entire surface of a protein. Therefore the algorithm enables averaging a cluster of adjacent anchoring spots into a single mean position, to produce a sparser map of mean anchoring spots. The averaging is weighted by the DG of each probe in the cluster, giving more weight to predictions with lower DG, and the associated energy of the mean position is set to the lowest DG in the cluster.
In this work, we used the default parameters of ANCHORSmap as previously described by [56] using a RMSmin of 3 Å to produce the map of mean anchoring spots.
Kinase structure preparation. Only kinase structures for which all the residues surrounding the 22/5 site region are resolved were used for the analysis. Kinase coordinates were obtained from the Protein Data Bank (PDB) [81]. Hetero atoms were excluded before the ANCHORSmap calculations.

Sequence analysis
Kinase sequences were retrieved from KinBase (http://kinase. com). Only sequences of the catalytic domain of human kinases were included in the analysis. Sequence alignment was performed using the T-coffee server [83,84].

In-silico mutagenesis
Apart from the Phe129Ser replacement in PKA, in which the rotamer of the corresponding Ser in ASK1 was used, for the rest of the mutations, the rotamer which optimally mimicked the native surface shape of the 22/5 site pocket was selected by visual inspection. The selected rotamers were then subjected to a short (20-step) steepest descent energy minimization to remove steric clashes with the protein. Side-chain replacements, rotamer selections and energy minimizations were performed with builtin tools of the Discovery Studio package V2.5. Figure S1 Differential anchoring spot mapping for basophilic and acidophilic kinases. (A) Top-ranking binding positions detected for Glu (red) or Arg (blue) probes at a distance shorter than 10 Å from the substrate-binding region of the acidophilic kinases GSK3 (1O9U) and CK2 (3H30), and the basophilic kinases PKA (1BKX) and PAK1 (3FY0). Each column represents a single binding position. For distance measurement, see text. (B) Viewing the distribution of the 10 top-ranking Glu and Arg predictions on the entire surfaces of CK2 and PAK1. The position of the PKI peptide (green line), presented here with only three amino acids on each side of the P0 position (green sphere) and the position of the ATP molecule (brown spheres) were determined by superposing the structure of the PKA-PKI complex (1ATP) on each kinase. For each kinase, the top 10 mean anchoring spots detected for the Arg (represented by the Arg Cf atom) and Glu (represented by the Glu Cd atom) probes are shown as blue and red spheres, respectively. Note that some of the probes are invisible as they are located on the back side of the protein.

Supporting Information
Black arrows mark the corresponding binding positions appearing in panel A. (TIF) Text S1 Determining whether a kinase is basophilic or acidophilic. (DOC)