SH3 Domain Tyrosine Phosphorylation – Sites, Role and Evolution

Background: SH3 domains are eukaryotic protein domains that participate in a plethora of cellular processes, including signal transduction, proliferation and cell movement. Several studies indicate that tyrosine phosphorylation could play a significant role in the regulation of SH3 domains. Results: To explore the incidence of tyrosine phosphorylation within SH3 domains, we queried the PhosphoSite Plus database of phosphorylation sites. Over 100 tyrosine phosphorylations occurring at 20 different SH3 domain positions were identified. The tyrosine corresponding to c-Src Tyr-90 was by far the most frequently identified SH3 domain phosphorylation site. A comparison of the sequences around this tyrosine led to the delineation of a preferred sequence motif, ALYD(Y/F). This motif is present in about 15% of human SH3 domains and is structurally well conserved. We further observed that tyrosine phosphorylation is more abundant than serine or threonine phosphorylation within SH3 domains and within other adaptor domains, such as SH2 or WW domains. Tyrosine phosphorylation could therefore represent an important regulatory mechanism of adaptor domains. Conclusions: While tyrosine phosphorylation typically promotes signaling protein interactions via SH2 or PTB domains, its role in SH3 domains is the opposite: it blocks or prevents interactions. The regulatory function of tyrosine phosphorylation is most likely achieved by the phosphate moiety and its charge interfering with the binding of the polyproline helices of SH3 domain interacting partners.


Introduction
The SH3 domain is one of the best-characterized protein interaction modules. SH3 domain-mediated signaling is involved in all basic cellular processes and in many pathological conditions, including malignant transformation (reviewed in [1]).
SH3-mediated signaling processes are mostly driven by the recognition of polyproline-II helices by SH3 domains [2]. The SH3 domain ligand-binding surface plays a key role in both intramolecular and intermolecular interactions [3]. It contains three hydrophobic pockets, each containing a cluster of conserved amino acid residues. Mutational analysis of SH3 domains has identified the key residues required for ligand interactions. For example, the residues essential for Src SH3 domain ligand binding are Y90, N135 and Y136 in the first pocket, Y92, W118 and P133 in the second pocket, and D99 and Y131 in the third pocket (numbering based on chicken c-Src) [4]; [5].
Protein phosphorylation is one of the most fundamental regulatory events in eukaryotic cells [6]. The importance of reversible tyrosine phosphorylation in the regulation of essential cellular functions is underscored by the fact that tyrosine kinases comprise the largest group of oncoproteins [7].
During the past two decades, tyrosine phosphorylation has been discovered within the SH3 domains of several signaling proteins [12]; [13]; [14]. In some cases, mutational analyses were performed to determine the functional importance of a particular phosphorylated tyrosine. These studies provided substantial evidence that phosphorylation of the well-conserved tyrosines within the SH3 domain hydrophobic pockets plays a significant role in two processes: regulating the binding capacity of the SH3 domain, and the intramolecular regulation of signaling proteins. This regulatory mechanism appears to be used in various cellular processes. We hypothesize that it could be a general means of regulating signal transduction pathways mediated by SH3 domain-containing proteins.
We surveyed available data from phosphoproteomic and structural studies to explore the abundance and variability of SH3 domain tyrosine phosphorylation sites, and to identify SH3 domain phosphorylation motifs. We also analyzed the structural conservation of the ALYD(Y/F) motif, the most frequently phosphorylated SH3 domain motif. Our results further support recent experimental observations that tyrosine phosphorylation within SH3 domains plays a critical role in the regulation of their function.

Results and Discussion
Survey of SH3 Domain Phosphorylation
SH3 domains are common protein interaction modules. Over 16,000 SH3 domains in more than 12,500 different proteins are described in the SMART database (October 2011), and more than 97% of them occur in eukaryotic proteins. A growing body of experimental evidence indicates that tyrosine phosphorylation plays a significant role in the regulation of many SH3 domains (Table 1).
Tyrosine phosphorylation of SH3 domains has an unorthodox effect on protein function. Tyrosine phosphorylation is perhaps best known for facilitating protein-protein interactions through the recognition of phosphotyrosine by proteins with an SH2 or PTB domain [15], which usually leads to signal propagation. In contrast, tyrosine phosphorylation of SH3 domains prevents protein-protein interactions or reduces their affinity (Table 1). This can switch cell behavior. In chronic myeloid leukemia cells, for example, phosphorylation of the c-Abl SH3 domain enhances transformation potential [16], [13].
We queried the PhosphoSite Plus database for all phosphorylations within SH3 domains. At the time of the survey (October 2011), the database described 188 distinct phosphorylation sites in 127 different SH3 domains (File S1). Of these, 106 were tyrosine phosphorylations, which were analyzed further.
SH3 domain sequences were aligned to determine the abundance of tyrosine phosphorylation at individual positions within the domain (Figure 1). To avoid redundancy, orthologous and paralogous sequences (including isoforms) that shared an identical phosphorylation pattern were represented by only one sequence in the alignment. Fifty-two unique SMART-based SH3 domain sequences were aligned (File S2). Of these, 36 domains were phosphorylated at a single tyrosine, 15 at two tyrosines and one (PLCγ2) at three tyrosines.
We also analyzed the conservation of phosphorylation among orthologous sequences. We found 20 phosphorylated tyrosines that occurred in the SH3 domains of two orthologues (Table S1) and eight that were present in the SH3 domains of three orthologues (Table S2). Our previous experimental data have further confirmed phosphorylation of Tyr 12 in human, rat and mouse p130Cas [14]. Although the available phosphorylation data are incomplete, a significant proportion of phosphosites (64 of 106 in our survey) is present in more than one organism. This further supports the importance of SH3 domain tyrosine phosphorylation.
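The figure of 64 conserved phosphosites can be reproduced from the two orthologue tables. To get it, each conserved site is counted once for every orthologue in which it was observed. This is our reading of the numbers, not a procedure stated in the text. The short Python sketch below only makes that arithmetic explicit.

```python
# Phosphotyrosine sites observed in SH3 domains of more than one orthologue.
sites_in_two_orthologues = 20    # Table S1
sites_in_three_orthologues = 8   # Table S2
total_ptyr_sites = 106           # all SH3 phosphotyrosine sites in the survey

# Assumption: each conserved site contributes one entry per orthologue.
conserved_entries = 2 * sites_in_two_orthologues + 3 * sites_in_three_orthologues
conserved_fraction = conserved_entries / total_ptyr_sites

print(conserved_entries, f"{conserved_fraction:.0%}")  # prints: 64 60%
```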
To unify position numbering, we used positions in the alignment in Figure 1 as our reference. The alignment showed that most tyrosine phosphorylations were detected at positions 7 and 66. These correspond to Y90 and Y131 of the chicken Src SH3 domain, which lie in the first and third surface hydrophobic pockets, respectively. Both of these tyrosines are therefore involved in ligand binding [17].
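Positions such as 7 and 66 are alignment columns, not residue numbers. Relating a column back to one protein's numbering (for example, column 7 to Y90 in chicken Src) means counting the non-gap residues of that sequence up to the column. The sketch below shows this bookkeeping. The gapped fragment and its starting residue number are hypothetical and do not come from Figure 1.

```python
def column_to_residue(aligned_seq: str, first_residue: int) -> dict[int, int]:
    """Map 1-based alignment columns to residue numbers for one gapped sequence.

    Columns holding a gap ('-') have no residue and are omitted from the map.
    """
    mapping: dict[int, int] = {}
    residue = first_residue
    for column, symbol in enumerate(aligned_seq, start=1):
        if symbol != "-":
            mapping[column] = residue
            residue += 1
    return mapping


# Hypothetical gapped fragment whose first residue is numbered 100.
toy = column_to_residue("TF-VALYDY", first_residue=100)
print(toy[7])    # prints: 105 (the residue in alignment column 7)
print(3 in toy)  # prints: False (column 3 is a gap)
```

Because gaps shift the offset between column and residue number, no single constant offset works across a whole domain. This is why position 7 maps to Y90 while position 66 maps to Y131.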

Analysis of Abundant Phosphorylation Sites
To further analyze the sequences surrounding the two most frequently phosphorylated positions, Tyr 7 and Tyr 66, we created sequence logos (Figure 2; WebLogo [18]). The consensus logos show no strong amino acid conservation around Tyr 66, with the exception of Pro at position +2. In contrast, the sequence around phosphorylated Tyr 7 is more conserved. Relative to Tyr 7, alanine at position −2, leucine at position −1 and aspartate at position +1 all show very strong conservation. Position +2 is predominantly occupied by the aromatic amino acids tyrosine and phenylalanine. Thus ALYD(Y/F) is the most favorable motif for tyrosine phosphorylation in the SH3 domain. The sequence around Tyr 66 was less conserved, and tyrosine phosphorylation was observed less often at that site, so further analysis concentrated on the Tyr 7 site.
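Sequence logos such as those made with WebLogo scale each column by its information content. A highly conserved position like Ala −2 therefore stands out, while a variable one stays low. The sketch below is a simplified version of that measure: log2 of the alphabet size minus the Shannon entropy of the column, with no small-sample correction. The aligned fragments are hypothetical, not the sequences behind Figure 2.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column.

    Uses log2(alphabet_size) minus the Shannon entropy of the observed residues;
    gaps are ignored and no small-sample correction is applied.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


# Hypothetical aligned fragments covering alignment positions 5-9.
fragments = ["ALYDY", "ALYDF", "ALYDY", "AVYDF"]
for offset, column in enumerate(zip(*fragments)):
    bits = column_information("".join(column))
    print(f"position {5 + offset}: {bits:.2f} bits")
```

A fully conserved column scores log2(20), about 4.32 bits. A column split evenly between Tyr and Phe scores one bit less.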
Of the 304 human SH3 domains in the SMART database, the ALYDY motif around Tyr 7 appears in 21 domains and the ALYDF motif in 15 (File S3). Of these 36 sequences, 12 are known to be phosphorylated at Tyr 7 according to PhosphoSitePlus. Many domain definition programs are available, and they differ significantly in the number of proteins they predict to contain a particular domain. We therefore also counted human SH3 domains with the ALYD(Y/F) sequence using an independent domain definition program, Pfam [19]. The Pfam database contains 750 human SH3_1 domains, and 113 of them carry either the ALYDY (64) or the ALYDF (49) motif (Files S4 and S5). The two independent domain definition systems roughly agree that 12-15% of human SH3 domains possess an ALYD(Y/F) motif that could potentially be phosphorylated.
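The 12-15% range follows directly from the motif counts given for the two domain definition systems. The sketch below recomputes it using only numbers stated in the text.

```python
# Motif counts for human SH3 domains, as quoted in the text.
smart_total, smart_alydy, smart_alydf = 304, 21, 15
pfam_total, pfam_alydy, pfam_alydf = 750, 64, 49

smart_fraction = (smart_alydy + smart_alydf) / smart_total
pfam_fraction = (pfam_alydy + pfam_alydf) / pfam_total

print(f"SMART: {smart_fraction:.1%}")  # prints: SMART: 11.8%
print(f"Pfam:  {pfam_fraction:.1%}")   # prints: Pfam:  15.1%
```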
Although it is very unlikely that all of these motifs are phosphorylated, we expect more experimental evidence on the significance of Tyr 7 phosphorylation to emerge in the near future as the amount of phosphosite data keeps increasing.
The high conservation of the sequence around Tyr 7 suggests that it could be phosphorylated by a specific group of kinases. We used GPS [20] and PhosphoMotifFinder [21] to predict the kinases responsible for phosphorylating Tyr 7 within the ALYD(Y/F) motif. Both programs indicated Src-family kinases as the likely kinases for this site, and GPS further suggested FAK, Btk, PDGF and Abl. In addition, the ALYDY motif of Itk has been shown to be autophosphorylated by Itk itself [22]. This suggests that Tyr 7 phosphorylation is not mediated by a single kinase in a specific cellular compartment. Rather, different kinase families working in different compartments of the cell may regulate SH3 domains through phosphorylation of the ALYD(Y/F) motif.

Structural Conservation of the ALYD(Y/F) Motif
Structural alignment was used to further evaluate the ALYD(Y/F) motif. There are 104 known 3D structures of SH3 domains with an ALYD(Y/F) motif, representing 16 different proteins. A representative structure was selected for each protein and structurally aligned to the c-Src structure (1FMK). The results of the structural alignment are shown in Table 2. The ALYD(Y/F) motif is part of the loop that connects the first and second strands of the domain. This loop folds into a conformation similar to that of two interacting strands in a β-sheet (Figure S1). In 1FMK, the loop conformation is stabilized in two ways. Three hydrogen bonds form between main-chain atoms of residues within the loop: two between Tyr 9 and Phe 24, and one between Ala 5 and Gly 27. A further hydrogen bond forms between the main-chain atoms of Leu 6 and Tyr 71, which lies in the loop connecting strands four and five (numbering based on the alignment in Figure 1).
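Backbone hydrogen bonds like those listed above are often flagged first with a crude geometric rule: a donor nitrogen and an acceptor oxygen closer than roughly 3.5 Å. The sketch below applies only that distance criterion. It is a simplification, since tools such as DSSP use an electrostatic energy criterion that also depends on geometry. It is not the procedure used to produce Figure S1, and the coordinates are invented for illustration.

```python
import math

Coord = tuple[float, float, float]


def hbond_candidates(donors: dict[str, Coord], acceptors: dict[str, Coord],
                     cutoff: float = 3.5) -> list[tuple[str, str, float]]:
    """Return (donor, acceptor, distance) for N...O pairs within `cutoff` angstroms."""
    pairs = []
    for d_name, d_xyz in donors.items():
        for a_name, a_xyz in acceptors.items():
            distance = math.dist(d_xyz, a_xyz)
            if distance <= cutoff:
                pairs.append((d_name, a_name, round(distance, 2)))
    return pairs


# Invented coordinates in angstroms, for illustration only.
donors = {"Tyr9 N": (0.0, 0.0, 0.0)}
acceptors = {"Phe24 O": (2.9, 0.0, 0.0), "Gly27 O": (6.0, 0.0, 0.0)}
print(hbond_candidates(donors, acceptors))  # prints: [('Tyr9 N', 'Phe24 O', 2.9)]
```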
The ALYD(Y/F) motif is structurally well conserved. The root mean square distance (RMSD) of the C-alpha atoms in the motif was typically less than half the average RMSD of the whole SH3 domain (Table 3). Figure 3 further shows that even the side-chain conformations of the residues in this motif are very well conserved. The structural similarity holds even for ALYD(Y/F) motifs in proteins without experimentally verified SH3 domain phosphorylation. (Phosphotyrosine positions within SH3 domains refer to positions in the alignment in Figure 1.)
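An RMSD between paired C-alpha atoms is only meaningful after the two structures have been optimally superposed. The structures here were superposed with LSQMAN. The sketch below substitutes the standard Kabsch algorithm (SVD of the cross-covariance matrix) to illustrate the idea; it is not LSQMAN's exact procedure. The text does not state whether the motif RMSD was measured in the whole-domain superposition or after superposing the motif alone. The function simply superposes whichever paired atom set it is given.

```python
import numpy as np


def superposed_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between paired (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    P_c = P - P.mean(axis=0)           # move both sets to their centroids
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                    # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0  # avoid an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T             # optimal rotation taking P onto Q
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Usage: a rotated and translated copy of five random "C-alpha" atoms gives RMSD near zero.
rng = np.random.default_rng(0)
P = rng.normal(size=(5, 3))
theta = np.radians(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])
print(round(superposed_rmsd(P, Q), 6))
```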
The RMSD values of the ALYD(Y/F) motif did not correlate with the sequence identity of the aligned structures. Even structures with rather low sequence identity to c-Src had better RMSD values to c-Src than closely related proteins from the Src kinase family. The best structural match to the ALYD(Y/F) motif of human c-Src was the SH3 domain of myosin IB from Acanthamoeba castellanii (2DRM; 32% sequence identity; 0.18 Å RMSD).
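Comparing RMSD with sequence identity requires a definition of identity for two aligned sequences. The identities here were calculated with ClustalW. The sketch below uses one common convention: identical residues divided by the number of columns where neither sequence has a gap. This is an assumption. Other tools divide by alignment length or by the shorter sequence, and ClustalW's exact denominator is not reproduced here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences over gap-free columns.

    This is one common convention; other tools divide by alignment length
    or by the length of the shorter sequence instead.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


print(percent_identity("ALYDY", "ALYDF"))  # prints: 80.0
print(percent_identity("AL-DY", "ALYDF"))  # prints: 75.0
```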
Acanthamoeba castellanii belongs to the Amoebozoa, the sister group of the Opisthokonta (fungi and animals). Interestingly, a gene discovery study found that A. castellanii contains the basic elements of phosphotyrosine signaling pathways, including animal tyrosine kinase families, tyrosine phosphatases and proteins with SH2 domains [11].
A very good structural match was also found between the ALYD(Y/F) motif of human c-Src and those of two SH3 domains from Saccharomyces cerevisiae. The ALYD(Y/F) motif occurs in four of the 29 S. cerevisiae SH3 domains. Animal tyrosine kinases have not been detected in yeast [15]. Nevertheless, this does not mean that tyrosine phosphorylation does not occur in yeast. For example, the kinase Swe1 inhibits Cdc28 activity by phosphorylating its Tyr 19 [23].
The strong structural conservation of the ALYD(Y/F) motif in Amoebozoa and Opisthokonta could indicate that this mode of regulation is not a recent invention, but appeared before the two groups diverged.

Tyrosine Phosphorylation Is Enriched in Other Docking Domains
We observed that tyrosine phosphorylations represent an unusually high proportion (68%) of all phosphorylations in SH3 domains. We therefore asked whether this prevalence of tyrosine phosphorylation is unique to SH3 domains or can also be observed in other adaptor domains. We chose SH2, PH, PDZ, WW, PTB, EH and PX domains for further analysis [24]. Using the PhosphoSitePlus database, we searched for phosphorylation sites within each of these domains separately. Only human proteins were used, to avoid redundancy. For each domain, we calculated the ratio of phosphorylated tyrosine sites to all phosphorylations (Table 4). The PhosphoSitePlus database contains 13,324 (21.4%) human phosphotyrosine sites, 11,618 (18.6%) human phosphothreonine sites and 37,410 (60%) human phosphoserine sites. Compared with these database-wide statistics, tyrosine phosphorylation was overrepresented in five of the seven selected adaptor domains. In three domains (SH2, WW and EH), more than 50% of all documented phosphorylations are tyrosine phosphorylations, although very few phosphorylations of EH domains have been observed. This suggests that tyrosine phosphorylation could also be an important regulatory mechanism for other adaptor domains. In human protein evolution, however, tyrosine loss is strongly favored, most notably in protein subsets that are not known to be tyrosine phosphorylated (Tan-25). The higher proportion of tyrosines in adaptor domains is thus consistent with their higher tyrosine phosphorylation.
Moreover, the trend toward enrichment of tyrosine phosphorylation in adaptor domains is maintained even after correcting for tyrosine content (Table 4). The work of Fabian et al. may provide another explanation for this enrichment. They showed that phosphorylation of a serine residue did not change the structure of a tau peptide relative to its non-phosphorylated form, whereas phosphorylation of the tyrosine caused considerable conformational changes [25].
Table 3. Structures were aligned to the SH3 domain of human Src protein (1FMK). Two parameters are measured for each structural alignment: the root mean square distance of the whole SH3 domain (Average RMSD) and of the ALYD(Y/F) motif (RMSD (5-9)). Sequence identity to the human 1FMK SH3 domain was calculated using ClustalW. doi:10.1371/journal.pone.0036310.t003
Figure 3. … [45] and in human Abl [13]. The figure was created using PyMol. doi:10.1371/journal.pone.0036310.g003
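The database-wide shares of phosphotyrosine, phosphothreonine and phosphoserine sites quoted above follow from the three raw counts. The sketch below recomputes them. It also divides the 68% tyrosine share observed in SH3 domains by the database share. That simple fold ratio is an illustration only and is not a statistic reported in Table 4.

```python
# Human phosphosite counts in PhosphoSitePlus, as quoted in the text.
counts = {"pY": 13324, "pT": 11618, "pS": 37410}
total = sum(counts.values())
shares = {site: n / total for site, n in counts.items()}

for site, share in shares.items():
    print(f"{site}: {share:.1%}")

# Illustrative fold overrepresentation of pY in SH3 domains (68% of their sites).
sh3_fold = 0.68 / shares["pY"]
print(f"SH3 pY share relative to database: {sh3_fold:.1f}x")
```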
In this study, we showed that tyrosine phosphorylation has been detected in a number of SH3 domains. Most of these phosphorylations were detected at the SH3 domain position responsible for ligand binding. Experimental evidence shows that this tyrosine phosphorylation interferes with the binding of the SH3 domain to its interacting partners. We also showed that tyrosine phosphorylation occurs frequently in other adaptor domains and could therefore represent an important regulatory mechanism for these domains as well.

Phosphorylation Search and Evaluation
All tyrosine phosphorylation sites in SH3 domains were identified in PhosphoSite Plus [26], a curated database that is currently one of the most comprehensive resources on posttranslational modifications. For each hit from PhosphoSite Plus, we carefully validated that the phosphorylation site lay within an SH3 domain, using the SMART (Simple Modular Architecture Research Tool) domain identification program [27]. Hits that were not part of SMART-defined SH3 domains were excluded from subsequent analyses. In cases of doubt, the UniProt annotation team was consulted. On one occasion, this led to an update of an entry's domain definition in the UniProt database [28].
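The validation step amounts to an interval test: a phosphosite is kept only if its residue number falls inside a domain of the same protein. The sketch below shows that filter. It assumes the site list and the domain boundaries have already been exported from PhosphoSite Plus and SMART as plain tuples. The accessions and coordinates are hypothetical, and no database API is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    protein: str  # protein accession (hypothetical values below)
    start: int    # first domain residue, 1-based, inclusive
    end: int      # last domain residue, inclusive


def sites_in_domains(sites: list[tuple[str, int]],
                     domains: list[Domain]) -> list[tuple[str, int]]:
    """Keep (protein, residue) phosphosites lying inside a domain of the same protein."""
    by_protein: dict[str, list[Domain]] = {}
    for dom in domains:
        by_protein.setdefault(dom.protein, []).append(dom)
    return [
        (protein, pos)
        for protein, pos in sites
        if any(d.start <= pos <= d.end for d in by_protein.get(protein, []))
    ]


domains = [Domain("PROT_A", 80, 140)]
sites = [("PROT_A", 90), ("PROT_A", 200), ("PROT_B", 90)]
print(sites_in_domains(sites, domains))  # prints: [('PROT_A', 90)]
```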
PhosphoSitePlus was also used to find tyrosine phosphorylation sites in other adaptor domains. To avoid redundancy, only human proteins were used to calculate the ratio of tyrosine phosphorylations to all phosphorylations.
The occurrence of serines, threonines and tyrosines was calculated for the set of all human proteins as defined by UniProt. It was then compared with the occurrence of these amino acids in the sets of all human SH3, SH2, PH, PDZ, PTB, EH, PX and WW domains as defined by Pfam [19].
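Counting residue occurrence comes down to tallying amino acid letters over a set of sequences. The sketch below assumes the UniProt proteome and the Pfam domain sequences have already been read into Python strings; FASTA parsing is not shown. The two toy sequences are hypothetical.

```python
from collections import Counter


def sty_fractions(sequences: list[str]) -> dict[str, float]:
    """Fraction of all residues that are Ser, Thr or Tyr across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in "STY"}
    return {aa: counts[aa] / total for aa in "STY"}


print(sty_fractions(["MSTY", "YYAA"]))  # prints: {'S': 0.125, 'T': 0.125, 'Y': 0.375}
```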
Normalized relative phosphotyrosine enrichment was calculated as the ratio of two quantities. The first is the number of tyrosine phosphorylations per tyrosine in a given set of adaptor domains. The second is the number of tyrosine phosphorylations per tyrosine in all human proteins.
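Written out, the enrichment is (pY sites in the domain set / tyrosines in the domain set) divided by (pY sites in the proteome / tyrosines in the proteome). A value above 1 means the domain set carries more phosphotyrosine sites per tyrosine than human proteins on average. The sketch below encodes that definition. The counts in the usage line are hypothetical, chosen only to show the calculation.

```python
def normalized_py_enrichment(py_domain: int, tyr_domain: int,
                             py_proteome: int, tyr_proteome: int) -> float:
    """Phosphotyrosine sites per tyrosine in a domain set, relative to the proteome."""
    domain_rate = py_domain / tyr_domain        # pY sites per Tyr in the domain set
    proteome_rate = py_proteome / tyr_proteome  # pY sites per Tyr in all human proteins
    return domain_rate / proteome_rate


# Hypothetical counts, for illustration only: 0.075 / 0.04 = 1.875.
print(normalized_py_enrichment(py_domain=30, tyr_domain=400,
                               py_proteome=100, tyr_proteome=2500))
```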

Motif Definition and Motif Searches
The SH3 domains with identified tyrosine phosphorylations were aligned using ClustalW [29]. The alignment was then used with WebLogo [18] to describe the sequence motifs around the two most frequently phosphorylated positions.
The ALYD(Y/F) motif, identified around the most frequently phosphorylated position (position 7), was used to estimate the abundance of potential tyrosine phosphorylation sites in the SH3 domains of the human proteome.
A simple text search was used to locate the ALYD(Y/F) motif in all SH3 domains in the SMART and Pfam databases [27]; [19]. ClustalW [29] was used to align the sequences containing the motif, and sequences with the ALYD(Y/F) motif around Tyr 7 were selected. The GPS 2.1 (Group-based Prediction System) online service [20] and PhosphoMotifFinder [21] were used to identify kinases that could phosphorylate the tyrosines in the ALYD(Y/F) motif.
Table 4. The number of phosphorylations within the selected domains was determined using PhosphoSite Plus, and the ratio of phosphotyrosine sites (pY) to all sites was calculated. The complete human proteome from the UniProt database was used to calculate the number of tyrosines in human proteins. The complete sets of human proteins containing adaptor domains were selected using the Pfam database. Normalized relative enrichment of tyrosine phosphorylation is the ratio of the percentage of pY in the domain set to the percentage of pY in the human proteome, normalized to the number of tyrosines. doi:10.1371/journal.pone.0036310.t004
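The simple text search for the motif can be expressed as one regular expression: ALYD followed by Y or F. The sketch below reports every hit together with the 1-based position of the central tyrosine. It covers only the search step. The subsequent ClustalW filter, which kept hits that fall at alignment position 7, is not reproduced. The test string is an arbitrary toy sequence.

```python
import re

ALYD_MOTIF = re.compile(r"ALYD[YF]")  # ALYD followed by Tyr or Phe


def find_alyd_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the motif's central Tyr, motif) for every hit."""
    # m.start() is the 0-based index of the Ala; the central Tyr is two residues later.
    return [(m.start() + 3, m.group()) for m in ALYD_MOTIF.finditer(sequence.upper())]


print(find_alyd_motifs("GGALYDFGGALYDYG"))  # prints: [(5, 'ALYDF'), (12, 'ALYDY')]
```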

Structural Analysis
The PDB [30] was used to find all ALYD(Y/F) motifs in SH3 domains with known 3D structures. For each SH3 domain with more than one experimentally solved structure, one representative structure was selected. All selected structures were aligned with the LSQMAN program [31] to a reference SH3 structure: 1FMK [5], a high-resolution structure of human Src protein. The RMSD (root mean square distance) was calculated for the whole SH3 domain and for the described motifs, and the values were compared. ClustalW at the EBI web page was used to calculate sequence identities between aligned structures [32]. PyMol was used to visualize the results.
Figure S1. β-sheet-like structure of a loop with the ALYDY motif. The ALYDY motif is located in the loop that connects the first and second β-strands in the Src SH3 domain (1FMK). The loop conformation is stabilized by three hydrogen bonds between loop residues and by a hydrogen bond between Leu 6 and Tyr 71 (orange).

(TIF)
File S1. UniProt codes of proteins with SH3 domain phosphorylation.