Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases

Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas dual specificity phosphatases (DUSPs) are a subgroup of protein tyrosine phosphatases that dephosphorylate not only Tyr(P) residues but also the Ser(P) and Thr(P) residues of proteins. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinase phosphatases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets.


Introduction
Tyrosine phosphorylation (Tyr(P)) is a frequent and reversible protein modification that triggers essential molecular interactions, enzyme activation, changes in signaling pathways and many other key cellular events. Proteins modified during these complex biochemical cycles

Tyr(P) Peptide Microarray Assay
The annotated phosphosites-Tyr-phosphatase microarray slides (PHOS-MA-PY) were purchased from Jerini Peptide Technology GmbH (Berlin, Germany). Each microarray consisted of three identical subarrays of 16 blocks, each comprising 20 rows and 20 columns, resulting in 6218 Tyr(P) peptides printed in triplicate on each glass slide. Human sequences were represented by 5765 peptides, while the remainder originated from a variety of organisms. Most of the peptides on the microarray have a length of 13 amino acids, with the Tyr(P) residue in the middle position. The peptide microarray slides were blocked with 1X Fast blocking buffer (Thermo Scientific Inc., Rockford, IL, USA). Spacers were inserted between the peptide microarray and a blank slide, and phosphatases (0.1-1.0 mg/ml) in citrate buffer (pH 6.4) were added from one corner of the slide until the space between the slides was completely filled. The slides were incubated in a humid chamber (22°C) for 10-60 min, depending on the phosphatase used. The blank slide was then removed and the microarray was washed (3X, 10 min) with TBS-0.1% (v/v) Tween-20. The wet slides were submerged in an anti-Tyr(P) antibody solution (1:1000) for 1 hour (22°C) and washed 3X for 10 min with TBS-0.1% (v/v) Tween-20 before being submerged in the Alexa 635 goat anti-mouse solution (1:2000) for 1 hour. The microarrays were then washed with TBS-0.1% (v/v) Tween-20 (3X, 10 min) and distilled water (2X, 5 min). Air-dried microarrays were scanned (635 nm) using an Axon GenePix 4000B scanner (Molecular Devices, Sunnyvale, CA, USA). Similar instrument settings were used to scan all peptide microarray slides. Digital images of the results were analyzed with GenePix Pro 5.1 software (Molecular Devices). Background pixel counts were subtracted from triplicate spots and the results were averaged.
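The slide layout and the final averaging step can be checked with a little arithmetic. The following is a minimal Python sketch, not the GenePix Pro procedure itself; the function names and the example pixel counts are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Layout figures taken from the text above.
SUBARRAYS = 3             # identical subarrays per slide
BLOCKS_PER_SUBARRAY = 16
ROWS_PER_BLOCK = 20
COLS_PER_BLOCK = 20
N_PEPTIDES = 6218         # Tyr(P) peptides, each printed once per subarray


def spot_positions_per_slide():
    """Total number of grid positions available on one slide."""
    return SUBARRAYS * BLOCKS_PER_SUBARRAY * ROWS_PER_BLOCK * COLS_PER_BLOCK


def peptide_signal(foreground, background):
    """Average background-subtracted signal over replicate spots.

    `foreground` and `background` are hypothetical per-spot pixel counts
    for the replicate spots of one peptide.
    """
    if len(foreground) != len(background):
        raise ValueError("need one background value per spot")
    return mean(f - b for f, b in zip(foreground, background))


if __name__ == "__main__":
    print(spot_positions_per_slide())   # grid positions on the slide
    print(N_PEPTIDES * SUBARRAYS)       # peptide spots actually printed
    print(peptide_signal([1000, 1100, 900], [100, 100, 100]))
```

Each subarray offers 16 × 20 × 20 = 6400 positions, enough to hold the 6218 peptides once, and three subarrays give the 18654 peptide spots per slide reported in the data analysis.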

Microarray Data Analysis
The dephosphorylation status of each peptide on the peptide microarray was obtained by measuring the fluorescence intensity. Background was subtracted from the pixel values of each spot on the microarray, and the results were recorded in an Excel file as relative fluorescence units (RFUs). The fluorescence intensity for each peptide was calculated by averaging the RFUs of its triplicate spots. The reference control slide was treated with buffer only. Data for a total of 6218 annotated phosphotyrosine peptides (18654 spots per peptide microarray) were collected for the reference slides and DUSP-treated slides. Unreliable data from individual peptide spots were removed from further analysis based on the following criteria: Mean RFU 635 or Mean RFU 635/Median RFU 635 of a spot was > 1.5. The RFUs collected from each DUSP-treated phosphotyrosine peptide microarray were combined and quantile normalized using the "preprocessCore" package (http://www.bioconductor.org/packages/release/bioc/html/preprocessCore.html) in R/Bioconductor. The percentage dephosphorylation of each peptide was calculated relative to the buffer-only reference slide. A subgroup of phosphotyrosine peptides (n = 916) with high fluorescent signals (relative fluorescent signal greater than 40,000 in the reference slide) was selected for hierarchical cluster analysis. The hierarchical cluster analysis of the microarray data for substrate dephosphorylation was performed with MultiExperiment Viewer (MeV) v4.7.4 [26], using Pearson correlation and the average linkage clustering algorithm.
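The normalization and scoring steps can be illustrated with a short Python sketch. Two points are assumptions rather than details taken from the text: ties are broken by original order, which may differ from how preprocessCore handles them, and percentage dephosphorylation is taken to be 100 × (1 − RFU_treated / RFU_reference), clipped to the range 0-100, because the equation itself is not reproduced here.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of RFUs per slide).

    Each value is replaced by the mean, across slides, of the values that
    share its rank. Ties are broken by original order (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[i] for sc in sorted_cols) / len(columns)
                  for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def percent_dephosphorylation(treated_rfu, reference_rfu):
    """ASSUMED formula: 100 * (1 - treated / reference), clipped to [0, 100]."""
    if reference_rfu <= 0:
        raise ValueError("reference RFU must be positive")
    pct = 100.0 * (1.0 - treated_rfu / reference_rfu)
    return max(0.0, min(100.0, pct))


if __name__ == "__main__":
    # Hypothetical RFUs for three peptides on two slides.
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    print(percent_dephosphorylation(10000, 50000))
```

Under this assumed formula, a peptide whose signal falls from 50,000 RFU on the reference slide to 10,000 RFU on a treated slide would score as roughly 80% dephosphorylated.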

Phylogenetic Analysis
The lengths of the catalytic domains of the DUSP proteins used in this study ranged from 171 amino acids in VH1 to 380 amino acids in Cdc25A. Minimal catalytic domain sequences of around 140 amino acids (S1 Fig) were derived from structural and sequence alignments. Phylogenetic trees were constructed by three different multiple sequence alignment methods (the Jotun Hein, Clustal V and Clustal W methods) available in the MegAlign sequence analysis software (DNAStar Inc., Madison, WI). Multiple sequence alignments (MSAs) were constructed using the conserved active site motif (HCXXXXXR) of each phosphatase along with 15 flanking amino acids on both ends. CLUSTALW2 [27] was used to generate three MSAs, each using a different gap opening penalty (5, 10, and 25), with BLOSUM62 as the protein weight matrix and all other options left as default. T-Coffee Combine [28,29] was then used to generate a single alignment that had the best agreement with all of the MSAs. To eliminate poorly aligned positions and divergent regions in the combined alignment, the alignment was filtered using Gblocks [30,31] with no gap positions within the final blocks, strict flanking positions, and no small final blocks. Gblocks reported a single conserved block starting seven residues upstream of the active site and ending at the conserved arginine residue. This 15-residue region was used to reconstruct a phylogenetic tree by the maximum likelihood method implemented in the PhyML program (v3.0 aLRT) [32]. The BLOSUM62 substitution model was selected, with 4 gamma-distributed rate categories to account for rate heterogeneity across sites. The gamma shape parameter was estimated directly from the data (gamma = 0.757). Tree topology and branch length were optimized for the starting tree, with subtree pruning and regrafting (SPR) selected for tree improvement. Reliability of internal branches was assessed using a bootstrap method with 1000 replicates.
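The two sequence windows described above (the active site motif with 15 flanking residues, and the 15-residue conserved block) can be located with a regular expression. This is an illustrative Python sketch only: the function names and the toy sequence are hypothetical, and taking the first HCXXXXXR match assumes that the motif occurs once in the catalytic domain.

```python
import re

# HCXXXXXR: His, Cys, any five residues, Arg (8 residues in total).
ACTIVE_SITE = re.compile(r"HC.{5}R")


def active_site_window(sequence, flank=15):
    """Return the active site motif plus up to `flank` residues on each side."""
    match = ACTIVE_SITE.search(sequence)
    if match is None:
        return None
    start = max(0, match.start() - flank)
    end = min(len(sequence), match.end() + flank)
    return sequence[start:end]


def conserved_block(sequence, upstream=7):
    """Return the block from `upstream` residues before the motif to its Arg."""
    match = ACTIVE_SITE.search(sequence)
    if match is None or match.start() < upstream:
        return None
    return sequence[match.start() - upstream:match.end()]


if __name__ == "__main__":
    toy = "A" * 20 + "HCGGGGGR" + "T" * 20   # hypothetical sequence
    print(active_site_window(toy))
    print(len(conserved_block(toy)))          # 7 upstream + 8 motif residues
```

Seven upstream residues plus the eight-residue motif account for the 15-residue region reported by Gblocks.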

Sequence Motif Extraction
Consensus sequence motifs for substrates recognized by each phosphatase were generated by pLogo (http://plogo.uconn.edu/). A total of 6032 unique 13-residue peptides in the peptide microarray library were selected as the whole data set. For each analysis, ~500 peptides with the highest level of dephosphorylation (≥80%) from the peptide microarray were used as the foreground data set, with the criterion that the original peptides all have signal intensities of RFU > 40,000. The background data set was obtained by subtracting the foreground sequences from the whole data set, and statistically significant residues were calculated by the algorithm. The tyrosine at position 7 was selected as the fixed position, with a frequency of 100%, for generation of the substrate motif for each DUSP. The Tyr(P) residue of each 13-residue peptide was assigned as the zero position, residues on the N-terminal side of Tyr(P) were assigned from -1 to -6, and residues on the C-terminal side were assigned from +1 to +6.
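The foreground/background comparison and the position numbering can be sketched in Python as follows. This is not the pLogo algorithm, which scores residues with binomial probabilities; the sketch substitutes a simple log2 ratio of pseudocount-smoothed frequencies, and the function names, pseudocount and example peptides are all hypothetical.

```python
import math
from collections import Counter

POSITIONS = list(range(-6, 7))   # -6 .. +6, with Tyr(P) at position 0


def check_peptide(peptide):
    """Require a 13-residue peptide with Tyr at the central position."""
    if len(peptide) != 13 or peptide[6] != "Y":
        raise ValueError(f"unexpected peptide: {peptide}")


def residue_frequencies(peptides):
    """Per-position residue frequencies, keyed by position (-6 .. +6)."""
    for p in peptides:
        check_peptide(p)
    freqs = {}
    for offset, pos in enumerate(POSITIONS):
        counts = Counter(p[offset] for p in peptides)
        total = sum(counts.values())
        freqs[pos] = {aa: c / total for aa, c in counts.items()}
    return freqs


def background_set(whole, foreground):
    """Background = whole data set minus the foreground sequences."""
    fg = set(foreground)
    return [p for p in whole if p not in fg]


def log2_enrichment(foreground, background, position, residue, pseudocount=0.5):
    """Simplified score (NOT the pLogo statistic): log2 of smoothed frequencies."""
    offset = position + 6
    fg = sum(1 for p in foreground if p[offset] == residue)
    bg = sum(1 for p in background if p[offset] == residue)
    fg_freq = (fg + pseudocount) / (len(foreground) + 20 * pseudocount)
    bg_freq = (bg + pseudocount) / (len(background) + 20 * pseudocount)
    return math.log2(fg_freq / bg_freq)


if __name__ == "__main__":
    whole = ["AAAAAEYDAAAAA", "AAAAADYDAAAAA", "AAAAAKYKAAAAA",
             "AAAAARYKAAAAA", "AAAAAEYKAAAAA", "AAAAAKYDAAAAA"]
    foreground = whole[:2]
    background = background_set(whole, foreground)
    print(log2_enrichment(foreground, background, -1, "E"))   # positive
    print(log2_enrichment(foreground, background, -1, "K"))   # negative
```

A positive score marks a residue that is more frequent in the foreground than in the background at that position; a negative score marks an underrepresented residue.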

Analysis of Signaling Pathways
Human proteins represented by the most active peptide substrates were used for analysis of biological interactions with the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway data sets. Of the 583 human proteins represented by peptides that exhibited >80% dephosphorylation by any DUSP, 11 did not have a KEGG identifier and 262 had no pathway associations, resulting in a final list of 310 proteins that were used for the pathway analysis. The substrate proteins were compared to a background consisting of the remaining 1,610 microarrayed peptides that were associated with human KEGG pathways. KEGG pathway associations were filtered to include only those pathways involved in signaling, resulting in 29 pathways for substrate proteins and 33 for background proteins. Chi-square p-values were calculated for each pathway to identify enrichment for substrate proteins, applying a Bonferroni correction that gave a significance threshold of p < 0.001515.
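The numbers in this section can be reproduced with a few lines of Python. The sketch below assumes a family-wise alpha of 0.05 divided over 33 tests, which matches the stated threshold of 0.001515 but is an inference rather than something stated explicitly. It also assumes a Pearson chi-square test on a 2×2 table without continuity correction; the table layout and function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def chi2_sf_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Assumed layout: a = substrate proteins in the pathway, b = substrate
    proteins outside it, c and d = the same counts for the background.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("table has an empty row or column")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, chi2_sf_df1(stat)


if __name__ == "__main__":
    print(583 - 11 - 262)                            # proteins kept for analysis
    print(round(bonferroni_threshold(0.05, 33), 6))  # ASSUMED alpha and test count
    print(chi_square_2x2(10, 10, 10, 10))            # no enrichment: statistic 0
```

A pathway would count as significantly enriched when its p-value falls below the corrected threshold.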

Recombinant DUSPs
The phosphatases used in our study were selected based on diversity of function and for potential involvement in human diseases (Table 1). The full length or catalytic domains of variola virus VH1 [23], human DUSP1 [33], DUSP3 [14], DUSP7 [34], DUSP14 [24], DUSP22 [35], DUSP27 [25], Cdc25A [36], Cdc25B [37] and Cdc25C (unpublished data) were produced in Escherichia coli as His-MBP tagged proteins and purified by immobilized metal-affinity chromatography (IMAC) with Ni-NTA resins. His-MBP tags were proteolytically removed using TEV protease and the proteins were further purified using size exclusion chromatography. All recombinant DUSP proteins used in our studies were highly purified (Fig 1A) and stable in solution. DUSP3 had the highest kcat/Km value by p-nitrophenyl phosphate (pNPP) assay, and thus the highest enzyme efficiency, while Cdc25A, B and C had the lowest kcat/Km values (Table 1).

Dephosphorylation of Microarrayed Tyr(P) Peptides
We used microarrays of Tyr(P) peptides as a high-throughput method to identify substrates for each DUSP. The microarrays consisted of >6000 Tyr(P) peptides comprising known phosphorylation sites (JPT, Berlin, Germany). The 13-residue peptides were synthesized with the Tyr(P) residue in the center, flanked by six residues of each unique protein sequence, and covalently immobilized in triplicate on glass slides via the N-terminus (Fig 1B). Peptide dephosphorylation was assessed by incubating the microarray surface with anti-Tyr(P) antibody, followed by a goat anti-mouse IgG conjugated to Alexa-647. The experimental conditions were empirically optimized by varying the incubation time and the amount of phosphatase added to the peptide microarray slides to obtain Tyr(P) peptide dephosphorylation data that could be compared among all DUSPs. Digital images of fluorescent-signal intensities representing dephosphorylation were collected by a laser scanner and used for data analysis. Because peptide recognition by the anti-Tyr(P) antibody was potentially affected by sequence context [54], dephosphorylation data were referenced to peptide microarrays treated with buffer only (no DUSP) to compensate for any sequence-specific variability. Fig 1B presents an image of Tyr(P) dephosphorylation by each DUSP for a subset of peptides. As shown by the example of VH1 in Fig 2A, the extent of DUSP dephosphorylation varied considerably by peptide, and this pattern was unique for each DUSP. The microarray data presented a broad continuum of dephosphorylation across the microarrayed substrates for all phosphatases (Fig 2B), suggesting both positive and negative contributions of each peptide residue. Further, the distribution of microarray dephosphorylation data from high to low peptide signal intensity was the same for each phosphatase (Fig 2B), indicating equivalent experimental conditions.
Table 1 (fragment). Function and disease association: breast cancer, lung cancer, prostate and ovarian cancer [39-43]; DUSP3, acute and myeloid leukemia [45]; DUSP14.

Amino Acid Sequence Motifs for DUSP Substrates
To determine the sequence motif recognized by each DUSP, we used pLogo [55] to compare, in a position-specific manner, the residue frequency in the most dephosphorylated peptide data set with the residue frequency in the background data set. The conserved substrate motifs for each DUSP were generated as a graphical representation (pLogo) of the residue patterns within a multiple sequence alignment, in which the residue heights are scaled relative to their statistical significance [55]. Although each motif was unique, two general trends in substrate recognition were evident (Fig 3A and 3B). For the first class of substrate motifs, the negatively charged amino acid residues Asp (D) and Glu (E) dominated the overrepresented residues for Cdc25s, VH1 and DUSP22 (Fig 3A and 3B) in at least 3 positions, while neutral Gly (G) and polar Ser (S) were overrepresented residues for DUSP1, DUSP7, DUSP14, DUSP3 and DUSP27 (Fig 3A and 3B). For the second class of substrate motifs (Fig 3A and 3B), DUSP3, DUSP27, DUSP1, DUSP7 and DUSP14 preferred non-charged residues around the Tyr(P) residue. For DUSP3 and DUSP27, negatively charged residues were underrepresented at all positions (Fig 3A and 3B), while overall the positively charged amino acid residues Lys (K), Arg (R), and His (H) were rarely observed in any of the motifs. Further, VH1, DUSP22, DUSP3 and DUSP27 preferred Asn at position 2 and Val at position 3. We note that a report by Kohn and coworkers concluded that VHR (DUSP3) has a preference for glutamic acid at the -1 position of the target dephosphorylation site, whereas our results showed that alanine and valine occur at high frequency at the -1 position for VHR [56]. The discrepancy could be due to differences in the experimental methods and substrates employed.
In contrast to the known MAPK activation motif (Thr-Xaa-Tyr), a Ser residue dominated the -2 position for DUSP1, DUSP7 and DUSP14 substrate motifs (Fig 3B), perhaps suggesting that only the phosphorylated Thr residue is favored in the -2 position.

Relationship between Peptide Substrate and Cell-Signaling Pathways
We selected the most active peptide substrates (696 with >80% dephosphorylation by any DUSP) to examine associations between specific substrates and cell-signaling pathways. Approximately 53% (310) of the 583 human proteins represented by the selected Tyr(P) peptides were mapped to 29 KEGG signaling pathways [57,58], as shown in Fig 4. Complexity of the pathway clusters varied from one protein involved in the PPAR signaling pathway to 34 proteins mapped to the PI3K-Akt pathway, and some peptides were found in more than one cluster. Each phosphatase connected to at least 25 clusters, while some pathways had few connections to the enzymes. For example, the two proteins of the RIG-I-like receptor signaling pathway cluster were only targeted by VH1, while the Notch signaling pathway cluster, containing 3 proteins, was targeted by VH1, DUSP3, DUSP22, DUSP27, and Cdc25C. The signaling pathways of PI3K-Akt, calcium, ErbB, neurotrophin, and chemokines (Fig 4; S1 Table) were significantly enriched (p < 0.001515, with Bonferroni correction) by comparison to a background list of proteins representative of all peptides printed on the microarray, while the MAPK, HIF-1, insulin, Jak-STAT, and NF-κB pathways were marginally enriched (p < 0.05).

Relationship between Peptide Substrates and DUSPs
Hierarchical clustering of the Tyr(P) dephosphorylation data (Fig 5A) demonstrated that the DUSPs can be organized into four primary activity clusters based on Tyr(P) peptide substrate specificity: (a) VH1 and DUSP22; (b) DUSP1, DUSP7, DUSP14, DUSP3 and DUSP27, in which DUSP3 and DUSP27 are closely clustered; (c) DUSP1, DUSP7 and DUSP14; and (d) a fourth cluster consisting of Cdc25A, Cdc25B and Cdc25C. While each enzyme displayed a unique pattern of substrate specificity, the clustering analysis suggested that DUSPs could be grouped by shared substrates. We first assessed the phylogenetic profiles of the 10 DUSPs using the amino acids of the catalytic domain. Phylogenetic trees of the 10 DUSPs were constructed based on multiple sequence alignment methods (Fig 5B). Clusters (b), (c), and (d) in the phylogenetic dendrogram presented in Fig 5B were almost identical to the corresponding clusters from hierarchical clustering (Fig 5A). VH1 and DUSP22 are closely related by phylogeny but did not cluster into the same group (Fig 5B). Assuming that active site residues are the most relevant component   were apparent for Cdc25A-C, as well as VH1 and DUSP22. Collectively, these results suggested a potential relationship between similarities of catalytic sites and substrate recognition motifs. Further analysis of DUSP surface features suggested possible explanations for the diversity in Tyr(P) peptide recognition. We noted similar peptide substrate motifs for VH1 and the Cdc25s, with a preponderance of acidic residues (Fig 3), suggesting an important role for negative electrostatic potential in substrate docking. Curiously, DUSP14, with the most negatively charged surface surrounding the catalytic site, preferred substrates composed of neutral or slightly polar residues (Fig 3). The protein structures of the Cdc25A, Cdc25B and Cdc25C catalytic domains are very similar to each other, but most distant from the other DUSPs examined in our study (S2 Table).
The molecular structures of DUSPs representative of the four substrate clusters (Fig 5A: DUSP3, DUSP14, DUSP22 and Cdc25B) were further examined for sequence identity, the root-mean-square deviation (RMSD) of atomic position and the Cα-alignment (Q score) (S1 Fig and S2 Table). While the DUSPs we examined have very similar or identical catalytic site sequence motifs (Table 1), the 3-dimensional structures fall into two general folds (Fig 6A). The common alpha helix that is perpendicular to the surface of the catalytic pocket (center of box in Fig 6A) aligned well among DUSP3, DUSP14 and DUSP22. However, to properly align the Cdc25B catalytic site, the orientation of the surface model was slightly shifted in perspective compared to the other structures shown in Fig 6A. In addition, the electrostatic potential of the surface surrounding the catalytic site is distinct for each of the modeled DUSPs (Fig 6B), with several commonalities. All of the DUSPs harbor a positively charged surface near the Tyr(P)-binding pocket. For DUSP3, one surface adjacent to the catalytic site presents a positive electrostatic potential that is flanked on the opposite side of the catalytic site by a large negatively charged patch. The DUSP14 surface nearest the catalytic site is mostly hydrophobic, while the remaining areas are positively charged. The distribution of surface electrostatic potential for DUSP22 is very similar to that of DUSP3, with a positively charged region on one side of the catalytic site and a mixed negatively charged or neutral region on the adjacent side. In the case of Cdc25B, a narrow positively charged area surrounds the Tyr(P) pocket.
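The activity clusters in Fig 5A come from hierarchical clustering with Pearson correlation and average linkage. The principle can be illustrated with a small pure-Python sketch; it is not the MeV implementation, it uses 1 − r as the distance, and the enzyme labels and profiles in the example are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile has no defined correlation")
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    `profiles` maps a label to its dephosphorylation profile.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}

    def cluster_distance(c1, c2):
        # Average linkage: mean of all pairwise member distances.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    toy = {  # hypothetical percent-dephosphorylation profiles
        "A": [1, 2, 3, 4], "B": [2, 4, 6, 8],
        "C": [4, 3, 2, 1], "D": [8, 6, 4, 1],
    }
    for merge in average_linkage(toy):
        print(merge)
```

Enzymes whose profiles rise and fall together across the peptides are merged first, regardless of differences in absolute activity, which is why a correlation-based distance suits comparisons among phosphatases with very different catalytic efficiencies.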

Discussion
We identified optimal substrate sequence motifs for dephosphorylation of Tyr(P) residues by enzymes that are representative of four distinct categories of DUSPs. The substrate motifs identified in our study will be important for examining the structural relationships that drive interactions with cellular targets. In considering the diversity of the most active substrates represented by the Tyr(P) peptide substrates, the enzymatic activities of the DUSPs were directed towards at least 29 signaling pathways, and most significantly towards the PI3K-Akt, calcium, ErbB, neurotrophin, and chemokine pathways. Although distinct sequence preferences were evident for each DUSP, a high degree of substrate promiscuity was also apparent. It is possible that the DUSPs we examined may interact with multiple substrates during normal or pathological cellular processes, as recently postulated for PTP1B [59]. Substrate-trapping mutants of DUSPs also engage in stable interactions with many native cellular proteins (our unpublished observations). Further, a non-catalytic phosphate-binding pocket that is observed in many PTPs [60] may influence substrate interactions. We note that our results do not directly address catalytic activity directed towards Ser(P) or Thr(P) residues, and that protein-protein interactions occurring outside of the active site also guide the catalytic domain to the correct intracellular substrate.
Fig 5 (caption fragment): representative of similarity within phosphatase sequences involved in substrate recognition. A maximum likelihood tree is depicted with bootstrap values (out of 1000 replicates) shown in red. doi:10.1371/journal.pone.0134984.g005
General conclusions regarding enzyme-ligand interactions are possible based on our results, though further study will be required to confirm that the motifs we identified represent legitimate biological substrates. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential physiological targets. The DUSPs examined in our study fell into four primary activity clusters based on substrate specificity. We examined relationships among DUSP catalytic activities, catalytic domain sequences and conserved catalytic sites. Based on the relative agreement between the phylogenetic dendrograms, our results suggested a potential relationship between similarities of catalytic sites and substrate recognition motifs.
The DUSPs use a common dephosphorylation mechanism [4] consisting of a thiophosphoryl intermediate that is formed by a thiolate nucleophilic attack of the catalytic site Cys anion directed towards the phosphoryl group of the peptidyl Tyr(P), assisted by an invariant Asp that is located in the P loop in all PTPs except the Cdc25s. Certain features of the peptide motifs we describe and DUSP surfaces reported by others provide clues regarding possible mechanisms for substrate recognition. The predominance of acidic residues flanking the Tyr(P) within the peptide motifs implies that negative surface electrostatic potential is important for substrate docking, while the positive electrostatic surfaces near the DUSP catalytic site may complement the incoming phosphate group. In a similar manner for single-specificity PTPs, negatively charged residues were favored while positively charged residues were unfavorable for peptide sequence selection [61]. Yet, our results also suggest that DUSPs may be less selective than previously considered. It is possible that the shallow catalytic pockets and relatively flat protein surface features that are characteristic of most DUSPs drive the promiscuous phosphatase activity noted in our study. For example, the catalytic domains of Cdc25A-C are extremely shallow and open [36], with no auxiliary loop extending over the active site to facilitate substrate dephosphorylation, and the surface surrounding the catalytic pocket of the poxvirus VH1 is very flat [23].
We considered the potential contribution of 'dual-specificity' to our results, as the microarrayed peptide library that we employed contained only Tyr(P) substrates. The Thr(P) binding site was identified in the co-crystal structure of DUSP3 in complex with a biphosphorylated p38 peptide [15], providing direct evidence for dual-specificity substrate docking. The Thr(P) pocket of DUSP3 is partially formed by the positively charged Arg158 residue. The DUSP22 residue Arg122 also forms a positively charged pocket that was postulated to play the same role as Arg158 in DUSP3 [35]. In a previous report, Cdc25s dephosphorylated a Cdk2 peptide containing Thr14(P) and Tyr15(P) residues more efficiently than the same peptide monophosphorylated at either position [62]. The preference for negatively charged residues at the -1 or +1 position relative to Tyr(P) in the conserved motifs for Cdc25s may mimic the negatively charged Thr14(P) residue of the Cdk2 protein. It is possible that the acidic or hydroxyl side chain present in the +2 position relative to Tyr(P) in most but not all of the peptide substrate motifs (Fig 3) substituted for Thr/Ser(P) in substrate recognition by the DUSPs we examined. In addition to the negatively charged amino acid residues, Ser, Thr and Tyr were present in select DUSP peptide substrate motifs. One explanation for this observation is that these residues may bind to the secondary pocket for Thr(P)/Ser(P) hydrolysis, or stabilize the peptide-phosphatase interactions to facilitate dephosphorylation of the Tyr(P) residue, as seen in the binding of the Thr(P) residue of the p38 peptide to the Arg158 pocket on DUSP3 [15]. Combining the newly identified DUSP substrates from our study with optimal Thr(P) or Ser(P) motifs will be important for clarifying these structure-activity relationships and for the design of chemical probes to explore potential biological roles.