Post-translational modifications of Drosophila melanogaster HOX protein, Sex combs reduced

Homeotic selector (HOX) transcription factors (TFs) regulate gene expression that determines the identity of Drosophila segments along the anterior-posterior (A-P) axis. The current challenge with HOX proteins is understanding how they achieve their functional specificity while sharing a highly conserved homeodomain (HD) that recognizes the same DNA binding sites. One mechanism proposed to regulate HOX activity is differential post-translational modification (PTM). As a first step in investigating this hypothesis, the sites of PTM on a Sex combs reduced protein fused to a triple tag (SCRTT) extracted from developing embryos were identified by Tandem Mass Spectrometry (MS/MS). The PTMs identified include phosphorylation at S185, S201, T315, S316, T317 and T324, acetylation at K218, S223, S227, K309, K434 and K439, formylation at K218, K309, K325, K341, K369, K434 and K439, methylation at S19, S166, K168 and T364, carboxylation at D108, K298, W307, K309, E323, K325 and K369, and hydroxylation at P22, Y87, P107, D108, D111, P269, P306, R310, N321, K325, Y334, R366, P392 and Y398. Of the 44 modifications, 18 map to functionally important regions of SCR. Besides a highly conserved DNA-binding HD, HOX proteins also have functionally important, evolutionarily conserved small motifs, which may be Short Linear Motifs (SLiMs). SLiMs are proposed to be preferential sites of phosphorylation. Although 6 of 7 phosphosites map to regions of predicted SLiMs, we find no support for the hypothesis that the individual S, T and Y residues of predicted SLiMs are phosphorylated more frequently than S, T and Y residues outside of predicted SLiMs.


Introduction
The identity of body segments along the Anterior-Posterior (A-P) axis of Bilaterians is determined by a set of developmental control genes called Homeotic selector (Hox) genes (reviewed in Akam, 1998;Lewis, 1978). These genes encode transcription factors (TFs) that regulate expression of target genes by binding to DNA-binding sites with a 60 amino acid DNA-binding homeodomain (HD) (Gehring et al., 1994). That HOX proteins determine distinct segmental identities and regulate distinct patterns of gene expression while recognizing similar DNA binding sites is a major paradox. Interaction with the cofactor Extradenticle (EXD) is one mechanism proposed to mediate the functional specificity of HOX proteins. In addition to HOX cofactor protein interactions, PTMs are also proposed to have a role in the regulation of functional specificity (reviewed in Primon et al., 2019;Draime et al., 2018;Sivanantharajah & Percival-Smith, 2015). Here the PTMs of a HOX protein, Sex combs reduced (SCR), are mapped as a first step towards understanding this regulation of HOX protein activity.
Phosphorylation is a major regulatory mechanism used in cellular signaling pathways that often terminate in the regulation of TF activity and subsequent regulation of gene expression (Mylin et al., 1989;Hunter & Karin, 1992;Ardito et al., 2017). HOX are phosphoproteins (Berry & Gehring, 2000;Jaffe et al., 1997;Gavis & Hogness, 1991;Stultz et al., 2006;Krause et al., 1988;Krause & Gehring, 1989;Bourbon et al., 1995;Driever & Nüsslein-Volhard, 1989;Gay et al., 1988;Ronchi et al., 1993;Dong et al., 1998;Janody et al., 2000). PP2A-B' is proposed to activate SCR by dephosphorylating residues in the N-terminal arm of the SCR HD that when phosphorylated inhibit interaction of the SCR HD with DNA. Indeed, peptides with N-terminal arm sequence are phosphorylated by cAMP-dependent protein kinase A (PKA) and dephosphorylated by Serine-threonine protein phosphatase 2A (PP2A-B') in vitro (Berry & Gehring, 2000). However, a null PP2A-B' allele does not affect SCR activity, suggesting that dephosphorylation by PP2A-B' plays no role in regulating SCR activity (Moazzen et al., 2009). A drawback of the methodologies employed so far to study phosphorylation of SCR and other Drosophila HOX and HD-containing proteins is that they provide no direct and definitive information on which amino acid residue is phosphorylated in the developing Drosophila embryo (Berry & Gehring, 2000;Jaffe et al., 1997;Gavis & Hogness, 1991;Stultz et al., 2006;Krause et al., 1988;Krause & Gehring, 1989;Bourbon et al., 1995;Driever & Nüsslein-Volhard, 1989;Gay et al., 1988;Ronchi et al., 1993;Dong et al., 1998;Janody et al., 2000). We have used Tandem Mass Spectrometry (MS/MS) on SCR extracted from developing embryos to provide data about a possible repertoire of PTMs (Johnson & Eyers, 2010).
Figure 1. The block diagram is drawn to scale. The functional regions of SCR are color-coded. The octapeptide motif is labeled in blue, LASCY motif in orange, DYTQL motif in dark green, NEAGS motif in black, YPWM motif in yellow, NANGE motif in grey, HD in red, KMAS motif in light green and CTD in purple.

Ectopic expression of SCR and SCRTT and preparation of first instar cuticles
For ectopic expression of SCR, virgin female flies of the genotype, y w; P{UAS-Scr, w+} (Bloomington stock # 7302) were crossed with the GAL4 driver males of the genotype, y w; P{Armadillo-Gal4, w+} (Bloomington stock # 1560) and progeny were collected (Brand & Perrimon, 1993). For ectopic expression of SCRTT protein from the heat-shock promoter ScrTT fusion gene, a heat-shock was administered at 5 hours AEL at 37.5°C for 30 minutes. First instar cuticles were prepared as described (Wieschaus & Nüsslein-Volhard, 1986) and imaged with darkfield optics on a Leica® Leitz™ DMRBE microscope.
Ectopic expression of SCRTT from the heat-shock and UAS promoters
To ectopically express SCRTT using the GAL4-UAS system (Brand & Perrimon, 1993), adult virgin female flies of the genotype, y w; P{UAS-ScrTT, w+} were crossed with the GAL4 driver males of the genotype, y w; P{Armadillo-Gal4, w+} (Bloomington stock # 1560) and the progeny expressed SCRTT ubiquitously. To induce expression of SCRTT from the heat-shock promoter, D. melanogaster embryos at 0-16 hours AEL were collected from the apple juice plates on nylon mesh screens of a filter basket and were heat-shocked for 30 minutes at 37.5°C by immersion of the filter basket in a circulating water bath.
Affinity purification of SCRTT protein from embryos
SCRTT protein was purified from embryo extracts using subcellular fractionation followed by metal affinity chromatography in denaturing conditions (Loughran & Walls, 2011;Haneskog, 2006). 3g of heat-shocked embryos was homogenized in 15ml of lysis buffer (15mM HEPES pH 7.6, 10mM KCl, 5mM MgCl2, 2mM EDTA, 350mM sucrose, 0.032% 2-mercaptoethanol, with protease inhibitors: 0.2mM phenylmethanesulfonylfluoride (PMSF), 1.3mM benzamidine and 0.3mM Aprotinin) using a 40ml Dounce Homogenizer. The lysate was centrifuged in a Corex tube at 10,000 rpm for 15 minutes in a Sorvall SS-34 rotor, and the supernatant was discarded. The white top layer of the pellet (pellet 1) was carefully resuspended, leaving the dark colored debris behind, in resuspension buffer 1 (15mM HEPES pH 7.6, 10mM KCl, 0.1mM EDTA, 350mM sucrose, 0.006% 2-mercaptoethanol, with protease inhibitors: 0.2mM phenylmethanesulfonylfluoride (PMSF), 1.3mM benzamidine and 0.3mM Aprotinin) and was centrifuged at 10,000 rpm for 15 minutes in a Sorvall SS-34 rotor. The pellet (pellet 2) was resuspended in resuspension buffer 2 (15mM HEPES pH 7.6, 10mM KCl, 350mM sucrose, with protease inhibitors: 0.2mM phenylmethanesulfonylfluoride (PMSF), 1.3mM benzamidine and 0.3mM Aprotinin) and was centrifuged at 10,000 rpm for 15 minutes in a Sorvall SS-34 rotor. The pellet (pellet 3) was resuspended in a nuclear lysis buffer (50mM NaH2PO4, pH 7.5, 300mM NaCl, 20mM imidazole, 1% NP-40, with protease inhibitors: 0.2mM phenylmethanesulfonylfluoride (PMSF), 1.3mM benzamidine and 0.3mM Aprotinin) and was centrifuged at 12,000 rpm for 10 minutes in a Sorvall SS-34 rotor. All preceding steps were performed at 0-5°C. The nuclear extract (NE) was mixed with solid urea to a final concentration of 8M, and the mixture was gently rocked at room temperature until the urea dissolved.
The denatured nuclear extract (NE+Urea) was mixed with 250µl of Ni-NTA sepharose beads (IBA Lifesciences) that had been equilibrated with the denaturing nuclear lysis buffer, and gently rocked for 15 minutes at room temperature. The beads were packed in a column by gravity flow and the flow-through was reapplied to the column. The beads in the column were washed twice with the denaturing nuclear lysis buffer and then, washed twice with a buffer containing 50mM NaH2PO4, 300mM NaCl pH 7.5. The beads were stored at -80°C.

Western Blot analysis
The proteins in an SDS-polyacrylamide gel were transferred onto an Immobilon®-P PVDF transfer membrane (Millipore Sigma) by electroblotting at 250mA for two hours in ice cold transfer buffer (25mM Tris, 192mM glycine and 10% methanol). The blots were blocked at room temperature for one hour in Blotto (PBT: 10% PBS and 0.1% Tween-20, and 3% skim milk). Anti-FLAG M2 monoclonal antibody (Sigma-Aldrich) at a dilution of 60,000-fold in Blotto was incubated with the blot for one hour at room temperature. After washes with PBT, horseradish peroxidase (HRP)-conjugated goat anti-mouse antibody (ThermoFisher Sci.) at a dilution of 3,000-fold in Blotto was added and incubated for one hour at room temperature. After washes with PBT, HRP was detected using SuperSignal™ West Femto Maximum Sensitivity Chemiluminescent Substrate (ThermoFisher Sci.). Digital images were recorded using a ChemiDoc™ Imaging System (Bio-Rad). For some experiments, the membrane was stripped for one hour at room temperature using Restore™ Western Blot Stripping Buffer (ThermoFisher Sci.) to remove the anti-FLAG antibody and was blocked at room temperature for one hour in Blotto followed by incubation with anti-β-tubulin monoclonal antibody (E7 concentrated from Developmental Studies Hybridoma Bank, University of Iowa, Iowa City, IA) at a dilution of 1,500-fold in PBT.

Sample preparation for MS/MS
The Ni-NTA bead slurry (25µl) from the protein purification was mixed with an equal volume of 2xSDS buffer (100mM Tris-HCl pH 6.8, 200mM 1,4-dithiothreitol (DTT), 4% SDS, 20% glycerol, 1% 2-mercaptoethanol, ~1 mg/ml bromophenol blue; Sambrook et al., 1989) and was heated to 90°C for 10 minutes. The 50µl sample was loaded onto a 1.5mm thick SDS-Polyacrylamide gel (11% separating and 5% stacking gel) for size separation of proteins. The gel was stained with 1 mg/ml Coomassie blue (Coomassie Brilliant Blue™ R-250 from ThermoFisher Sci.). The destained gel was stored in 5% glacial acetic acid at 4°C. At the Functional Proteomics Facility, Western University, London, Ontario, Canada, 10 spots were picked from the gel using an Ettan® SpotPicker™ and were in-gel digested with either trypsin (Promega), chymotrypsin (Sigma-Aldrich) or thermolysin (Promega), and the peptides were subsequently lyophilized. The peptides were analyzed with a Thermo Scientific Orbitrap Elite mass spectrometer, which uses the nano LC-ESI-Orbitrap-MS/MS technique, at the Biological Mass Spectrometry Laboratory, Western University, London, Ontario, Canada for protein identification and characterization of post-translational modifications.
Mass spectrometry data analysis
LC-ESI-Orbitrap-MS/MS data was analyzed at the Biological Mass Spectrometry Laboratory, Western University, London, Ontario, Canada. PEAKS™ DB software versions 7, 7.5 or 8 (Bioinformatics Solutions Inc.; Zhang et al., 2012) were used to perform de novo sequencing and subsequent database search. PEAKS™ PTM was used to identify post-translational modifications. PEAKS™ DB uses a peptide score, which measures the quality of the peptide-spectrum match and separates the true and false identifications. Peptide score is given as -10log10P, where P refers to P-value. A high peptide score and a low P-value are associated with the confidence of the peptide match. The false discovery rate (FDR) was set at 1%, which establishes a peptide cut-off score. A peptide must meet the cut-off score in order to be identified by PEAKS™ DB. For our analysis, a modified peptide was associated with higher confidence if the peptide score was higher than the cut-off score by 8, which corresponds to a lower P-value. Minimal ion intensity, which is the relative intensity of position-determining fragment ions in an MS2 spectrum, was set to 5%. Coverage is given in the analysis as percent coverage. Average Depth (AD) is the sum of the lengths of all chemically distinct peptides identified divided by the length of the protein. Since the proteases used often generate fragments too large or too small for analysis, the Average Depth of regions covered (ADorc) is calculated, which is Average Depth divided by the proportion of the protein covered. To distinguish between the biologically relevant PTMs and artefactual modifications that might have arisen due to chemical handling, a manual investigation of the modifications was performed.
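As a hedged illustration of the definitions above, the following minimal Python sketch computes a peptide score from a P-value, and the coverage, Average Depth and ADorc from peptide positions. The function names and the representation of peptides as 1-based (start, end) positions are assumptions made for illustration; they are not part of PEAKS™ DB.

```python
import math


def peptide_score(p_value: float) -> float:
    """Peptide score as defined in the text: -10 * log10(P)."""
    return -10.0 * math.log10(p_value)


def coverage_and_depth(protein_length, peptides):
    """Return (coverage, average depth, ADorc).

    peptides: list of (start, end) positions, 1-based and inclusive, one
    entry per chemically distinct peptide (hypothetical input format).
    Peptides sharing positions but differing in modification are listed
    separately, so they each add to the depth.
    """
    covered = set()
    total_length = 0
    for start, end in peptides:
        covered.update(range(start, end + 1))
        total_length += end - start + 1
    coverage = len(covered) / protein_length      # proportion of protein covered
    average_depth = total_length / protein_length  # AD
    adorc = average_depth / coverage if coverage else 0.0
    return coverage, average_depth, adorc
```

Under this definition of the score, a peptide scoring 8 units above the cut-off has a P-value about 6.3-fold (10^0.8) lower than a peptide scoring exactly at the cut-off.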
Phosphopeptide enrichment and C18 desalting of non-phosphopeptides
Phosphopeptide enrichment of trypsinized α-casein or SCRTT used the EasyPhos protocol employing TiO2 beads (Humphrey et al., 2015). The TiO2 flow-through containing potential non-phosphopeptides was desalted using a C18-StageTip (Rappsilber et al., 2003) prior to MS/MS analysis. The C18 StageTip was solvated thrice with 200µl of 80% acetonitrile, 0.2% formic acid and 19.8% water followed by centrifugation at 3,000g for 2 mins at room temperature or until no liquid remained in the tip. The C18 StageTip was equilibrated thrice with 200µl of 2% acetonitrile and 0.2% formic acid followed by centrifugation at 3,000g for 2 mins at room temperature. The TiO2 flow-through was reduced in a SpeedVac to 100µl and adjusted to have a concentration of 0.2% formic acid. The sample was loaded onto the C18 StageTip and was centrifuged at 500g at room temperature until no liquid remained in the tip. The tip was then washed thrice with 200µl of aqueous buffer (2% acetonitrile, 0.2% formic acid and 97.8% water) followed by centrifugation at 500g at room temperature until no liquid remained in the tip. The peptides were eluted with 100µl of elution buffer (80% acetonitrile, 0.2% formic acid and 19.8% water) followed by centrifugation at 500g at room temperature until no liquid remained in the tip. The eluate was concentrated under vacuum using a SpeedVac to a volume of approximately 18µl and formic acid was added at a final concentration of 0.25-0.5%. The sample was analyzed by LC-MS/MS.
Bioinformatic analysis of proteomic data
D. melanogaster SCR protein sequence (NCBI accession number in Table S8) was submitted to the ELM database (Dinkel et al., 2016) to retrieve predicted nuclear and cytoplasmic short linear motif (SLiM) sequences as HOX transcription factors interact with nuclear and cytoplasmic components (Merabet & Dard, 2014;Wiellette et al., 1999). In addition, SLiMs with any amino acids from known ordered regions of SCR were excluded from the analysis (Joshi et al., 2007). To determine whether a SLiM was conserved, SCR orthologous protein sequences of various protostome and deuterostome species belonging to different phyla were retrieved from the NCBI or ORCAE (only for T. urticae; Sterck et al., 2012) database (accession numbers in Table S8) and a multiple sequence alignment was performed using the tools MAFFT version 7 (Katoh et al., 2017) and Clustal Omega (Sievers et al., 2011). Each SLiM of SCR was manually checked for conservation across species (Figure S9). A SLiM was considered to be conserved only if it aligned perfectly in both MAFFT and Clustal Omega irrespective of the maximum length of the SLiM. SLiMs less than five amino acids long were not considered as a conserved SLiM unless conserved beyond Diptera. The minimum length of a SLiM conserved beyond Diptera was 4 amino acids.

Statistical Analysis
To determine the significance of the biased distribution of serine (S), threonine (T) and tyrosine (Y) in HOX proteins to SLiMs vs. non-SLiMs and the biased phosphate distribution in SCR SLiMs vs. non-SLiMs, Fisher's Exact Test was employed (Fisher, 1922).
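For readers unfamiliar with the test, the sketch below shows how a two-sided Fisher's Exact Test on a 2x2 table can be computed from the hypergeometric distribution using only the Python standard library. It is an illustrative implementation and not necessarily the software used for this analysis; the function name is hypothetical, and any counts passed to it here are placeholders, not values from Table 3.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, SLiM vs. non-SLiM residues, and columns
    phosphorylated vs. unmodified S/T/Y residues.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell
        # with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the
    # observed one; the small factor guards against rounding error.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates a placeholder table in which three of four items in one group and one of four in the other carry a feature.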

Expression of SCRTT protein
To map PTMs of SCR expressed during embryogenesis requires an initial concentrated source of protein that can be affinity purified. The CDS (expressing SCR isoform A (417 aa); FlyBase ID FBpp0081163) of Scr mRNA was fused in frame to the triple tag (TT) encoding 3X FLAG, Strep II and 6X His tags (Tiefenbach et al., 2010;Percival-Smith et al., 2013) and cloned behind the UAS promoter of pUAST (Percival-Smith et al., 2013), the heat-shock promoter (hsp) of pCaSpeR (Thummel & Pirrotta, 1992) and the T7 promoter of pET-3a (Studier & Moffatt, 1986;Studier et al., 1990). Two major systems for ectopic expression of proteins in Drosophila are the heat-shock inducible promoter and the GAL4-UAS binary system. The expression of SCRTT using these two ectopic expression systems was compared (Figure 2). The heat-shock promoter resulted in higher levels of accumulation of SCRTT (Figure 2). The fold increase of SCRTT expression of heat-shock relative to UAS was too great to be accurately quantified. The relative molecular mass (Mr) of SCRTT protein is calculated to be 49.8 (Artimo et al., 2012). However, on a Western Blot, the SCRTT protein expressed during embryogenesis ran with a higher Mr of 62 (Figure 2).

SCRTT protein is biologically active
To assess whether SCRTT expressed from the heat-shock promoter was biologically active, the first instar larval cuticular phenotype of heat-shocked embryos expressing SCRTT was compared with the cuticles that result from expression of untagged SCR protein in all cells of the embryo using the GAL4-UAS system (Brand & Perrimon, 1993). Both SCR and SCRTT ectopic expression induced ectopic T1 beards in T2 and T3, indicating that the triple tag does not interfere with the biological activity of SCR in vivo (Figure 3A & B; Gibson et al., 1990;Percival-Smith et al., 2013;Zhao et al., 1993).
Figure 3. The anterior half of the larva is shown. The untagged SCR protein expressed with the GAL4-UAS system using a ubiquitous armadillo-GAL4 is shown in panel A; whereas, the SCRTT protein expressed from a heat-shock promoter is shown in panel B. T1, T2 and T3 refer to first, second and third thoracic segments. A1, A2 and A3 refer to first, second and third abdominal segments. A & B. Ectopic expression of SCR and SCRTT, respectively (T2 and T3 beards marked with arrows). C. Control wild-type (WT) first instar larval cuticle.
Analytical workflow for affinity purification, digestion and mapping of PTMs in embryonically expressed SCRTT
For affinity purification of SCRTT, 3g of heat-shocked embryos containing the hspScrTT fusion gene were collected between 0 and 16h AEL and lysed. The nuclei were collected and washed, and the proteins of the nuclear extract were denatured and SCRTT was affinity purified by Ni-NTA chromatography (Hochuli et al., 1987;Hochuli et al., 1988). The purification of SCRTT was monitored by Western Blot analysis and shows concentration of SCRTT on the Ni-NTA beads (Figure 4B). An SDS gel stained for total protein identified a band of the correct Mr for SCRTT from protein extracted from the Ni-NTA beads (Figure 4C). To determine whether this purification provided the amount of SCRTT required for MS/MS, a sample of the Ni beads containing purified SCRTT was run alongside a sample of 3500ng of SCRTT purified from bacteria (Figure 4D). The signal for the SCRTT purified from Drosophila embryos was 2.4-fold less suggesting the band contained about 1500ng of SCRTT. For the MS/MS analysis, approximately 10µg of SCRTT extracted from Drosophila embryos was analyzed per sample.
Eight samples were analyzed with MS/MS of which four were digested with trypsin, two with chymotrypsin and two with thermolysin. Figure S1 shows the distribution of chemically distinct peptides over the primary sequence of SCRTT for each sample. Trypsin, chymotrypsin and thermolysin were chosen for digestion of SCRTT based on the analysis of predicted peptides generated by these enzymes (Artimo et al., 2012). The coverage and average depth of region covered (ADorc) for each sample analyzed, and the combined coverage and ADorc of various combinations of samples were determined (Table 1). The final coverage was 96% of the primary sequence of SCRTT.
Evidence for the PTMs of SCRTT
Each spectrum obtained by LC-MS/MS was interrogated by PEAKS DB search followed by identification of PTMs using PEAKS PTM algorithm. Embryonic SCRTT is post-translationally modified (Figure S1). The modifications that were the result of a biochemical process and not a potential by-product of sample preparation were interrogated and characterized in more detail (Table 2 and Figure S1). These 44 modifications are phosphorylation, acetylation, formylation, methylation, carboxylation or hydroxylation (Figures S3-8). For all modifications interrogated, six diagnostic criteria were assessed: first that the peptide had a mass shift indicative of the modification; second whether in the MS2 spectra there were b and/or y ions that supported modification of a specific amino acid residue; third whether other MS2 spectra were identified for peptides with a particular modification; fourth whether overlapping or differently modified peptides were identified with the same modification; fifth whether a modification was identified in multiple independent samples used for MS/MS analysis (Table 2) and sixth whether the difference of the modified peptide score and the cut-off score was greater than 8.
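The first and sixth criteria can be illustrated with a short, hedged Python sketch. The monoisotopic mass shifts listed are standard values for these modifications; the 0.01 Da tolerance and the function names are illustrative assumptions and do not represent the PEAKS settings used in this study.

```python
# Standard monoisotopic mass shifts (Da) for the six modification types.
MONOISOTOPIC_SHIFT_DA = {
    "phosphorylation": 79.96633,  # +HPO3
    "acetylation": 42.01057,      # +C2H2O
    "formylation": 27.99491,      # +CO
    "methylation": 14.01565,      # +CH2
    "carboxylation": 43.98983,    # +CO2
    "hydroxylation": 15.99491,    # +O
}


def candidate_modifications(observed_shift_da, tolerance_da=0.01):
    """Criterion 1: modifications whose mass shift matches the observed shift.

    tolerance_da is an assumed value chosen for illustration only.
    """
    return [
        name
        for name, shift in MONOISOTOPIC_SHIFT_DA.items()
        if abs(observed_shift_da - shift) <= tolerance_da
    ]


def passes_score_margin(peptide_score, cutoff_score, margin=8.0):
    """Criterion 6: score exceeds the cut-off score by more than the margin."""
    return (peptide_score - cutoff_score) > margin
```

At this tolerance, acetylation (+42.011 Da) and carboxylation (+43.990 Da) are clearly separated, as are hydroxylation (+15.995 Da) and methylation (+14.016 Da).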
For phosphorylation, an additional diagnostic criterion was assessed: whether b and/or y ions with a neutral loss of phosphoric acid (98 Da) were present in the MS2 spectra.
Table 2 footnotes:
a Total number of MS2 spectra reporting the modification as identified by PEAKS and subsequently filtered for those with evidence from the analysis of the 8 samples (Table 1).
b Total number of samples the modification was observed in (Table 1).
c The peptide score differences greater than 8 are in bold. A peptide was associated with higher confidence if the peptide score was higher than the cut-off score by 8, which corresponds to a lower P-value.
d S185 phosphosite is not labelled in Figure S1A but was identified in the analysis of tryptic peptides (A2
Phosphorylation of SCRTT is substoichiometric
The percentage of phosphorylated peptides is low (Table 2), indicating that phosphorylation of SCRTT may be substoichiometric. To determine whether the lack of phosphate detection was due to the instrument used for the analysis, a heavily phosphorylated protein, α-casein (Larsen et al., 2005), was analyzed. 12 phosphosites were detected in α-casein (Table S2; Figure S2). Multiple phosphosites map between 61-70 of α-casein, and of the 64 peptides identified for this region, 61 were phosphorylated and 24 of these peptides were phosphorylated at two amino acid residues, which is a high percentage of modified peptides indicating stoichiometric levels of phosphorylation of α-casein (Figure 5). This suggests that phosphorylation is stable during MS/MS analysis.
TiO2 beads were used to enrich for phosphopeptides from trypsinized α-casein and SCRTT samples (Humphrey et al., 2015). Figure S2). However, only 54 phosphopeptides were common for both treatments. 104 phosphopeptides previously identified without TiO2 treatment could not be identified post-TiO2 treatment which indicates loss of phosphopeptides (Table S3). Although TiO2 enriches for phosphopeptides, the yield is low and no phosphopeptides were detected upon TiO2 enrichment of trypsinized SCRTT.
[Figure caption, partial] Figure S2A) and each blue line underneath the primary protein sequence represents a chemically distinct peptide identified by MS/MS analysis. The peptides are heavily modified, and the modifications are indicated by letters or symbols on the blue lines. On the right is the legend for all modifications shown in the figure.
Are Short Linear Motifs (SLiMs) in SCR favored sites of phosphorylation?
Outside the HD, HOX proteins are highly disordered proteins (reviewed in Merabet & Dard, 2014). SLiMs present in disordered protein regions are proposed to be preferential sites of phosphorylation (Diella et al., 2008;Dinkel et al., 2012;Sivanantharajah & Percival-Smith 2015). To test this with SCRTT, the Eukaryotic Linear Motif (ELM) resource (Dinkel et al., 2016) was screened for predicted SLiMs. Some SLiMs also correspond with regions of SCR conservation and are referred to as 'conserved SLiMs'. In SCR, 66% of the primary sequence was SLiM sequence and 35% was conserved SLiM sequence (Table 3; Figure 6). 6 out of 7 phosphosites were in a SLiM region and 1 of 7 phosphosites was in a conserved SLiM (Figure 6). There is an enrichment of phosphosites to SLiMs. Although this may suggest preferential phosphorylation of SLiMs, we also analyzed whether the amino acid residues that accept phosphates are more frequently phosphorylated in predicted SLiMs than outside of SLiMs, which is an additional expectation if SLiMs are preferential sites of phosphorylation. There was no significant increase in the frequency of phosphorylation of residues in SLiM versus non-SLiM regions, and a significant decrease in the frequency of phosphorylation of residues in conserved SLiMs (Table 3). In addition, 9 out of 81 SLiMs and 5 out of 51 conserved SLiMs of SCR were phosphorylated (Tables S6 & S7) indicating that about 10% of predicted SLiMs are bona fide sites of SCR phosphorylation. This suggests that a minority of SCR predicted SLiMs were phosphorylated which also does not support the hypothesis that SLiMs are preferential sites of phosphorylation. The reason that 6 of the 7 phosphosites are in predicted SLiMs is the bias of S, T and Y residues to SLiMs; the percentage of S, T and Y is significantly higher in the SLiMs and conserved SLiMs than in non-SLiM portions of SCR (Table 3).
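The residue-level comparison described above amounts to tallying phosphorylated and unmodified S, T and Y residues inside and outside predicted SLiMs, which yields the 2x2 table needed for a test of association. The following minimal Python sketch shows one way to build such a table; the function name, the input format and the toy sequence in the usage note are hypothetical and are not taken from the SCR data.

```python
def sty_contingency(sequence, slim_ranges, phosphosites):
    """Count S/T/Y residues by region and phosphorylation status.

    sequence: protein sequence in one-letter code.
    slim_ranges: list of (start, end) SLiM positions, 1-based and inclusive.
    phosphosites: set of 1-based positions observed as phosphorylated.
    Returns {"slim": [phosphorylated, unmodified],
             "non_slim": [phosphorylated, unmodified]}.
    """
    in_slim = set()
    for start, end in slim_ranges:
        in_slim.update(range(start, end + 1))

    counts = {"slim": [0, 0], "non_slim": [0, 0]}
    for position, residue in enumerate(sequence, start=1):
        if residue not in "STY":
            continue  # only phosphate-accepting residues are tallied
        region = "slim" if position in in_slim else "non_slim"
        column = 0 if position in phosphosites else 1
        counts[region][column] += 1
    return counts
```

With the toy input `sty_contingency("ASTKYSA", [(2, 4)], {2, 6})`, one phosphorylated and one unmodified residue are counted in each region. A table built this way makes explicit why most phosphosites can fall in SLiMs without the per-residue phosphorylation frequency being higher there: the denominator of S, T and Y residues is itself larger in SLiMs.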

PTMs of SCRTT
PTMs are one mechanism proposed for the functional specificity of HOX proteins, and mapping PTMs of HOX proteins is a first step in testing this proposal. Bottom-up MS/MS analysis of SCRTT purified from developing D. melanogaster embryos identified many amino acid residues that were covalently modified (Figure 7). The final analysis of PTMs did not include modifications that could be due to sample preparation; however, some of these uninterrogated modifications may be a result of a biological process that regulates SCR activity. For example, deamidation of N321, which is the second N of the NANGE motif, may have a role in DNA binding (O'Connell et al., 2015). The potentially biologically relevant PTMs will be discussed in relation to conserved protein domains and sequence motifs, SCR function, distribution within predicted SLiMs and the structure of an SCR-EXD-DNA complex. Of the conserved domains/motifs of SCR important for SCR activity (Figure 1; Sivanantharajah & Percival-Smith, 2009;Percival-Smith et al., 2013;Sivanantharajah & Percival-Smith, 2014;Sivanantharajah, 2013), all are post-translationally modified with the exception of the octapeptide and KMAS motifs.

Phosphorylation of SCRTT
A clustered set of phosphorylations on the amino acid residues, T315, S316, T317 and T324 flank the NANGE motif (Figure 7). The NANGE motif is important for the suppression of ectopic proboscis formation suggesting that phosphorylation may regulate the transition of SCR activity between either determining T1 identity or determining labial identity (Percival-Smith et al., 2013). The phosphosites map to a region of SCR not ordered in the SCR-EXD complex binding to fkh DNA (Figure 7). A few amino acid residues of the linker region (15 residues between YPWM motif and the HD) and the N-terminal of the HD (residues 3-9 of 60) of SCR interact stably with the minor groove of fkh DNA (Joshi et al., 2007;Rohs et al., 2009a;Rohs et al., 2009b;Rohs et al., 2010;Abe et al., 2015). Although the amino acid residues of SCR involved in stable minor groove interactions were not found to be modified, the residues, T315, S316 and T317 that are part of the linker between the YPWM motif and HD, and T324 which is the first amino acid residue of the HD, may interact with DNA transiently. Phosphorylation adds negative charge to amino acid residues, and therefore, phosphorylation of the linker region and N-terminal arm of HD of SCR might interfere with transient minor groove interactions.
In the model for regulation of SCR activity proposed by Berry & Gehring, 2000, phosphorylation of the 6th and 7th amino acid residues of the HD, T329 and S330, inhibits SCR DNA binding and activity. cAMP-dependent protein kinase A phosphorylates these residues in vitro, and this negative regulation of SCR activity is proposed to be reversed by phosphatase PP2A-B' that removes the phosphates on these residues in vitro. However, loss of PP2A-B' activity had no effect on SCR activity suggesting that PP2A-B' is not involved in regulation of SCR activity (Moazzen et al., 2009). Further, we have not detected phosphorylation of T329 and S330 suggesting that their phosphorylation by cAMP-dependent protein kinase A may also be an in
Figure 8. The structure of SCR-EXD-DNA complex determined by crystallography. SCR is shown in pink and EXD in blue. The two strands of fkh regulatory DNA are shown in brown and green. The modified amino acids of SCR along with their side chains are shown in yellow. Ac, acetylation; Car, carboxylation; Fo, formylation; Hyd, hydroxylation; Me, methylation. The structure coordinates with accession code 2R5Z (fkh250) were retrieved from RCSB Protein Data Bank (Joshi et al., 2007). Cn3D 4.3.1 (NCBI) was used to annotate the 3-D structure.
PTMs of SCR residues found in the structure of the SCR-EXD-DNA complex
The structure of SCR-EXD bound to fkh regulatory DNA encompasses the evolutionarily conserved functional motifs/domains, YPWM, NANGE and HD (Figure 8). The Bilaterian-specific YPWM motif of SCR and UBX binds to a hydrophobic pocket on the surface of EXD HD (Passner et al., 1999;Joshi et al., 2007). The MS/MS analysis identified hydroxylation at P306 and carboxylation at W307 of the YPWM motif of SCR. These modifications render the YPWM motif hydrophilic and may interfere with the binding of YPWM to the hydrophobic pocket of EXD HD. This might be a mechanism of regulation of SCR activity as SCR-EXD interaction is essential for activating the target, fkh gene which is required for salivary gland development.
The linker region of SCR interacts with the minor groove of fkh DNA. Narrowing of the DNA minor groove increases the negative electrostatic potential of the groove and proteins exploit this charged state of the groove by inserting a positively charged amino acid residue, thereby making the interaction more stable (Joshi et al., 2007;Rohs et al., 2009b). Although K309 and R310 residues of SCR do not directly interact with the minor groove of DNA, they render the region of the protein positively charged which aids the neighboring H312 residue in making a strong contact with the DNA minor groove (Joshi et al., 2007). Carboxylation at K309 and hydroxylation at R310 add negative charges to this region of the protein, which might be a mechanism of inhibition of SCR-DNA interaction, thereby regulating the functional specificity of SCR.
The highly conserved HD is a compact self-folding protein domain which interacts with the major and minor groove of DNA (Gehring et al., 1994;Joshi et al., 2007;Religa et al., 2007). All PTMs of the SCR HD are on the solvent exposed surface. These solvent exposed residues do not interact directly with DNA, and if they have a role in regulation of SCR activity, it is unlikely to be due to alterations in DNA binding. The formylation of K341 and K369 may arise as a secondary modification from oxidative DNA damage when the HD is bound to DNA (Jiang et al., 2007).

Competition of acetylation and formylation observed in SCR
MS/MS analysis of SCRTT identified 4 lysine residues (K218 in the DISPK SLiM, K309 in the linker region, and K434 and K439 in the triple tag) that are acetylated in some peptides and formylated in others. The same alternation between formylation and acetylation is found for lysine residues of core histone proteins (Wiśniewski et al., 2008).
In histones, lysine acetylation by histone acetyltransferases (HATs) and lysine deacetylation by histone deacetylases (HDACs) are involved in chromatin remodeling and gene expression (Allfrey et al., 1964;reviewed in Roth et al., 2001). Acetylation has also been reported to modify the activity of TFs, thereby regulating the ability of the TF to bind DNA (Gu & Roeder, 1997; reviewed in Kouzarides, 2000;Bannister & Miska, 2000). Besides histones, HATs and HDACs acetylate and deacetylate non-histone proteins, including TFs, which may be a mechanism regulating SCR activity (reviewed in Park et al., 2015;Wang et al., 2011;Glozak et al., 2005;Choudhary et al., 2014).
Formylation of lysine residues is widespread in histones and other nuclear proteins and arises as a secondary modification due to oxidative DNA damage (Jiang et al., 2007). 5'-oxidation of DNA deoxyribose results in the formation of a highly reactive 3'-formylphosphate residue, which outcompetes the acetylation mechanism and formylates the side-chain amino group of lysine (Jiang et al., 2007). Therefore, amino acid residues of a protein that are acetylated are also found to be formylated in many cases.

SLiM analysis
Of the 7 sites of SCR phosphorylation found, 6 were in SLiMs, suggesting that predicted SLiMs are preferential sites of phosphorylation (Sivanantharajah & Percival-Smith, 2015). However, there is no significant increase in the frequency of phosphorylation of S, T and Y residues in predicted SLiMs relative to the same residues outside the SLiMs. The main reason that 6 of 7 sites of phosphorylation map to SLiMs is that the phosphorylatable residues S, T and Y are concentrated in SLiMs. In addition, only a minority of predicted SLiMs were phosphosites. This lack of phosphorylation may reflect that either the ELM database overpredicts SLiMs, or many SLiMs are targets of other PTMs (Puntervoll et al., 2003;Iakoucheva et al., 2004;Khan & Lewis, 2005;Gould et al., 2010;Dinkel et al., 2016) such as methylation, hydroxylation, carboxylation, acetylation and formylation. Examples of other PTMs in SLiM regions of SCR include methylation at S19 and hydroxylation at P22 of the Drosophila-specific SLiM SLASCYP, and methylation at S166 and K168 of the SLiM ANISCK (Table S6).
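The distinction between the number of phosphosites that map to SLiMs and the per-residue frequency of phosphorylation can be illustrated with a 2x2 comparison of S, T and Y residues inside and outside predicted SLiMs. The sketch below uses a one-sided Fisher's exact test as one possible way to make this comparison; the choice of test is an assumption and is not stated in this section. Only the phosphosite counts (6 inside, 1 outside) come from the text; the counts of unphosphorylated S, T and Y residues are hypothetical placeholders and must be replaced with the real tallies.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: residues inside SLiMs, residues outside SLiMs.
    Columns: phosphorylated, not phosphorylated.
    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` phosphosites inside SLiMs by chance.
    """
    row1 = a + b          # S/T/Y residues inside predicted SLiMs
    col1 = a + c          # all phosphorylated S/T/Y residues
    n = a + b + c + d     # all S/T/Y residues
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)


# 6 and 1 are the phosphosite counts from the text.
# 60 and 15 are HYPOTHETICAL counts of unphosphorylated S/T/Y residues.
p_value = fisher_exact_greater(6, 60, 1, 15)
print(f"one-sided p = {p_value:.3f}")
```

If S, T and Y residues are concentrated in SLiMs, most phosphosites will fall in SLiMs even when the per-residue frequency of phosphorylation is the same inside and outside, which is the situation such a test is meant to detect.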

Potential limitations
Detection of PTMs by MS/MS analysis requires a high concentration of the analyte protein, and because endogenous protein levels are low, this in turn requires overexpression. This study identified PTMs on SCRTT protein expressed in all embryonic cells after administration of a heat-shock. One potential limitation of detecting PTMs on overexpressed proteins is that the heat-shock administered may affect the rate of protein modification, or that the amount of overexpressed protein is too high for the modification enzymes to modify completely. However, phosphoproteins expressed from heat-shock promoters are still heavily phosphorylated (Krause et al., 1988;Krause & Gehring, 1989;Gavis & Hogness, 1991), ruling out an effect of the heat-shock on phosphorylation. An additional limitation is that PTMs are temporally and spatially regulated by the activity of modification enzymes, such as kinases and phosphatases for phosphorylation; therefore, SCR may only be post-translationally modified in a subset of developing embryonic cells, resulting in detection of substoichiometric levels of modification. A final limitation is that although the total coverage of SCRTT is 96%, the depth of coverage in some regions of the protein is low, and modifications in these under-represented regions may have been missed.
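The difference between total coverage and depth of coverage noted above can be made concrete with a small sketch. It assumes that each identified peptide is described by its 1-based start and end positions in the protein; the function names, the depth threshold and the example peptides are hypothetical and are not taken from the data of this study.

```python
def coverage_depth(protein_length, peptides):
    """Count how many identified peptides cover each residue.

    `peptides` is a list of (start, end) pairs, 1-based and inclusive.
    Returns a list where index i holds the depth at residue i + 1.
    """
    depth = [0] * protein_length
    for start, end in peptides:
        for pos in range(start, end + 1):
            depth[pos - 1] += 1
    return depth


def summarize_coverage(depth, min_depth=2):
    """Return (fraction of residues covered at least once,
    1-based positions covered but below `min_depth`)."""
    covered = sum(1 for d in depth if d > 0)
    shallow = [i + 1 for i, d in enumerate(depth) if 0 < d < min_depth]
    return covered / len(depth), shallow


# Hypothetical 10-residue protein with three identified peptides.
depth = coverage_depth(10, [(1, 5), (3, 8), (4, 6)])
fraction, shallow = summarize_coverage(depth, min_depth=2)
print(fraction, shallow)
```

A protein can therefore show high total coverage while many residues are supported by only a single peptide, and a substoichiometric modification at such residues is easily missed.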
Although phosphorylation of proteins is common, detection of the phosphorylated amino acid residues is still a challenge. Three common explanations are used to address problems with phosphopeptide detection. First, phosphopeptides are hydrophilic and hence are lost during reversed-phase chromatography. Second, phosphopeptide ionization is selectively suppressed in the presence of unmodified peptides. Third, phosphopeptides have lower ionization or detection efficiency than their unmodified counterparts. There are no data to support the third argument (Steen et al., 2006). In addition, multiply phosphorylated peptides were detected upon MS/MS analysis of a commercially purchased, pure, heavily phosphorylated bovine α-casein protein (Larsen et al., 2005), suggesting that the first two problems of phosphopeptide detection were not major issues. Therefore, the substoichiometric nature of SCRTT phosphorylation may not be a technical issue of detection but might arise during expression or purification of SCRTT. Phosphatases may remove phosphates during nuclear fractionation and nuclear lysis; however, this is unlikely during chromatography, as it was performed in denaturing conditions. An attempt to enrich phosphopeptides of heavily phosphorylated α-casein using TiO2 beads was successful, although the yield of phosphopeptides was lower than with no enrichment. The TiO2-mediated phosphopeptide enrichment proved unsuccessful for SCRTT. The MS/MS analysis of SCR protein is most likely not exhaustive, and the PTMs mapped for SCR may not be the complete set that can be detected by MS/MS analysis; however, the analysis of a purified SCR protein has detected more PTMs than the bulk proteomic analyses (Hu et al., 2019;Zhai et al., 2008).

Conclusion
This study identified sites of phosphorylation and other PTMs in a tagged HOX protein, SCRTT, extracted from developing Drosophila melanogaster embryos. These modifications map to functionally important regions of SCR. In testing the hypothesis that HOX SLiMs are preferential sites of phosphorylation, we found that most phosphosites mapped to predicted SLiMs, but found no support for the hypothesis that the S, T and Y residues of predicted SLiMs are more frequently phosphorylated than those outside predicted SLiMs.