Characterisation of protein isoforms encoded by the Drosophila Glycogen Synthase Kinase 3 gene shaggy

The Drosophila shaggy gene (sgg, GSK-3) encodes multiple protein isoforms with serine/threonine kinase activity and is a key player in diverse developmental signalling pathways. Currently it is unclear whether different Sgg proteoforms are similarly involved in signalling or if different proteoforms have distinct functions. We used CRISPR/Cas9 genome engineering to tag eight different Sgg proteoform classes and determined their localization during embryonic development. We performed proteomic analysis of the two major proteoform classes and generated mutant lines for both of these for transcriptomic and phenotypic analysis. We uncovered distinct tissue-specific localization patterns for all of the tagged proteoforms we examined, most of which have not previously been characterised directly at the protein level, including one proteoform initiating with a non-standard codon. Collectively, this suggests complex developmentally regulated splicing of the sgg primary transcript. Further, affinity purification followed by mass spectrometric analyses indicates a different repertoire of interacting proteins for the two major proteoforms we examined, one with ubiquitous expression (Sgg-PB) and one with nervous system specific expression (Sgg-PA). Specific mutation of these proteoforms shows that Sgg-PB performs the well characterised maternal and zygotic segmentation functions of the sgg locus, while Sgg-PA mutants show adult lifespan and locomotor defects consistent with its nervous system localisation. Our findings provide new insights into the role of GSK-3 proteoforms and intriguing links with the GSK-3α and GSK-3β proteins encoded by independent vertebrate genes. Our analysis suggests that different proteoforms generated by alternative splicing are likely to perform distinct functions.


Introduction
Glycogen Synthase Kinase-3 (GSK-3) is a highly conserved protein kinase that has orthologs in all metazoans, with proteins from distant species such as flies and humans displaying more than 90% sequence similarity in the protein kinase domain [1,2]. Initially identified as an enzyme involved in the regulation of glycogen metabolism, a key role for the Drosophila orthologue encoded by the shaggy (sgg) locus in embryonic segmentation [3] established GSK-3 at the heart of the Wnt signalling pathway in flies and vertebrates [4]. In brief, GSK-3 kinase activity acts to negatively regulate Wnt signalling by phosphorylating β-catenin, Armadillo (Arm) in Drosophila, such that it is ubiquitinylated and subsequently degraded by the proteasome. When Wnt signalling is active, GSK-3 is inactivated, Arm is stabilized and translocates to the nucleus where it binds to the Tcf transcription factor to activate Wnt responsive genes [5]. Considerable work from many laboratories has established that GSK-3 and Wnt signalling is pivotal for cell differentiation and morphogenesis across the Metazoa [5,6].
In vertebrates, there are two major isoforms of GSK-3, alpha and beta, each encoded by independent paralogous genes. While these isoforms share considerable sequence similarity in the kinase domain (85% overall identity, 98% within the kinase domains) [7], they show major differences at their termini with GSK-3α containing a large glycine-rich N-terminal region that is absent in GSK-3β [2]. Loss of GSK-3β in mice results in embryonic lethality [8,9] with defects at gastrulation and in axis formation [10,11]. In addition, heterozygotes exhibit a range of phenotypes, particularly in aspects of metabolism, homeostasis and nervous system function [12-15]. In contrast, loss of function GSK-3α mice are viable but show alterations in glucose metabolism [16] and abnormalities in brain structure and behaviour [17,18]. Interestingly, there is evidence that the mammalian isoforms show both partially redundant and antagonistic interactions [12,19-22]. In contrast to vertebrates, the Drosophila genome contains a single GSK-3 locus, sgg, that shows considerable complexity, with 17 annotated transcripts encoding 10 different protein isoforms (Figure 1A and B). Sgg proteoforms differ at their N termini (5 alternatives), at internal exons and at the C terminus (3 alternatives). At the C terminus, Sgg-PO is unique among the proteoforms and was previously identified as Sgg46 [23]. The remaining nine proteoforms contain either a short C terminus, typified by Sgg-PB (Sgg10), or a longer C terminus, typified by Sgg-PA (Sgg39). The longer isoform contains a glycine-rich region that is analogous to the N-terminal domain of the vertebrate GSK-3α. The role of this domain is currently not well understood but it is predicted to contain an ANCHOR binding region [24] and two short MATH domain interaction motifs, both thought to be important in protein interactions [25] (Figure 1C).
In Drosophila, sgg is known to have a variety of developmental roles and interacts with a number of signalling pathways including Wnt, Hedgehog, Notch and Insulin [26-28], as well as being implicated in a variety of other cellular processes [29]. Null mutations in sgg exhibit a maternal effect lethal phenotype with a strong segment polarity defect in embryos lacking zygotic and maternal Sgg, as well as defects in the central and peripheral nervous systems [30]. In addition, analysis of a wide range of other alleles has revealed phenotypes in diverse tissues, for example in the macrochaetes, mechanosensory bristles found on the adult thorax, where it has been shown to phosphorylate key transcription factors [31]. However, despite the considerable focus on the role of sgg in development, little is known about how particular proteoforms contribute to specific functions. Previous work indicates that Sgg-PB (Sgg10) is an important proteoform, maternally contributed and detected throughout development into adults [32]. In contrast, Sgg-PA (Sgg39) has more limited expression: it does not appear to be maternally contributed and is not detected in wing imaginal discs, where the adult macrochaetes develop (Figure 1D). A third major isoform, Sgg-PD (Sgg46), which contains a C-terminal domain that includes a caspase-cleavage site, appears to be dispensable for viability but has a role in sensory organ precursor development [23,32].
The complexity of the sgg locus in Drosophila, with its multiple proteoforms, and the apparent differences between GSK-3 paralogs in vertebrates raise the question of how different GSK-3 proteoforms contribute to the functions of this key kinase during development and in homeostasis. In particular, there has been a general debate as to whether protein isoforms encoded by the multiple splice forms of a particular gene are produced and functional. One extreme, based on evidence from high throughput mass-spectrometry or literature curation of verified proteoforms, contends that the majority of genes encoding multiple alternatively spliced isoforms only produce a single functional proteoform [33,34]. In contrast, an alternative view is that alternatively spliced isoforms generate proteoforms with functionally distinct properties in terms of spatial or temporal expression, or their interaction repertoires [35,36]. To help address the role of alternative Sgg proteoforms in Drosophila and the developmental roles GSK-3 plays, as well as contributing to the debate surrounding the functionality of splice isoforms, we used a CRISPR-Cas9 based genome engineering strategy to tag specific Sgg proteoforms. We introduced fluorescent protein or affinity tags into the endogenous sgg locus in Drosophila [37], altogether tagging eight different exons. These tagged Sgg proteoforms allowed us to follow their expression across embryonic development by immunohistochemistry and/or fluorescence microscopy, revealing unique and specific expression for each of the tagged proteoforms. Focusing on the major C terminal domains, we show that the short form (Sgg-PB) is ubiquitously expressed across embryogenesis and is essential for viability. In contrast, the long form (Sgg-PA) is specifically expressed in the developing nervous system and is not required for viability.
Furthermore, using the tagged lines to identify interacting proteins for each proteoform class, we found a different set of interactors. This agrees with an analysis of mammalian GSK-3α and β interactions using a yeast 2-hybrid approach, which found a different set of interacting proteins for these closely related proteins [69]. We found that the loss of major proteoforms is not always compensated by other isoforms and can lead to age-related pathologies including accelerated senescence. Taken together, our work suggests that the transcript complexity of the Drosophila sgg locus reflects functionally relevant differences in the spatial and temporal expression of GSK-3 as well as functional differences between major proteoforms.

In vivo tagging of major Sgg proteoforms
In order to determine the expression and localisation of specific Sgg proteoforms we used CRISPR/Cas9 genome engineering to introduce different in-frame protein tags into specific exons at the endogenous sgg locus [37]. We first focused on the major C terminal proteoforms represented by Sgg-PA and Sgg-PB, constructing fly lines containing a 3xFLAG-StrepTagII-mVenus-StrepTagII (FSVS) cassette just before the termination codon. We have previously utilised this cassette in a large-scale protein trap screen [38] and found it was tolerated by a wide range of different Drosophila proteins in vivo. In both cases the lines we generated were homozygous or hemizygous viable and fertile.
Using an antibody against the FLAG epitope we first examined the expression of each tagged proteoform in the Drosophila embryo via immunohistochemistry. With Sgg-PA we found little or no expression during early development but by stage 9 we observed strong and specific expression in the developing CNS of the trunk and brain. As development proceeded expression became prominent in the elaborating PNS and was particularly strong in the chordotonal organs, where it continued to the end of embryogenesis (Figure 2D-F).
Looking more closely we observed that in the neuroectoderm (Figure 2G) and PNS (Figure 2H) of fixed preparations, expression was predominantly cytoplasmic and particularly concentrated in the vicinity of the cell membrane. In the chordotonal organs we found staining associated with the cell bodies and extending into the ciliated endings. Towards the end of embryogenesis we observed specific staining in a subset of cells in the developing brain and in the anterior commissural bundle (Figure 2I). In contrast, we found strong and ubiquitous staining in the Sgg-PB lines in the early embryo, as expected from the strong maternal contribution detected by RNA-seq analysis (Figure 1D), that continued until germ band retraction. At later stages expression was prominent in the hindgut, midgut and salivary glands (Figure 2J-L). At higher magnification in stage 11-13 embryos, we again noticed a concentration of signal associated with the cell membrane in most cells of the developing epidermis although in some cells the entire cytoplasm stained (Figure 2M). We also noticed an elevated signal in the developing mesoderm (Figure 2N and O).
To further explore the relationship between the major Sgg-PA and Sgg-PB families of proteoforms we examined the FSVS tagged versions by confocal microscopy, observing very similar, if not identical, localisation to that obtained by immunohistochemistry (Figure 3A-D).
In particular, employing fluorescence allowed much clearer visualisation of Sgg-PA throughout the chordotonal organs (Figure 3B) and Sgg-PB in the mesoderm and musculature (Figure 3D). We also generated alternatively tagged versions of each proteoform by replacing the YFP tag with mCherry. We generated embryos heterozygous for either Sgg-PA YFP /Sgg-PB mCh or Sgg-PA mCh /Sgg-PB YFP and imaged these by confocal microscopy. While in general the mCherry signal was significantly weaker than the YFP signal, we were able to confirm and extend our immunohistochemistry observations. The fluorescent reporters confirmed the strong localisation of Sgg-PA to the nervous system and the ubiquitous expression of Sgg-PB, although we did notice slightly elevated Sgg-PB expression in the CNS of late embryos and a more punctate appearance (Figure 3E-I).

Tagging other sgg isoforms
We extended our analysis to examine other Sgg proteoforms, introducing exon specific 3xFLAG-StrepTagII tags into the endogenous sgg locus. We tagged the C-terminus of the first exon of Sgg-PD (which also tags -PP and -PQ), the first coding exon of Sgg-PG (also tags -PR), the unique terminal exon of Sgg-PO, a unique internal exon of Sgg-PP and an internal exon of Sgg-PQ (shared with -PM, -PP and -PR). Finally, Sgg-PM is predicted to initiate with a valine rather than a methionine and we tagged a unique exon in this proteoform to confirm translation from a non-standard initiation codon (Supplementary Table 1). All of the tagged lines we generated were homozygous or hemizygous viable and fertile, and we again examined expression in fixed samples by immunohistochemistry with a monoclonal antibody recognising the FLAG epitope (Figures 4 and 5). In contrast to the -PA and -PB tagged lines, we noticed that the expression levels of the other proteoforms were generally weaker and mostly restricted to particular tissues.
Sgg-PD: this variant shows early expression that is largely restricted to mesoderm (Figure 4A) and becomes more prominent at stage 9 (Figure 4B). At later stages the expression of Sgg-PD is generally weak and ubiquitous with elevated levels found in the CNS, posterior spiracles, Malpighian tubules, proventriculus and salivary glands (Figure 4C, 4S).
Sgg-PG and -PR: these proteoforms share a unique amino terminus and have either the long (-PR) or short (-PG) C-termini described above. They also differ in a short internal exon but our tagging does not differentiate between these (Supplementary Table 1). We did not detect any strong expression during early stages (Figure 4D), but by mid-embryogenesis we observed transient mesoderm expression (Figure 4E) followed by later expression in the hindgut (Figure 4F), foregut and the anterior region of the pharynx (Figure 4T).
Sgg-PM: this proteoform is predicted to initiate with an unconventional start codon, a valine rather than a methionine, and shares a C-terminus with Sgg-PA. We found no early expression of this proteoform (Figure 4G-H). Late in development we observed expression in the pharynx, proventriculus and weakly in the hindgut (Figure 4I).
Sgg-PO: this proteoform has a unique C-terminus (Figure 1) and is not detected during early stages (Figure 4J) but again shows detectable expression in the mesoderm (Figure 4K) from stage 9 and at later stages in the hindgut, posterior spiracles and prominently in the proventriculus (Figure 4L, 4U).
Sgg-PP: we did not find any expression during early embryogenesis, although there appears to be faint staining in the mesoderm at stage 9 (Figure 4M-N), but we did observe expression in the hindgut and anterior midgut of late embryos (Figure 4O, 4V).
Sgg-PQ: we tagged this proteoform at an internal exon shared with -PR, -PP, and -PM. These proteoforms were not detected during early embryogenesis (Figure 4P and Q). At later stages, Sgg-PQ is prominent in the salivary glands and proventriculus (Figure 4R).
Taken together, our tagging strategy has revealed dynamic and tissue-specific expression of different Sgg proteoforms during embryogenesis. Our most striking finding is the clear difference in expression of the major C-terminus proteoforms exemplified by Sgg-PA and Sgg-PB, which, as we describe above, are likely to correspond to the GSK-3α and GSK-3β proteins encoded by separate genes in vertebrate genomes. We therefore elected to generate isoform specific loss of function mutations in each of the major isoforms by separately deleting their specific C-terminal exons as described below. A further noticeable feature of the proteoform expression was the localisation of different tagged proteoforms in the developing digestive system, particularly in the proventriculus and hindgut. While it has been shown that sgg expression is enriched in particular regions of the adult gut, particularly the crop and hindgut [39], there have been relatively few reports of functional roles for sgg in the gut [40,41] or other specific tissues in the embryo [42]. Finally, the extensive early mesoderm expression shown by several proteoforms is consistent with a previously established role for sgg and Wnt signalling in embryonic muscle cell progenitors [43].

Isoform specific null alleles
We used CRISPR-Cas9 genome engineering to generate in-locus deletions that remove the C-terminal exons specific for the Sgg-PA or Sgg-PB proteoform families. We completely removed the unique exons, replacing them with a 3xP3-driven DsRed marker flanked by LoxP sites, which was subsequently removed by the activity of Cre recombinase to leave the remainder of the locus largely unaltered [44].
In this way we generated Sgg-PB mutations, where the following upstream exon unique for isoform Sgg-PA was left intact. Similarly, we removed the last unique exon for isoform Sgg-PA ensuring the flanking sequences were not affected. Both of the exon deletions were confirmed by genotyping via PCR and sequencing. Deletion of the Sgg-PA class proteoforms (sgg isoA ) resulted in flies that were viable and fertile whereas loss of the Sgg-PB class proteoforms (sgg isoB ) resulted in late embryo/early larval lethality. We note that in the case of sgg isoA , progeny from homozygous mothers lack both maternal and zygotic contributions and are thus completely null, whereas progeny from hemizygous sgg isoB mothers have some maternal contribution of wild type transcript or protein.
We examined mutant lines by immunostaining to determine any effects on CNS and PNS development and only observed minor defects in a small percentage of progeny (<5%, not shown). Similarly, examination of larval cuticles revealed a similar low frequency of defects, with the very occasional appearance of animals resembling sgg loss of function phenotypes (<1%). We conclude that the unique C-terminal extension defining the Sgg-PA class of proteoforms is dispensable for normal development, a similar situation to that in vertebrates with the loss of GSK-3α. We presume that in this case, ubiquitous expression of the shorter Sgg-PB class of proteoforms is able to provide sufficient Sgg function in the nervous system. In contrast, loss of the Sgg-PB class terminal exon is lethal and either cannot be rescued by the longer C-terminus or selection of the Sgg-PA terminal exon is tissue specific and unable to be spliced in some tissues.
To examine whether the loss or reduction of Sgg proteoforms had consequences for gene expression, we performed RNA-seq analysis using RNA extracted from the null mutants we described above (Figure 5A, Table S1-S2). In the case of sgg isoA , embryos from homozygous mothers crossed to hemizygous fathers are completely null for the proteoforms containing this exon and we compared RNA from these embryos with stage matched embryos from the progenitor stock. We performed three biological replicates and after filtering (1.6-fold expression change, p<0.05) we identified 100 genes with significantly changed expression (26 up and 74 down) with no significant enrichment of any Gene Ontology terms apart from a down-regulation of 6 mitochondria-encoded respiratory chain components (ATPase-6, Cyt-b and 4 ND subunits) along with 7 other enzymes involved in respiration or redox reactions. Given that sgg isoA null embryos are viable and fertile, the lack of any major effects on gene expression was not unexpected.
In the case of sgg isoB we collected null hemizygous male embryos from heterozygous mothers (identified by lack of a GFP balancer) and compared this with RNA from the progenitor stock. While we expect a significant rescue of the zygotic mutation by the maternal component, we nevertheless identified 482 genes with significant expression changes (94 up and 388 down) (Table S1-S2). In particular, we noted a strong upregulation of Heat Shock Factor (Hsf) and a number of Hsf target stress response and chaperone genes, but down regulation of sets of genes implicated in cuticle development and proteolysis. Many of these dysregulated genes form a highly connected network (p <10e-16; Figure 5B), indicating that mutants are clearly perturbed at the transcriptional level. We presume these gene expression changes reflect the gradual loss of sgg isoB maternal product. However, in line with the lack of overt morphological phenotypes, we found no apparent changes in any major developmental or signalling pathways.

Proteoform-specific interactions
Since the two major Sgg proteoform classes are expressed in spatially different patterns it is possible that they participate in different pathways, have different protein partners or preferentially act on a different spectrum of substrates. To gain insight into possible unique roles we performed immunopurifications followed by mass spectrometry analysis to identify Sgg-PA and Sgg-PB interactomes. Using our previously described iPAC approach [45] we performed independent purifications using the StrepII, FLAG and YFP tags introduced into the sgg locus to increase the reliability of the interacting partners identified. In parallel, we also used a protein trap line we had previously generated (sgg CPTI002603 ) that appears to tag the majority of Sgg proteoforms, along with a w 1118 negative control. We applied the QProt pipeline, a tool for examining differential protein expression, with a p-value cutoff of 0.05 and a requirement that putative interactors be identified in at least 2 out of the 3 independent pull downs after correcting against the wild type control. We identified 20 co-purifying proteins with Sgg CPTI002603 , 26 with the Sgg-PA isoform and 21 with Sgg-PB (Table 1).
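The reproducibility requirement described above (a p-value cutoff of 0.05 and detection in at least 2 of the 3 independent pulldowns) can be illustrated with a minimal Python sketch. This is not the QProt pipeline itself; the function name, the input layout (one dictionary of protein to p-value per tag) and the placeholder protein names and p-values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def consistent_interactors(pulldowns, p_cutoff=0.05, min_hits=2):
    """Return proteins passing the p-value cutoff in at least `min_hits` pulldowns.

    `pulldowns` maps a tag name (e.g. "StrepII") to a dict of
    protein name -> p-value from the comparison against the control.
    This layout is an assumption made for the sketch.
    """
    hits = Counter()
    for results in pulldowns.values():
        for protein, p_value in results.items():
            if p_value < p_cutoff:
                hits[protein] += 1
    return sorted(protein for protein, n in hits.items() if n >= min_hits)


# Hypothetical p-values, for illustration only (not data from this study).
example = {
    "StrepII": {"protein_A": 0.01, "protein_B": 0.20, "protein_C": 0.03},
    "FLAG": {"protein_A": 0.02, "protein_B": 0.04, "protein_C": 0.30},
    "YFP": {"protein_A": 0.04, "protein_B": 0.01},
}

print(consistent_interactors(example))
```

In this toy input, protein_C passes the cutoff in only one pulldown and is therefore excluded, which is the behaviour the 2-out-of-3 rule is intended to enforce.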
We first determined whether the lists of potential interactors were enriched for any Gene Ontology terms and found that the Sgg-PA and Sgg CPTI002603 lists were significantly enriched for processes involved in ribosome assembly, cytoplasmic translation (adjusted p 2e-06) and protein folding, whereas the Sgg-PB list showed no significant enrichment. While these enrichments may suggest that the presence of the YFP tag in the lines affects protein synthesis, slowing it to allow the fluorescent protein to fold, we note that in mammalian systems GSK-3β has been shown to complex with chaperones during maturation [46] and to colocalise with chaperone complexes in a Huntington's disease model [47]. Thus, whether these represent noise or biologically relevant interactions remains to be determined. However, we note that no such enrichments were observed with the Sgg-PB proteoform and that reduction in Sgg-PA proteoform levels does not lead to upregulation of the stress response in our RNA-seq analysis, suggesting there is no general translational disruption in the -PA and CPTI lines.
Common to the 3 different tagged lines, we detected significant enrichment of Bicaudal and CG10591 proteins (red in Table 1). Along with its characterised maternal role in early segmentation, Bicaudal is widely expressed during embryogenesis where it has a role in translation via binding nascent peptides. CG10591 is a protein of unknown function. Encouragingly, with Sgg-PB and the CPTI lines, but not with Sgg-PA, we identified Armadillo (β-Catenin), a known Sgg substrate, as an interacting partner. With the CPTI line we also identified Shotgun, an E-Cadherin known to bind β-Catenin, and α-Catenin, which also interacts with Arm. Finally, the G protein αo subunit, known to be involved in Wnt signalling [48], was identified with Sgg-PB. Thus for two of the lines we find evidence for predicted interactions. Specific to the Sgg-PB line we identified the gap junction protein Innexin 2 [49], known to localise with Sgg and E-Cadherin, along with the cell adhesion molecule Fasciclin 1 and Neyo, a component of the zona pellucida complex [50]. We also detected an interaction with Regulatory particle non-ATPase 7, which is involved in the ATP-dependent degradation of ubiquitinated proteins, including the NF-kappa-B inhibitor CACT and Ci, which participates in the Hedgehog (Hh) signalling pathway. Together these interactions are consistent with the enrichment of the Sgg-PB isoform at the cell membrane we described above and also with known roles for Sgg in regulating aspects of cell junctions. In the case of Sgg-PA, we identified the Talin protein Rhea, the actin binding profilin Chickadee and Gamma Tubulin 23C, cytoskeletal components known to be expressed in the nervous system.
Among other interactors of Sgg-PA associated with nervous system localisation we detected Lark [51], which mediates aspects of circadian clock output [52], and two subunits of the Chaperonin TCP complex (2 and 4) that are known to have roles in the nervous system. Together, these identified interacting proteins are consistent with the localisation of Sgg-PA to the CNS. Interestingly, the RNA-seq analysis of sgg isoA mutants identified several misregulated components of the mitochondrial ATP synthesis pathway and our iPAC analysis identified the gamma subunit of ATP synthase along with the mitochondrial regulator TFAM, indicating a potential link with neural energy homeostasis. Taken together, the iPAC analysis identified several known and new Sgg interacting proteins, found little overlap between proteins enriched with the proteoform-specific pulldowns and indicates that Sgg is involved in diverse, tissue-specific processes in the embryo.

Lifespan and locomotor dysfunction in sgg isoA mutants
While GSK-3 is a recognized target for the treatment of age related pathologies and multiple diseases, its role in the aging process remains unclear [53]. According to some studies, RNAi knockdown of sgg in Drosophila shortens lifespan or causes lethality [54]. However, these results are contradictory to expectations from earlier studies suggesting that lithium treatment extends lifespan via GSK-3 inhibition, determined using specific RNAi to mediate reduction in sgg expression [55]. In knock-out mice, loss of GSK-3β is embryonic lethal, whereas GSK-3α null mice exhibit shortened lifespan and increased age-related pathologies [56]. Given the apparently contradictory evidence on the role of sgg in lifespan, we investigated whether loss of the Sgg-PA proteoform, which is viable and fertile, positively or negatively influences lifespan in Drosophila.
We performed a standard survival analysis of sgg isoA null males and females separately, along with matched flies from the progenitor strain. We found that homozygous sgg isoA females showed significantly reduced survival (~15%, p<0.05) compared to controls and an even more severe reduction (~25%, p<0.05) in hemizygous males (Figure 6A). These results indicate that in flies as in mice, deletion of the Sgg-PA proteoform has a negative effect on longevity.
Since Sgg-PA is extensively expressed in the developing nervous system and GSK-3α mutant mice have nervous system phenotypes, we sought to determine whether the loss of this proteoform has effects on neural function in flies by looking at impairment in locomotor activity via climbing assays (Figure 6B). In homozygous females the loss of Sgg-PA resulted in a 40% decrease in locomotor activity across the lifespan, with a similar reduction observed in hemizygous males. The maximum climbing activity was observed in 10-day-old flies and was just over 75% for the control line and approximately 45% for the sgg isoA null. The climbing activity gradually decreased over time and in 40-day-old flies was reduced to 55% for the control line and 27% for sgg isoA null flies (Figure 6B). These observations indicate that loss of the predominantly nervous system expressed Sgg-PA proteoform impairs motor function.
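As a worked check of the arithmetic behind the climbing figures, the relative reduction in activity can be computed from the approximate percentages quoted in the text. The sketch below is illustrative only: the function name is hypothetical, and the inputs are the rounded values stated above rather than the underlying measurements.

```python
def relative_reduction(control, mutant):
    """Fractional decrease of the mutant value relative to the control value."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (control - mutant) / control


# Approximate climbing activity (%) quoted in the text for control and sgg isoA null flies.
day_10 = relative_reduction(75, 45)
day_40 = relative_reduction(55, 27)

print(f"10-day-old flies: {day_10:.0%} lower than control")
print(f"40-day-old flies: {day_40:.0%} lower than control")
```

With these rounded inputs the reduction at 10 days is 40%, and the reduction at 40 days comes out somewhat larger, at roughly half of the control value.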
Taken together, our results indicate that although conditional modulation of GSK-3 levels may prolong lifespan or mitigate the negative age-associated symptoms observed with diseases such as Alzheimer's or diabetes [53], the isoform specific knockout of a nervous system-specific proteoform results in reduced lifespan and locomotor defects. Given the positive impact of GSK-3 inhibition on multiple diseases ranging from neurological disorders to cancer, as well as widespread therapeutic interventions targeting GSK-3, further studies are required to assess long-term effects on the aging process and the risks associated with nervous system impairment.
Taken together, our studies indicate that GSK-3 performs complex functions mediated by multiple different spliced isoforms that generate functionally distinct proteins. At the level of proteoform expression we provide evidence of complex temporal and tissue specific protein localisation that presumably results from highly regulated tissue-specific splicing, as well as evidence for the use of non-canonical translational initiation. Our most striking finding is the major functional differences observed between the two most abundant 3' coding exons, with the short form mediating the well-established essential roles for Sgg in development, whereas the long form has nervous system-specific expression and measurable roles in nervous system function. This situation is reminiscent of the divergent roles apparently played by GSK-3α and GSK-3β in mammalian systems. The identification of a different repertoire of interacting proteins for the major Drosophila GSK-3 proteoforms may provide clues as to the differing roles played by the vertebrate orthologues and opens a route to understanding how this critical kinase can be deployed in different biological contexts.

Materials and Methods
Cloning gRNAs. To generate the transgenic fly lines carrying the tagged isoforms, we used CRISPR/Cas9 technology as previously described [37]. We initially designed the insertion sites as indicated (Figure 1) and chose appropriate gRNAs (Table S3) that were cloned into pCDF3 or pCDF4 [57]. Briefly, target specific sequences were synthesized and either 5′-phosphorylated, annealed, and ligated into the BbsI sites of pCDF3 or amplified by PCR for cloning into pCDF4 precut with BbsI.
Generation of donor vectors. Unless otherwise noted, cloning of donor vectors was performed with the Gibson Assembly Master Mix (New England Biolabs). PCR products were produced with the Q5 High-Fidelity 2X Master Mix (New England Biolabs). All inserts were verified by sequencing. Schematics of plasmid construction are provided in Figure S2. Primers used for plasmid construction are listed in Table S4.
Drosophila methods. Embryos were injected using standard procedures into the THattP40 line expressing nos-Cas9 [58,59]. 500 ng/μL of donor DNA in sterile dH2O was injected together with 100 ng/μL of gRNA plasmid. Individually selected surviving adults were crossed to w 1118 and the progeny screened for fluorescence: positive flies were balanced and homozygous stocks established where possible (Table S5). Injections were performed by the Department of Genetics Fly Facility (https://www.flyfacility.gen.cam.ac.uk). All fly stocks were maintained at 25°C on standard cornmeal medium. Embryos were collected from small cages on yeasted grape juice agar plates.
Immunostaining. Localization of tagged proteoforms in embryos was visualized by immunohistochemistry using mouse anti-FLAG M2 (F1804, Sigma), followed by biotinylated goat anti-mouse IgG (BA-9200, Vector Laboratories) and the Vectastain ABC HRP Kit (PK-4000, Vector Laboratories) using standard protocols [60]. Embryos were mounted in glycerol and imaged using a Zeiss Axiophot.
Confocal microscopy: For fluorescence imaging, embryos were collected, dechorionated and quickly fixed to avoid bleaching, then mounted in glycerol. For live imaging, embryos were dechorionated and mounted in halocarbon oil. Images were acquired using a Leica SP8 confocal microscope (Leica Microsystems) with appropriate spectral windows for mVenus and mCherry. Images were processed with the Fiji software [61].
RNA seq: 10-15 hr embryos from a homozygous sgg isoA stock and a control line were collected and processed for RNA-seq as described below. In the case of sgg isoB , non-fluorescent embryos from a cross between sgg isoB /FM7-GFP X FM7-GFP/Y were collected. In parallel, non-fluorescent embryos from a +/FM7-GFP X FM7-GFP/Y cross were also collected (FM7-GFP: FM7c, P{GAL4-twi.G}108.4, P{UAS-2xEGFP}AX  20.gtf and options GTF.featureType="exon" and GTF.attrType="gene_id" and default parameters. Read counts per experiment (sgg isoA and sgg isoB experiments were processed independently) were imported into edgeR (3.14.0) and filtered using the filterByExpr function with the default parameters (10397 genes were retained for sgg isoA and 10693 genes for sgg isoB ). The data were then normalised in limma (3.28.21) using limma-voom. Significant genes were identified by fitting a linear model (lmFit) followed by the empirical Bayes method (eBayes) [62-65]. Genes were considered significantly differentially expressed with FDR <= 0.05 and |logFC| >= 0.7. RNA-seq data are available from GEO (GSE139040).
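The significance criteria stated above can be written out as a short Python sketch. This is not the analysis code used in the study, which was run in R with edgeR and limma; the function names are hypothetical, and the sketch assumes that logFC is a log2 fold change, as limma reports it. Under that assumption, an absolute logFC of 0.7 corresponds to roughly a 1.6-fold change in expression.

```python
def is_significant(log_fc, fdr, min_abs_log_fc=0.7, max_fdr=0.05):
    """Apply the stated thresholds: FDR <= 0.05 and |logFC| >= 0.7."""
    return fdr <= max_fdr and abs(log_fc) >= min_abs_log_fc


def fold_change(log_fc):
    """Convert a log2 fold change into a linear fold change (magnitude only)."""
    return 2.0 ** abs(log_fc)


# Hypothetical values, for illustration only.
print(is_significant(log_fc=-0.9, fdr=0.01))   # passes both thresholds
print(is_significant(log_fc=0.5, fdr=0.001))   # fails the fold-change threshold
print(round(fold_change(0.7), 2))              # linear fold change at the cutoff
```

Taking the absolute value of logFC means that up- and down-regulated genes are treated symmetrically by the same cutoff.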
iPAC-MS: Embryos from 8-20 hr collections were washed from agar plates with tap water, collected in 100 μm sieves, rinsed in the same solution to remove any yeast, dechorionated in 50% bleach for 1 min, rinsed again and placed on ice. Where necessary, washed embryos were frozen at −80°C until a sufficient quantity was collected. For each purification, ~200 μL wet-volume of embryos was manually homogenized with a 2 mL Dounce homogenizer in 1 mL of extraction buffer (50 mM Tris, pH 7.5, 125 mM NaCl, 1.5 mM MgCl2, 1 mM EDTA, 5% glycerol, 0.4% Igepal CA-630, 0.5% digitonin and 0.1% Tween 20) and processed essentially as previously described [45]. Samples were independently immunopurified via the StrepII, FLAG and YFP tags. ANTI-FLAG® M2 affinity gel (Sigma) and Strep-Tactin® Superflow® resin (IBA) were used to capture each FLAG-tagged and StrepII-tagged bait and its binding partners, respectively [45]. For pulldown of fluorescently tagged proteins (YFP), anti-GFP mAb agarose resin (MBL International) was used. Briefly, the protein concentration of the embryo lysate was estimated using a DC assay (Bio-Rad). The lysate was divided equally into three parts (6 mg total protein per pulldown), to each of which one resin, pre-washed in extraction buffer, was added. Following 2 h of incubation at 4°C on a rotating wheel, the resin was washed three times in extraction buffer. Immunoprecipitates were eluted twice each, using 100 μg/mL 3xFLAG peptide (Sigma) in lysis buffer for FLAG immunoprecipitates and 10 mM desthiobiotin in lysis buffer for Strep-Tactin immunoprecipitates, each for 10 minutes at 4°C. Anti-GFP resin immunoprecipitates were eluted in 100 mM glycine-HCl, pH 2.5, with gentle agitation for 30 seconds, followed by immediate neutralization in 1 M Tris-HCl, pH 10.4.
Purification of the baits was confirmed via immunoblots (data not shown) and samples were prepared for mass spectrometric analysis using in-gel digestion, allowing the sample to enter 2 cm into an SDS-PAGE gel. Gels were fixed and stained with colloidal Coomassie stain, after which the protein-containing band was excised and cut into two equally sized parts. Each part was destained, reduced with dithiothreitol, alkylated with iodoacetamide and subjected to tryptic digestion for 16 hours at 37°C. Approximately 1 μg of peptides from each digested part was analysed using LC-MS/MS on a Q Exactive mass spectrometer (ThermoFisher Scientific), as previously described [66].
For label-free quantification (LFQ), data were processed using MaxQuant. Within MaxQuant, searches were performed against a reversed decoy dataset and a set of known contaminants. Default search parameters were used, with trypsin digestion allowing two missed cleavages and a minimum peptide length of six. Carbamidomethyl cysteine was specified as a fixed modification. Oxidation of methionine, N-terminal protein acetylation and phosphorylation of serine, threonine and tyrosine were specified as variable modifications. Additionally, "match between runs" was enabled, with fractions specified so that matching occurred only between replicates of the same bait and tag combination.
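To make the digestion settings concrete, the following minimal Python sketch generates the peptides that an in silico tryptic digest would consider when up to two missed cleavages are allowed and peptides shorter than six residues are discarded. It is an illustration only and not the search engine's implementation. It assumes cleavage after every lysine (K) or arginine (R) and ignores the proline rule, since the source does not state which trypsin variant was selected. The function names and the toy sequence are hypothetical.

```python
def cleave(sequence):
    """Split a protein sequence after every K or R (fully cleaved fragments)."""
    fragments, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            fragments.append(current)
            current = ""
    if current:  # C-terminal fragment that does not end in K or R
        fragments.append(current)
    return fragments


def tryptic_peptides(sequence, missed_cleavages=2, min_length=6):
    """List peptides with up to `missed_cleavages` skipped cleavage sites."""
    fragments = cleave(sequence)
    peptides = []
    for start in range(len(fragments)):
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(fragments):
                break
            # Joining adjacent fragments models skipped cleavage sites.
            peptide = "".join(fragments[start:end])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Toy sequence (not a real protein) with two internal cleavage sites.
toy_sequence = "AAAAAAKCCRDDDDDDDR"
for peptide in tryptic_peptides(toy_sequence):
    print(peptide)
```

In this toy case the fully cleaved fragment CCR is dropped by the length filter, but it still contributes to longer peptides that span a missed cleavage.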
Identification of interacting partners was performed with QProt [68]. Input for QProt was created from the MaxQuant "proteinGroups.txt" output file. Individually for each bait and tag combination, proteins that were not reverse decoys or potential contaminants were extracted where an LFQ value was reported for at least one tagged replicate. The LFQ values for proteins detected in the tagged bait pulldowns were matched with those from the wild type pulldown using the corresponding tag. Enrichment analysis was then performed for each bait and tag combination against the corresponding wild type using qprot-param, with burn-in set to 10,000 and the number of iterations set to 100,000. The QProt tool getfdr was used to calculate the FDR of enrichment. Any protein enriched in pulldowns using at least two of the three tags with an FDR of less than 0.05 was classed as enriched. For each enriched protein, the highest enrichment FDR among the replicates in which it was significant is reported.
Lifespan determination: Adult female and male flies were collected shortly after eclosion and separated into 5 cohorts of 100 flies (500 total) for each genotype. Flies were maintained at 25°C and transferred to fresh food every 2 days, at which time the number of surviving flies was recorded.
Locomotor behaviour: Adult female and male flies were collected shortly after eclosion and separated into 10 cohorts of 10 flies (100 total) for each genotype. Flies were maintained at 25°C and transferred to fresh food every 3 days. For the climbing assay, each cohort was transferred to an empty glass cylinder (diameter, 2.5 cm; height, 20 cm) and allowed to acclimatize for 5 min. For each trial, flies were tapped down to the bottom of the cylinder, and the percentage of flies able to cross an 8-cm mark within 10 s was recorded as the climbing index. Five trials were performed for each cohort, with a 1-min recovery period between trials. Climbing assays were performed 1, 5, 10, 30 and 40 days after eclosion.
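The climbing index described above can be written out as a short Python sketch. The per-trial index follows the definition in the text (percentage of flies crossing the 8-cm mark within 10 s). Averaging the five trials into a single value per cohort is an assumption made here for illustration, because the text does not state how trials were combined. The function names and the counts are hypothetical.

```python
def climbing_index(n_crossed, n_flies):
    """Percentage of flies in a cohort that crossed the mark in one trial."""
    if n_flies <= 0:
        raise ValueError("cohort must contain at least one fly")
    if not 0 <= n_crossed <= n_flies:
        raise ValueError("n_crossed must lie between 0 and n_flies")
    return 100.0 * n_crossed / n_flies


def cohort_mean_index(trial_counts, n_flies):
    """Mean climbing index over repeated trials (assumed summary statistic)."""
    indices = [climbing_index(count, n_flies) for count in trial_counts]
    return sum(indices) / len(indices)


# Made-up counts for one cohort of 10 flies across five trials.
example_trials = [8, 7, 9, 8, 8]
print(cohort_mean_index(example_trials, n_flies=10))
```

Keeping the per-trial calculation separate from the cohort summary makes it straightforward to swap the mean for another statistic if a different convention is preferred.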