PolyQ-independent toxicity associated with novel translational products from CAG repeat expansions

Expanded CAG nucleotide repeats are the underlying genetic cause of at least 14 incurable diseases, including Huntington’s disease (HD). The toxicity associated with many CAG repeat expansions is thought to be due to the translation of the CAG repeat to create a polyQ protein, which forms toxic oligomers and aggregates. However, recent studies show that HD CAG repeats undergo a non-canonical form of translation called Repeat-associated non-AUG dependent (RAN) translation. RAN translation of the CAG sense and CUG anti-sense RNAs produces six distinct repeat peptides: polyalanine (polyAla, from both CAG and CUG repeats), polyserine (polySer), polyleucine (polyLeu), polycysteine (polyCys), and polyglutamine (polyGln). The toxic potential of individual CAG-derived RAN polypeptides is not well understood. We developed pure C. elegans protein models for each CAG RAN polypeptide using codon-varied expression constructs that preserve RAN protein sequence but eliminate repetitive CAG/CUG RNA. While all RAN polypeptides formed aggregates, only polyLeu was consistently toxic across multiple cell types. In GABAergic neurons, which exhibit significant neurodegeneration in HD patients, codon-varied (Leu)38, but not (Gln)38, caused substantial neurodegeneration and motility defects. Our studies provide the first in vivo evaluation of CAG-derived RAN polypeptides in a multicellular model organism and suggest that polyQ-independent mechanisms, such as RAN-translated polyLeu peptides, may have a significant pathological role in CAG repeat expansion disorders.


Introduction
DNA repeat expansions are the genetic cause of >30 different diseases, most of which affect the nervous system and are currently incurable [1]. Of these, CAG/CTG repeat expansions are associated with at least 14 neurodegenerative diseases. CAG/CTG repeat expansions can occur in either protein-coding exons or untranslated regions. Diseases caused by exonic CAG repeat expansions are often referred to as 'polyQ' diseases, because the expanded CAG repeat is inframe with the encoded protein and leads to the translation of a series of glutamine (Q) a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 residues. Expanded polyQ proteins form large protein aggregates that sequester many other cellular proteins and undergo significant post-translational modifications, such as ubiquitination [2][3][4][5]. While these polyQ aggregates were initially thought to be toxic, other studies suggest that large aggregates may be protective and that smaller oligomers containing polyQ proteins are the toxic entity [6], leading to mitochondrial disruption, alterations in the protein folding landscape, and impaired autophagy [7][8][9]. Despite the dogma that expanded polyQ proteins are the molecular cause of toxicity in exonic CAG repeat expansions, significant data suggests that polyQ oligomers or aggregates may not always be associated with toxicity. For example, in adult Huntington's disease (HD) post-mortem tissues, some white matter regions of the caudate and putamen exhibit significant neurodegeneration but lack detectable polyQ aggregates [10]. Similarly, in juvenile HD (a more aggressive form of HD associated with higher numbers of CAG repeats), the cerebellum undergoes significant neurodegeneration, but lacks polyQ aggregates in post-mortem tissues [10]. In a recent study, patient genetic data shows that uninterrupted CAG repeat length determines the age of disease onset in HD through a mechanism separate from its glutamine-coding property [11,12]. These observations suggest that polyQ-independent toxicity mechanisms may play a role in CAG repeat expansion disorders like HD.
One possible polyQ-independent mechanism involves a recently discovered type of protein translation, repeat-associated non-AUG dependent (RAN) translation [13]. RAN translation uses as a substrate the long, G/C-rich RNA repeats that are typically found in repeat expansion diseases. Since RAN translation does not require an AUG start codon, these G/C-rich RNAs are translated in each reading frame, through a mechanism that is still under investigation [14][15][16]. Additionally, repeat-expansion DNA commonly produces an antisense RNA, which is also G/C rich and thus can also undergo RAN translation [17,18]. RAN translation occurs in both exonic CAG expansion diseases (HD [10] and spinocerebellar ataxia type 8 (SCA8) [19]) and an untranslated CTG expansion disease (the 3'UTR CTG expansion disease, myotonic dystrophy type 1 (DM1) [17]). In both adult and juvenile HD patients, HD RAN products are present in brain regions lacking polyQ that are still undergoing apoptosis [10]. Thus, RAN-translated peptides could be a polyQ-independent mechanism that contributes to the pathology of many or all CAG/CTG repeat expansion diseases.
Identifying which of the CAG RAN polypeptides are pathogenic and their mechanism(s) of toxicity is necessary to understand the toxic potential of CAG/CTG-derived RAN polypeptides. Canonical translation of CAG RNA repeats produces polyglutamine (polyGln). However, the same RNA also undergoes RAN translation to produce polyserine (polySer), and polyalanine (polyAla) [10]. RAN translation of antisense CUG RNA repeats produces polyleucine (poly-Leu), polycysteine (polyCys), and polyalanine (polyAla) [10]. All four RAN polypeptides and polyGln are weakly toxic at 90 repeats and at least two peptides (polySer and polyGln) form protein aggregates when overexpressed in cultured cells [10]. Whether or not more disease-relevant lengths of these peptides are also toxic and form protein aggregates is not known.
To begin addressing these questions, we created codon-varied GFP-tagged CAG-derived RAN homopolymers at both disease-relevant lengths (38 repeats) and highly expanded lengths (90 repeats) using previously described approaches [20]. We expressed these peptides in multiple cellular settings in C. elegans. We found that each RAN peptide formed protein aggregates. However, only polyLeu was toxic in all cellular settings. PolyLeu displayed length-associated toxicity and caused significant neurotoxicity in vivo. Notably, neurotoxicity was not observed with codon-varied polyGln, suggesting that these two polypeptides act through distinct pathways. Identifying the cellular processes involved in polyLeu toxicity may provide new biomarkers and therapeutic targets for HD and other CAG repeat expansion disorders.

Development of a C. elegans CAG-derived RAN model
To evaluate the toxicity of individual CAG-encoded RAN polypeptides, we created C. elegans models for each of the possible CAG RAN products using a previously described codon-variation strategy. This approach maintains the amino acid repeat of the RAN polypeptide but removes the CAG nucleotide repeat [20]. The codon-varied constructs were also designed to minimize computationally predicted hairpin structures in the resulting RNA, as hairpin structures are thought to be required for RAN translation and could lead to the production of nonrelevant RAN products [21] (S1 Fig). The codon-varied RAN polypeptides were expressed through canonical AUG-initiated translation and studied at either 38 repeats or 90 repeats with a C-terminal GFP tag (Fig 1A). The 38-repeat peptides mimic the minimum CAG repeat threshold for developing HD. The 90-repeat peptides mimic the length investigated in a previous study [22]. The modeled RAN polypeptides lacked genetic context (i.e. no flanking human HTT sequence). We chose not to include such genetic context because the unique reading frames of each RAN product means that no two RAN polypeptides share the same flanking sequence. Including the genetic contexts of the RAN polypeptides would limit our ability to assign phenotypes to specific homopolymeric peptides rather than flanking sequences. In addition to the individual RAN polypeptides, we also created an AUG-initiated pure CAG repeat in the polyglutamine reading frame, which mimics previous CAG/polyQ models in C. elegans [23][24][25][26][27]. This pure CAG repeat could exhibit RNA toxicity, RAN polypeptide toxicity, or a combination of both. For clarity, we will refer to the CAG-encoded polyglutamine as "poly-CAG" and the codon-varied polyglutamine as "polyGln". We expressed each construct in both GABAergic neurons and muscle cells using cell-type-specific promoters.

GABAergic neuron model of CAG-derived RAN polypeptides
Neurodegeneration in HD initiates in the striatum and most strongly degrades the GABAergic neuronal population [28]. HD RAN polypeptides are found in degenerating regions of the striatum which lack polyQ, suggesting they contribute to neurodegeneration [22]. We expressed RAN polypeptides in GABAergic motor neurons using the unc-47 promoter [29] and quantified GABAergic neuron commissures, as they are commonly used to measure neurodegeneration because they provide single axon resolution [30,31]. As a measurement of GABAergic neuron function, we evaluated directional reversals since loss of GABAergic neurons significantly decreases the ability of C. elegans to reverse direction [29]. Quantifying the number, morphology, and function of GABAergic neurons is a common measure of the neurotoxic potential of proteins expressed in C. elegans [31][32][33][34].
To test if the functional defects were also associated with cellular damage, we visualized neuronal commissures using a GABAergic-neuron RFP marker [31,32,35,36]. Consistent with our functional data, animals expressing (Gln) 38 -GFP exhibited no defects in commissure structure (Fig 1C and 1D). Animals expressing (Leu) 38 -GFP exhibited highly penetrant commissure defects, although the neuron somas were intact. Where they were visible, the GABAergic neurons in (Leu) 38 -GFP worms traversed the length of the animals instead of the width, suggesting defects in neuron stability and/or axon guidance (Fig 1C and 1D). (Q) 38 -GFP worms also had abnormal commissures, with roughly half of the commissures in each worm being abnormal or incomplete. Although (Cys) 37 -GFP did not cause a significant functional defect (Fig 1B), 19% of the commissures in (Cys) 37 -GFP worms were abnormal (Fig 1C and  1D). (Ser) 38 -GFP did not cause a significant defect in commissure morphology ( Fig 1C). These data show that the expression of (Leu) 38 -GFP, but not other RAN polypeptides, is sufficient to cause both structural and functional defects in GABAergic neurons. They also suggest that factors other than polyglutamine contribute to the toxicity of (Q) 38 -GFP, as a codon-varied (Gln) 38 -GFP did not exhibit functional or morphological toxicity.

Muscle expression of CAG-derived RAN polypeptides
C. elegans GABAergic neurons are anatomically small (cell bodies 1-2 micron) and are inaccessible to some genetic approaches, such as RNAi-mediated gene knockdown [37]. To gain more insights into RAN polypeptide cell biology, as well as to generate a platform for future RNAi and forward mutagenesis-based genetic suppressor screening, we expressed the CAGderived RAN polypeptides in muscle cells using the myo-3 promoter. In many models of neurotoxic proteins, expression in muscle cells causes larval arrest and/or motility defects [32,38]. We found that muscle expression of (Leu) 38 -GFP caused a highly penetrant larval arrest phenotype, whereas expression of (Q) 38 -GFP, (Gln) 38 -GFP, (Cys) 37 -GFP, and (Ala) 38 -GFP caused a weakly penetrant larval arrest phenotype (Fig 2A). (Ser) 38 -GFP did not induce larval arrest. We also examined the effect of each RAN polypeptide on age-dependent post-developmental muscle function, using a previously described conditional expression approach [20]. (Q) 38 -GFP caused a significant increase in paralysis of animals as they aged. However, (Gln) 38 -GFP, (Leu) 38 -GFP, and (Ala) 38 -GFP caused no significant enhancement in paralysis ( Fig 2B). To more quantitatively measure changes in motility, we performed thrashing assays. While only (Q) 38 -GFP caused age-dependent paralysis, (Q) 38 -GFP, (Cys) 37 -GFP, (Leu) 38 -GFP, and (Ala) 38 -GFP all caused a decrease in thrashing rates ( Fig 2C). (Ala) 38 -GFP motility defects were highly penetrant, whereas (Q) 38 -GFP, (Cys) 37 -GFP, and (Leu) 38 -GFP defects were more variable. Similar to motor neuron expression, muscle expression of (Q) 38 -GFP, but not of codon-varied (Gln) 38 -GFP, caused a thrashing defect ( Fig 2C). To test if repeat length affected toxicity, we increased the number of repeats from 38 to 90. Although we were unable to synthesize 90 repeats of polyCAG or codon-varied polyAla for unknown technical reasons, (Gln) 90 -GFP and (Leu) 90 -GFP caused significant thrashing defects ( Fig 2D) while (Ser) 90 -GFP and (Cys) 90 -GFP did not. None of the HD polypeptides caused a shortening of lifespan when expressed at 38 repeats ( Fig 2E). Together, these data in muscle suggest that factors other than polyglutamine contribute to the toxicity of polyCAG.

Localization patterns of CAG-derived RAN polypeptides
To gain insights into the cell biological properties of the CAG-derived RAN polypeptides, we took advantage of the transparent nature of C. elegans and performed live animal fluorescent imaging. Previous studies found that polyQ and polySer form puncta at disease-relevant repeat lengths [22,38]. However, the localization properties of the other codon-varied RAN polypeptides have not been reported. Both (CAG) 38 -GFP and (Gln) 38 -GFP had both diffuse signal and puncta, consistent with previous polyQ models ( Fig 3A). (Ser) 38 -GFP formed puncta with no detectable diffuse signal (Fig 3A), consistent with previous polySer reports [22]. (Ala) 38 -GFP and (Cys) 37 -GFP also formed puncta. (Cys) 90 -RFP and (Ser) 90 -GFP exhibited strong co-localization when co-expressed, although they did not co-localize with a previous polyQ protein model [24], suggesting that these peptides exist in a structural state that does not allow interactions with polyQ ( Fig 3B). Unlike the other RAN polypeptides, the GFP signal for (Leu) 38 -GFP was only detectable in vulval muscle cells and was not detectable in body wall muscle cells (Fig

PLOS ONE
Toxicity of CAG-derived RAN peptides 3A). The low polyLeu signal could be due to localization within a cellular compartment that impairs GFP fluorescence. GFP has impaired folding in oxidizing environments such as the lumen of the endoplasmic reticulum (ER) [39]. Additionally, leucine repeats commonly insert into membranes [40,41]. Therefore, one possibility is that (Leu) 38 -GFP is in a membrane and the GFP tag is oriented within an oxidizing environment. To test this possibility, we replaced GFP with superfolder GFP (sfGFP), which has several point mutations that enhance folding and fluorescence in non-optimal environments, such as the ER [42,43]. Unlike (Leu) 38 -GFP, (Leu) 38 -sfGFP was observed in both adult muscle cells and vulval muscle cells and localized to the periphery of large spherical bodies of unknown origin (Fig 3C). The ability to visualize (Leu) 38 -sfGFP but not (Leu) 38 -GFP is consistent with the possibility that (Leu) 38 may be membrane-bound and localized to an environment that impairs folding of GFP.

All CAG-derived RAN polypeptides form aggregates
PolyGln and polySer are known to form bona fide protein aggregates [22,44]. However, the structure of the puncta containing the other CAG-derived RAN products is unknown. We used Fluorescence Recovery After Photobleaching (FRAP), to better characterize the CAGderived RAN puncta [45]. We found that polyGln, polySer, polyCys, and polyAla puncta each exhibited limited FRAP recovery similar to previously characterized aggregated polyQ (Fig 4). PolyLeu FRAP recovery was more rapid and extensive, suggesting that polyLeu structures have more freely diffusible molecules within the puncta and/or more exchange of molecules with the surrounding cytosol. The total recovery of (Leu) 38 was~20%, compared to <10% recovery for the other RAN polypeptides, suggesting that polyLeu may be toxic by affecting different cellular pathways than polyglutamine or other CAG-derived RAN polypeptides.

PolyLeu toxicity is length dependent
Length-dependent toxicity is a classic characteristic of CAG repeats. PolyGln is well established to exhibit length-dependent toxicity, which we confirmed using our codon-varied constructs (Fig 2C and 2D). To determine if polyLeu also exhibits length-dependent toxicity, we expressed polyLeu at 11, 20, 29, and 38 repeats in GABAergic neurons and muscle cells. Poly-Leu repeats of 29 or below caused no functional defects in GABAergic neurons, as measured by directional reversals (Fig 5A). However, (Leu) 38 -GFP caused a significant decrease in reversal ability ( Fig 5A). Therefore, polyLeu exhibits length-dependent toxicity in GABAergic neurons and requires >29 repeats to cause toxicity. The change in toxicity could be due to altered localization patterns of polyLeu depending on the length of leucine repeats. To test this hypothesis, we expressed the various lengths of polyLeu in muscle cells and visualized their localization. (Leu) 11 -GFP localized to reticular structures that filled the muscle cell, as well as small spherical structures~1 μm in diameter (Fig 5D). (Leu) 20 -GFP localized to the periphery of the muscle cell, potentially the plasma membrane, as well as various structures inside the cell that failed to exhibit a consistent morphology (Fig 5D). Cellular localization of (Leu) 29 -GFP and (Leu) 38 -GFP was challenging to determine since both lengths of polyLeu appeared to inhibit the fluorescence of the GFP attached to the repeats (Fig 5B and 5D). The length-dependent decrease in fluorescence was specific to the polyLeu-bound GFP, as free RFP expressed in the same tissues did not exhibit a length-dependent decrease in fluorescence (Fig 5C). (Leu) 11 -GFP, (Leu) 20 -GFP, and (Leu) 29 -GFP did not exhibit the larval arrest observed in both (Leu) 38 -GFP and (Leu) 38 -sfGFP. Together, these data suggest that the strong toxicity of (Leu) 38 is length dependent, since more than 29 leucine repeats are required to cause larval arrest when expressed in muscle, or reversal defects when expressed in motor neurons. The length-dependent and tissue-independent toxicity of polyLeu suggests that polyLeu could contribute to CAG repeat toxicity, either alone or in combination with polyGln.

Discussion
In this study, we investigated the cell biological and pathophysiological properties of the newly discovered CAG-derived RAN polypeptides. We found that all of the CAG-derived RAN polypeptides formed biophysically defined protein aggregates. However, only a single CAGderived RAN polypeptide, polyLeu, caused both neuropathological changes and functional defects. Given that these studies represent the first in vivo dissection of CAG-derived RAN polypeptide properties, we discuss how our findings integrate with previous reports of the structure and function of each homopolymeric peptide.

CAG repeats exhibit polyglutamine-independent toxicity
The pure CAG-encoded polyglutamine (polyCAG) and the codon-varied polyglutamine (polyGln) exhibit different levels of toxicity in our model. GABAergic expression of (Q) 38 caused significant structural and functional neuronal defects, but GABAergic expression of codon-varied (Gln) 38 caused no detectable defects. Similarly, muscle expression of (Q) 38 caused extensive motility defects, while muscle expression of codon-varied (Gln) 38 caused no motility defects. PolyGln did display length-dependent toxicity, as muscle expression of (Gln) 90 produced motility defects. However, shorter, disease-relevant lengths of polyGln did not cause toxicity, while shorter, disease-relevant lengths of polyCAG did cause toxicity.

Toxicity of CAG-derived RAN peptides
The disparity in the toxicity of these two 'polyglutamine' models is antithetical to the widely-held belief that polyglutamine is the driving cause of the toxicity in HD. Our data suggest that factors other than the translation of CAG repeats into a polyglutamine protein can underlie CAG repeat toxicity. While at odds with dogma, these findings are consistent with recent human genetic studies suggesting that CAG repeat length, but not polyglutamine repeat length, correlates with the age of disease onset in HD [11,12]. Such CAG-dependent, polyglutamine-independent factors could include RNA repeat toxicity [46] or RAN polypeptide toxicity [22]. CAG repeat models in many organisms, including C. elegans, are synonymously referred to as 'polyQ' or 'polyglutamine' models. However, at least some of these models undergo RAN translation and produce RAN polypeptides in addition to polyglutamine [10]. In the future, it will be important to re-examine whether or not existing C. elegans CAG models also undergo RAN translation and produce RAN polypeptides in addition to polyGln. The presence of such polypeptides may dramatically alter the use and interpretation of studies based on these models. Given the widespread prevalence of RAN translation in several CAG repeat expansion disorders, referring to these models solely as 'polyQ models' may no longer be accurate and our descriptions should be modified to include our evolving understanding of the role of RAN polypeptides in addition to polyQ.

PolyCys and PolySer: Weakly toxic RAN products
Both polyCys and polySer exhibited similar cell biological and phenotypic effects when expressed in C. elegans. PolyCys and polySer formed aggregates at 38 and 90 repeats. They also co-localized when expressed together. The co-localization of polySer and polyCys is not solely due to their polar charge. Glutamine is also a polar amino acid, but polyGln did not colocalize with either polySer or polyCys (Fig 3B). A possible explanation of the distinct aggregation patterns of polyGln, polySer, and polyCys could be that polySer and polyCys may localize to a different subcellular environment than polyGln. For example, both Cys and Ser [47], but not Gln, are known targets for post-translational protein palmitoylation, which anchors proteins to cellular membranes [48]. The palmitoylation of Cys residues in another protein, cysteine string protein α, is essential for protein aggregation and is associated with the neurodegenerative disease adult-onset neuronal ceroid lipofuscinosis [49]. Therefore, post-translational modification of polySer and polyCys, but not polyGln, may lead to protein interactions that drive protein aggregation in ways that are distinct from polyGln. Future studies examining the posttranslational modification state of the polySer and polyCys protein could provide insights into these and other potential aggregation mechanisms.
Another common property of polyCys and polySer is that expression of either polypeptide caused weak and inconsistently toxic effects. GABAergic expression of (Cys) 37 caused minor structural but not functional defects, whereas GABAergic expression of (Ser) 38 caused functional defects, but not structural defects. Muscle expression of (Cys) 37 caused weak motility defects and larval arrest. However, muscle expression of polySer never caused motility defects or larval arrest. When expressed on their own, polyCys and polySer exhibit little to no phenotypic consequences in C. elegans. This was surprising considering a recent study which suggested that polySer has significant toxicity in a SCA8 murine model, another CAG repeat expansion disease [50]. However, this model did not express a pure polySer protein. Rather, it expressed the CAG expanded ATXN8 gene which is the genetic cause of SCA8. PolySer is one of several RAN products translated from the SCA8 CAG repeat. However, the other possible RAN products were not measured. In the murine model of SCA8, polySer was thought to be toxic because RAN translated polySer colocalized with degenerating brain regions which lacked the ATXN8 protein. However, RAN translation products from different reading frames typically occur in the same tissue [22], and can occur in the same cells [51]. Therefore, polySer may be acting as a biomarker for RAN translation, and one or more of the other CAG RAN products are toxic in neurons. Our studies suggest that polySer alone does not cause neurodegeneration. However, polySer may have synergistic interactions with other RAN polypeptides that enhance toxicity. In the future, it will be important to test this possibility by co-expressing polySer with each of the other RAN products to determine if there are synergistic effects on toxicity. In addition, it will also be important to place polySer, as well as other RAN polypeptides, in a more endogenous genetic context by including downstream sequence appropriate to each RAN polypeptide reading frame.

PolyAla: A possible contributor to CAG toxicity
PolyAla may be a strong contributor to CAG repeat toxicity. In C. elegans, polyAla was toxic in both muscle and GABAergic neurons. Muscle expression of polyAla caused strong motility defects and GABAergic expression of polyAla caused functional, but not structural, defects. (Ala) 38 -GFP also formed aggregates when expressed in muscle cells of C. elegans. In humans, expanded polyAla repeats cause aggregation in multiple diseases [52]. The aggregation of the polyAla repeats caused the mis-localization of the polyAla containing proteins and a concordant loss-of-function phenotype [52]. Since polyAla toxicity is mediated by the loss-of-function of the protein containing the expanded Ala tract, polyAla toxicity has mainly been studied within its native genetic context. The toxicity of a polyAla peptide without genetic context, as was modeled in our study, has not been thoroughly studied. In addition, because the expanded alanine repeats in the polyAla diseases are on average shorter than 30 repeats, a pure polyAla peptide had not been modeled at an HD relevant length until our work.
A homopolymeric alanine peptide may be toxic through the similar mechanisms as the alanine repeat-expansion diseases. A yeast-two hybrid model found that homopolymeric diseaserelevant lengths of polyAla ((Ala) 29 and (Ala) 28 ) exhibit self-interactions [53]. It is unknown if alanine repeat lengths, which normally appear in proteins (5-21 repeats) [54], also interact with disease-relevant lengths of alanine repeats. PolyAla repeats are enriched in transcription factors [54,55], which need to be localized to the nucleus to perform their function. Therefore, interaction of the transcription factors containing polyAla repeats with the polyAla aggregates could deplete the cell of available transcription factors. PolyAla aggregates sequestering transcription factors could also explain the difference in toxicity observed between muscle cells and GABAergic neurons, as the depleted transcription factors may be less important in GABAergic neurons than in muscle cells.

PolyLeu is the most toxic CAG-derived RAN product
In our C. elegans CAG RAN polypeptide models, the most toxic CAG-derived RAN product across tissues was polyLeu. This was despite the fact that it was expressed at lower levels than the other RAN peptides (S2 Fig). Like polyAla, polySer, and polyCys, polyLeu caused functional defects when expressed in GABAergic neurons and motility defects when expressed in muscle. However, polyLeu expression caused highly penetrant phenotypes in both muscle and neurons which were only weakly observed in the other RAN models. GABAergic expression of polyLeu induced significant morphological defects where every commissure exhibited structural disorganization. Muscle expression of (Leu) 38 -GFP caused larval arrest of~90% of the population.
The strong toxicity of polyLeu was unexpected, as polyLeu repeat expansions have not been previously linked to a genetic disease. Interestingly, polyLeu repeats are one of the most common single amino acid repeats in the human proteome, with~1,500 proteins containing poly-Leu repeats at least four leucines [54]. However, none of the naturally occurring polyLeu repeats exceed 11 leucine repeats, suggesting there is an evolutionary selection against longer polyLeu repeats [54]. The lack of polyLeu repeat expansion diseases could be due to polyLeu repeat expansions disrupting development in humans, as it does in C. elegans. In mammalian cells, RAN translation rates, which produce polyLeu, increase with activation of the integrated stress response pathway [14,16,56], and the integrated stress response pathway is activated with age [57], suggesting that RAN translation rates could increase with age. If RAN translation is correlated with age, RAN-translated polyLeu would not affect development. Another possible reason for the lack of diseases caused by polyLeu repeat expansions is that polyLeu is produced from six different codons, while glutamine is encoded by only two codons. Therefore, codon heterogeneity may effectively protect against repetitive polyLeu-encoding nucleotide sequences. Because polyLeu repeat sequences are not commonly associated with disease, they have received little attention. Limited previous work found that polyLeu repeats longer than 30 repeats caused toxicity in cells [58,59] and in Drosophila [60], but these models were created with a CTG repeat that could be a substrate for RAN translation and subsequent production of other peptides. Our work is the first evidence that a codon-varied pure polyLeu peptide causes significant cellular toxicity.
The toxicity of polyLeu could be due to the apparent membrane-localization of polyLeu. While the expression pattern of polyLeu changes based on repeat length, all lengths appear to be membrane localized. This is consistent with previous findings that polyLeu peptides as short as nine repeats can spontaneously incorporate into lipid membranes [40,61,62]. Leucine occurs in many signal peptides and transmembrane domains [55]. For this reason, leucine is the most common amino acid in proteins that localize to the Endoplasmic Reticulum (ER), Golgi apparatus, or vacuoles [63]. Consistent with this, (Leu) 11 -GFP localizes to a tubular network in C. elegans that resembles previous reports of ER [64] and spherical bodies~1.2 μm in diameter which cluster at the distal regions of the muscle cells. Eleven leucine repeats can act as an anchor signal sequence to target proteins to a membrane [62], so the spherical structures may be part of the endocytic pathway. (Leu) 20 -GFP appears to localize to the plasma membrane and smaller membrane-bound compartments based on its localization to structures similar to the peripheral shape of a C. elegans muscle cell, as well as other spherical and nonspherical structures. Although we were unable to detect (Leu) 38 -GFP, (Leu) 38 -sfGFP localized to the boundaries of unidentified spherical structures. These structures are distributed throughout the length of the muscle cell and are similar in size to those seen with (Leu) 11 . (Leu) 38 -sfGFP also has a diffuse background signal which may represent ER localization, since this signal was visible with sfGFP-tagged (Leu) 38 but not GFP-tagged (Leu) 38 . Whether or not (Leu) 38 disrupts ER homeostasis and how this may contribute to polyLeu toxicity will be the focus of future work.
In conclusion, we have characterized the in vivo properties of novel CAG-associated RAN homopolymers in C. elegans. Our findings suggest that CAG-derived polypeptides other than polyglutamine can cause toxicity through mechanisms that are likely independent of polyQ. While much remains to be learned about these new polypeptides, our findings and those of others strongly suggests that CAG repeat disease therapies targeting only polyglutamine toxicity mechanisms may not be effective. Instead, approaches that mitigate the toxicity of other CAG RAN polypeptides, particularly polyleucine, may need to be considered.

Caenorhabditis elegans strains and culture
Strains were cultured on standard NGM media with gfp(RNAi) bacteria at 20˚C until the generation before the experiment. For motility assays, animals were picked at the L4 stage, shifted to E. coli OP50, and allowed to have progeny at 20˚C. The progeny were picked as L4 animals, kept on E. coli OP50, and placed at 25˚C. All experiments were performed at 25˚C.

Molecular biology and transgenics
Codon-varied homopolymer sequences were synthesized for the 90 repeat polypeptides (Integrated DNA Technologies, Coralville, Iowa, USA). 38 repeat polypeptides were made using a "building block" approach where codon-varied sequences for 11 repeats were synthesized (GeneWiz, South Plainfield, NJ, USA). Each "building block" could be extended by digesting the vector containing a building block with BsmbI, and digesting the insert containing a building block with BsaI as previously described [65]. The nucleotide sequences used for the HD polypeptides are listed in S1 Table. In general, codon varied constructs exhibited a reduction in the ΔG for folding compared to polyCAG (S1 Fig). The only exception to this was polyAla, where predicted secondary structure was increased. Therefore, we cannot rule out the possibility that RNA secondary structure and/or RAN translation contributes to the toxicity for the polyAla peptide. Promoters were produced and cloned in as previously described [20].
Transgenic worms were generated by injecting the RAN polypeptide construct (20 ng/μl) and the myo-3p::mCherry pCFJ104 marker plasmid (100 ng/μl) into the gonad of wild-type animals. Transgenes were integrated using a standard gamma ray (Cs 137 ) mutagenesis, followed by selection of animals exhibiting 100% transmission of the mCherry marker. Integrated strains were outcrossed six times to wild-type animals. Injected animals were maintained on gfp(RNAi) plates until the experimental assay was performed. Quantification of RAN peptidegfp mRNA expression and protein levels showed that while there was variation in the expression levels of the different RAN-GFP transgenes (S2 Fig), the most toxic RAN peptide, poly-Leu, exhibited mRNA and protein expression levels significantly lower than all of the other peptide lines, likely due to genetic selection against the toxicity of this transgene. Whatever the mechanism, this shows that the toxicity associated with polyLeu is not because it is overexpressed relative to the other RAN peptides. All procedures involving recombinant or synthetic nucleic acid molecules and materials were approved by the University of Pittsburgh Institutional Biosafety Committee.

Reversal assays
30-40 progeny were picked as L4 animals and moved to 25˚C on E. coli OP50 bacteria. 24 hours later, the progeny were tested for their ability to reverse (N = 20 animals/genotype). All assays were performed with the experimenter blinded to genotype. Animals were lightly tapped on the head with a platinum pick and scored from 1-4 on their ability to reverse as previously described [66]. Each worm was scored 5 consecutive times, as wild-type worms began to acclimate after 7 taps.

Commissure assays
Strains were moved from 20˚C to 25˚C at least a generation before the experiment and maintained on E. coli OP50. L4 animals of the indicated genotype were isolated at 25˚C and imaged 24 hours later as 'Day 1 adults'. All of the strains contained an unc-47p::RFP marker to reveal GABAergic motor neuron morphology. Animals were anesthetized in 10 mM levamisole and Z-series images of GABAergic commissures were collected. Commissure breaks were identified as interruptions in the RFP signal surrounded by dorsal and ventral RFP in the commissures as previously described [20]. Blebbing was scored only in the commissures and was identified by the presence of one or more RFP varicosities as previously described [20].
Abnormal commissures were identified as those exhibiting branching and/or failing to reach the dorsal side of the animal.

Microscopy
Day 1 adult C. elegans were anesthetized in 10 mM levamisole for 10 minutes. The animals were then mounted on 3% agarose pads for fluorescence microscopy. Wide field images were collected on a Leica DMi4000 inverted microscope and a Leica DFC 340x digital camera (Leica Microsystems, Wetzlar, Germany). Z-stack images were deconvolved using Leica AF6000 software.
For FRAP studies, confocal fluorescence images were captured on a Nikon A1plus confocal microscope through an Apo 60x/ 1.4NA Oil objective lens. The microscope was operated on the NIS-Elements AR version 5.02 software platform. GFP or RFP were excited at 488 or 561, respectively. Images were captured every 11 seconds to avoid bleaching over the course of imaging. Following imaging of baseline fluorescence, a region of interest corresponding to a portion of the puncta was photobleached and fluorescence recovery within the photobleached area was monitored over at least 60 seconds. Data were normalized so that the image preceding the photobleach was set to 100% and the first image following the photobleach was set to zero percent. Imaging conditions over the time course of the experiment caused minimal loss of signal, suggesting an absence of photobleaching during the monitoring period. FRAP analysis was performed using Fiji software [67]. X-Y drift was corrected using the "Correct 3d Drift" plugin [68].

COPAS experiments
Gravid animals grown on gfp(RNAi) were placed on E. coli OP50 for 6 hours at 20˚C to collect a synchronized brood. The adults were then removed and the plates were placed at 25˚C. 48 hours later, the progeny were sorted through a COPAS BioSorter (Union Biometrica, Holliston, MA, USA). Worm time of flight (TOF) was measured. Animals with a TOF � 300 were counted as adults. Animals with a TOF < 300 were counted as larvae. When measuring GFP and RFP, fluorescent detection settings were identical for all samples. Average fluorescence was measured as total fluorescence divided by the time of flight.

Paralysis assays
Gravid animals were moved from gfp(RNAi) to E. coli OP50 and allowed to lay eggs for 24 hours at 20˚C. The resulting progeny were allowed to grow up on E. coli OP50, permitting RAN polypeptide accumulation. Ten L4 animals were placed on each of the five-3cm plates spotted with E. coli OP50 (N = 50 animals/genotype) and moved to 25˚C. Each day, animals that failed to move at least half a body length in response to manual stimulation with a platinum wire but were still alive (pharyngeal pumping, movement of less than half a body length) were scored as paralyzed. Animals that died, desiccated on the plate edges, or exhibited internal hatching of progeny were censored from the assay. Each day, mobile animals were transferred to a new plate and paralyzed, dead, or censored animals were removed from the assay.

Thrashing assays
Gravid animals were moved from gfp(RNAi) to E. coli OP50 and allowed to lay eggs for 24 hours at 20˚C. The resulting progeny were allowed to grow up on E. coli OP50, permitting polypeptide accumulation. The day before the experiment, 40 transgenic L4 animals were transferred to an E. coli OP50 plate and placed at 25˚C. The following day, worms were placed on clean NGM plates and allowed to move freely for 10 minutes to remove excess bacteria.
Worms were then placed individually into 3cm petri dishes containing M9 buffer and allowed to adjust to the new environment for 5 minutes. The worms were observed for 30 seconds and the number of thrashes (reversal of body bend that crosses the midline) was counted.

Lifespan assays
Gravid animals were moved from gfp(RNAi) to E. coli OP50 and allowed to lay eggs for 24 hours at 20˚C. The resulting progeny were allowed to grow up on E. coli OP50, permitting polypeptide accumulation. 10 L4 animals were transferred to five-3cm plates (N = 50 animals/ genotype). Lifespan assays were performed at 25˚C with E. coli OP50 spotted on NGM plates. Worms were classified as alive, dead (no movement in response to touch with a wire), or censored (lost or bagged worms) once per day for lifespan assays.

Statistical analysis
Comparison of means were analyzed with ANOVA using Dunn's post hoc test analysis in GraphPad Prism 7 (GraphPad Software, Inc., La Jolla, CA, USA). For the FRAP analysis, the plateau was measured using one phase association in GraphPad Prism 7. Paralysis assay and lifespan assays were analyzed using the Kaplan-Meier log-rank function (OASIS) [69]. P-values of <0.05 were considered to be significant.