The Musashi 1 Controls the Splicing of Photoreceptor-Specific Exons in the Vertebrate Retina

Alternative pre-mRNA splicing expands the coding capacity of eukaryotic genomes, potentially enabling a limited number of genes to govern the development of complex anatomical structures. Alternative splicing is particularly prevalent in the vertebrate nervous system, where it is required for neuronal development and function. Here, we show that photoreceptor cells, a type of sensory neuron, express a characteristic splicing program that affects a broad set of transcripts and is initiated prior to the development of the light sensing outer segments. Surprisingly, photoreceptors lack prototypical neuronal splicing factors and their splicing profile is driven to a significant degree by the Musashi 1 (MSI1) protein. A striking feature of the photoreceptor splicing program are exons that display a "switch-like" pattern of high inclusion levels in photoreceptors and near complete exclusion outside of the retina. Several ubiquitously expressed genes that are involved in the biogenesis and function of primary cilia produce highly photoreceptor specific isoforms through use of such “switch-like” exons. Our results suggest a potential role for alternative splicing in the development of photoreceptors and the conversion of their primary cilia to the light sensing outer segments.


Author Summary
Vertebrates possess extraordinarily complex nervous systems that are formed by hundreds of different types of neurons. This neuronal diversity is achieved in part through the expression of specific sets of genes. In addition, neurons use alternative splicing to a much higher degree than other cells to produce neuronal variants of broadly expressed proteins. Here we characterize the transcriptome of a single neuronal type, the vertebrate Introduction Vertebrate nervous systems contain numerous types of neurons with characteristic morphology, connectivity, electrophysiological properties, and neurotransmitter signatures. Single cell transcriptome profiling studies reveal dozens of distinct gene expression profiles in the central nervous system (CNS) and the retina, suggesting that neuronal cell identity is established and maintained by specific gene expression programs [1][2][3][4][5]. A limitation of the single cell approaches is the relatively low coverage of the transcriptome that is biased towards the 3'-end of the transcripts [6]. The depth and distribution of the reads produced by the current single cell transcriptome profiling approaches do not allow the reliable assessment of the levels of transcript isoforms produced by alternative splicing. Thus, the posttranscriptional layer in the regulation of gene expression in neurons, which is required for the normal development and function of the CNS, remains hidden [7][8][9][10][11][12].
Alternative pre-mRNA splicing is a major mechanism for generating protein diversity in vertebrates. In particular, neurons use alternative splicing for generating protein diversity to a significantly higher degree than any other cell type [13,14]. Characteristically, neurons broadly utilize microexons that are defined by different groups as being no longer than 27nt or 51nt [10,15]. The neuronal splicing program and the inclusion of neuronal microexons are governed by splicing factors belonging to several families of RNA binding proteins: PTBP, ELAVL, NOVA, KHDRBS, SRRM and RBFOX [14,[16][17][18][19]. With the exception of PTBP1, these proteins have high expression levels in neurons and are not expressed or have limited expression outside of the nervous system. PTBP1, which represses splicing of neuronal exons outside of the nervous system, is replaced by the PTBP2 in the early stages of neuronal differentiation.
Retinal photoreceptor cells provide an intriguing model to study how gene expression programs shape the cell structure and properties. Photoreceptors have a distinct morphology with a characteristic light sensing organelle termed the outer segment. The photoreceptor outer segment is a sensory cilium with an elaborate structure of membrane stacks. Surprisingly, the genes involved in the biogenesis and maintenance of the photoreceptor cilium are ubiquitously expressed in all ciliated cells. Recently, isoforms for two of these proteins, Arl6 (BBS3) and Ttc8 (BBS8), were shown to be preferentially expressed in photoreceptors [20][21][22][23]. The retinal variant of Arl6, a Ras related GTP-binding protein, is required for the survival of zebrafish photoreceptor cells and disruption of the gene in mice results in retinal defect [20,21]. These findings raise the possibility that photoreceptor cells are at least in part shaped by posttranscriptional processes such as alternative pre-mRNA splicing.
Here we use animal models to characterize in depth the alternative splicing profiles of photoreceptor cells, a sensory neuron type. We find that photoreceptors express a characteristic splicing program that includes a set of highly photoreceptor-specific isoforms. Surprisingly, key neuronal splicing regulators are either not expressed or downregulated in photoreceptor cells. We show that Musashi 1 (MSI1) promotes the splicing of photoreceptor specific exons as part of a combinatorial mechanism that controls splicing in photoreceptor cells.

Isolation the photoreceptor transcriptome
To identify the features of the retina transcriptome that are specific to photoreceptor cells we analyzed the transcriptomes of retina samples from wild type and Aipl1 knockout mice by RNA-Seq. AIPL1 is a molecular chaperone that is required for photoreceptor survival and by postnatal day 30, the retina of Aipl1(-/-) mice is devoid of photoreceptors ( Fig 1A) [24]. Apart from the missing photoreceptors, the Aipl1(-/-) retina has grossly normal anatomy (Fig 1A) [24]. In a comparison between the transcriptomes of Aipl1(-/-) and wild type retina, transcripts with higher expression levels in photoreceptors will appear downregulated in the Aipl1(-/-) sample due to the altered cell composition (Fig 1B). Conversely, transcripts expressed at higher levels in the inner neurons will show elevated expression levels in the Aipl1 knockout retina.
We validated our approach by performing gene level expression analysis and tracking the levels of transcripts specific to photoreceptors. We identified 5377 genes with more than 2 fold difference in their expression level between Aipl1(-/-) and wild type retina (S1 Table). In the Aipl1 knockout, we observed loss of genes known to encode photoreceptor-specific transcription factors, (e.g. NR2E3, NRL), proteins involved in phototransduction, (e.g. RHO, CNGA1, PDE6B), and photoreceptor morphogenesis (e.g. PRPH2, ROM1, FSCN2) (S1 Table). The genes with higher expression in the wild type retina compared to the Aipl1 knockout, showed enrichment of Gene Ontology (GO) categories directly related to photoreceptor development, structure and function (S2A and S2B Table). This enrichment is consistent with the loss of photoreceptor cells in the Aipl1 knockout. The genes with lower expression levels in the wild type retina were part of broad GO categories related to organ development, neuronal cell structure and function (S2A and S2C Table). Thus, the gene level expression data demonstrates that comparing the retinal transcriptome of Aipl1 knockout with that of the wild type retina correctly identifies the transcripts characteristic to photoreceptors.

Photoreceptors express a characteristic splicing program
We next determined the inclusion levels of alternative exons in the mouse retina (S3 Table). Hierarchical clustering shows that the retinal samples form a separate cluster with a splicing profile in part related to that of other neuronal tissues (S1 Fig). Similar to central nervous system samples, the retina utilizes a significant number of microexons (S1 Fig).
We analyzed the differences in exon inclusion levels between the wild type retina the retina of the Aipl1 knockout. Approximately 40% of the differentially spliced exons between wild type and Aipl1 knockout retina were not annotated in the GRCm38 mouse genome assembly. The large number of novel exons prompted us to use Cufflinks to carry out guided transcriptome assembly based on the ENSEMBL GRCm38 annotation and our RNA-Seq data. We then repeated the analysis of the differential splicing using the updated annotation and identified 540 differentially spliced exons in 372 genes (S3 Table). Of these, 318 exons showed higher inclusion levels in wild type retina and 222 had lower inclusion levels. Alternative exons in the Bsg and Ttc8 (Bbs8) genes that are known to be used exclusively in photoreceptors were among the exons with higher inclusion levels in wild type retina, verifying that our approach correctly identifies photoreceptor-specific exons [22,23,25]. We used RT-PCR to test 18 alternative exons that showed differences in inclusion level between 10% and 90% in our RNA-Seq data. All differences in exon inclusion level that were predicted by RNA-Seq were confirmed by the RT-PCR experiment (Fig 1C and S2 Fig). Exons in multiple genes such as Cep290, Cc2d2a, Cacna2d4, Prom1 and Kif1b showed large differences in inclusion levels between the wild type and Aipl1(-/-) retina consistent with a "switch-like" splicing pattern. Similar to the Bsg and Ttc8 exons, these "switch-like" exons appear to be included at high levels in photoreceptors and skipped in all other tissues we examined (Figs 1B and 2).
The photoreceptor splicing pattern we inferred may be due in part to splicing changes in the neurons of Aipl1(-/-) retina in response to the loss of the photoreceptors. To rule out this scenario we analyzed by RT-PCR the inclusion levels of 18 alternative exons in flow sorted rod photoreceptors. The inclusion levels of the majority of the tested exons in rod photoreceptors (17 out of 18) were in concordance with the inferred photoreceptor splicing pattern (Fig 1 and  S2 Fig). As expected, the levels of the predicted photoreceptor-specific variants were higher in the isolated photoreceptors compared to the whole retina sample, where the signal is derived from a mixed cell population.
Unsupervised hierarchical clustering of the mouse tissue panel based on the inclusion levels of the exons differentially spliced in photoreceptors places the retina along with the other neuronal tissues (Fig 2). In this clustering the profile of the Aipl1(-/-) retina is more closely related to that of the CNS samples than the wild type retina. The expression of a distinct splicing profile by the photoreceptor cells is likely responsible for the separation of the Aipl1 knockouts from the cluster containing the wild type retinal samples.
As cone photoreceptors comprise only 3% of the retina, it was unclear if the splicing profile we discovered is shared between photoreceptors of different types or if it is specific to rod photoreceptors. To determine if rods and cones share the same splicing program we analyzed the splicing in the retina of Nrl knockout mice by RT-PCR. Disruption of Nrl, a rod-specific transcription factor leads to the conversion of all rod photoreceptors into cone like cells that present the characteristic ultrastructural, histological, molecular and electrophysiological features of cone photoreceptors [26,27]. All tested exons, with exception of an exon in the Glb1l2 (Galactosidase, Beta 1-Like 2) gene, showed identical inclusion levels in the wild type and Nrl knockout retina (Fig 1B and S2 Fig). Thus, rods and cones share largely the same splicing program.
We next carried out gene ontology enrichment analysis to determine if alternative splicing in photoreceptors modifies particular processes or cellular components (S4 Table). Several of the enriched categories point to a significant impact of alternative splicing on the cytoskeleton of photoreceptor cells. Apart from the cytoskeleton we see enrichment of genes in broadly defined categories that are partially related to cell differentiation and neurogenesis, demonstrating that alternative splicing modifies multiple systems and processes in the photoreceptor cells.
The photoreceptor splicing program is executed prior to photoreceptor morphogenesis To gain insight into the developmental mechanisms that control splicing in photoreceptors, we analyzed exon inclusion levels in a panel of published retinal RNA-Seq datasets from wild type mice and genetic models that disrupt normal photoreceptor development. In addition to the Aipl1 and Nrl knockouts described above, these models include a Crx knockout [28], a Crxdominant negative (Crx-DN) mutant [29], and the RD10 mutant [30]. Deletion of Crx or expression of the CRX-DN protein block the transcription of the genes involved in phototransduction and the development of the outer segment [28,29,31]. The RD10 mutant, similar to the Aipl1 knockout, loses its photoreceptors in adulthood [30]. The wild type samples included retina from postnatal day 2, which contain early post-mitotic rod photoreceptor progenitors, and fully developed retina from juvenile and adult animals (postnatal days 21 and 50).
Unsupervised hierarchical clustering of the exons differentially spliced in photoreceptors revealed two major clusters (Fig 3A). One cluster is formed by samples derived from the Aipl1 knockout and the RD10 mutant retinas, both devoid of photoreceptor cells, and includes the postnatal day 2 retina samples. A second cluster is formed by the adult wild type retina, the Nrl and Crx knockouts, and the Crx-DN mutant. This clustering of the Crx knockout and Crx-DN mutant, which do not form mature photoreceptors, with the wild type retina shows that the alternative splicing in photoreceptors is controlled independently of the known transcriptional regulators of photoreceptor morphogenesis.
At postnatal day 2 the rod photoreceptors are at the stage of immature progenitors. Interestingly, the splicing profile of the postnatal day 2 retina does not cluster with the samples containing undeveloped photoreceptors that are derived from Crx knockout and Crx-DN retinas at postnatal day 21. The segregation of the postnatal day 2 retina from the juvenile Crx knockout and Crx-DN retina suggests that the photoreceptor splicing program is established in the post-mitotic photoreceptor progenitors prior to morphogenesis of the outer segment. To characterize the temporal control of alternative splicing during photoreceptor differentiation we analyzed by RT-PCR the inclusion levels of four photoreceptor specific exons in the Ttc8, Prom1, Cep290 and Cc2d2a genes between postnatal days 0 and 16 ( Fig 3B). All four exons showed low levels of exon inclusion between postnatal day 0 and postnatal day 2. The inclusion levels of the four exons steadily increase thereafter, reaching half maximum at postnatal day 8, when the photoreceptor outer segments begin to develop. Thus, the shift towards photoreceptor specific isoform expression is initiated in the postmitotic photoreceptor progenitors in advance of the final stages of photoreceptor cell morphogenesis [32][33][34].
Motifs for several RNA binding proteins are enriched in proximity to exons differentially spliced in photoreceptors To determine if a specific subset of splicing regulators bind in proximity to the exons differentially spliced in photoreceptors we performed motif enrichment analysis. For this purpose we used the position weight matrices (PWM) from the Cis-BP-RNA database that describe the sites recognized by RNA binding proteins [35]. As these matrices are derived by aligning 7-mers, they fail to represent the true binding site for certain RNA binding proteins that recognize significantly shorter, 3 to 4 nucleotides long, sequences. To correct this deficiency we substituted the matrices for PTBP, NOVA, MBNL and MSI proteins with matrices corresponding to the sequences recognized by their RNA binding domains, i.e. YCU/UCY for PTBP, YCAY for NOVA, YGCY for MBNL, and UAG for MSI1.
Intronic sequences surrounding the differentially spliced exons showed enrichment of binding sites for several RNA binding proteins compared to the sequences surrounding exons whose inclusion levels were the same in wild type and Aipl1 knockout retina (Fig 4A and S5  Table). We observed enrichment of RBFOX and EIF2S1 binding sites, and marginal, but statistically significant enrichment of Nova binding sites downstream of exons with lower inclusion levels in photoreceptors. The enrichment of binding sites for the cytoplasmic EIF2S1 protein is likely due to similarity of the sequence it recognizes (WGCAUG) to the binding site of the RBFOX splicing factors (UGCAUG). Weak, but statistically significant enrichment of PTBP binding sites was observed upstream of all differentially spliced exons, regardless if they were included at higher rate or skipped at higher rate in photoreceptors compared to inner neurons. Musashi binding sites were enriched downstream of exons with higher inclusion levels in photoreceptors. ELAVL binding sites were partially depleted in exons with higher inclusion levels in photoreceptors and enriched in exons with lower inclusion levels in photoreceptors. Exons that had lower inclusion levels in photoreceptors showed enrichment of binding sites recognized by the KHDRBS, A1CF, LIN28, MEX3 and RBM41 proteins, all of which bind to A/U rich sequences. Binding sites for two SR proteins, SRSF2 and SRSF9, which recognize G/A rich sequences, were depleted in these exons.
Sequence elements that efficiently recruit the PTBP, NOVA, MBNL and MSI proteins typically contain clusters of the short sequences recognized by their RNA binding domains [36][37][38][39][40]. Thus, we tested if a higher number of PTBP, NOVA, MBNL and MSI motifs are found in clusters located in proximity to the exons that are differentially spliced in photoreceptors. PTBP binding sites in clusters containing at least 5 motifs spaced by less than 2 nucleotides were significantly enriched upstream of all differentially spliced exons (Fig 4A, S3 and S4 Figs). Such clustering of PTBP motifs is consistent with the well characterized mode of binding of the PTBP proteins to RNA. MSI binding sites in clusters containing at least three UAG motifs spaced by 10 to 15 nucleotides were enriched up to 8 fold downstream of the exons with elevated inclusion levels in photoreceptors (Fig 4A). Similar enrichment of pairs of MSI binding sites, albeit with larger spacing, was previously reported in the 3' UTRs of transcripts whose stability and translational efficiency is controlled by MSI [38]. NOVA binding sites were also enriched in clusters downstream of exons with lower inclusion levels in photoreceptors compared to inner neurons. Overall, the motif enrichment analysis suggests a potential role in photoreceptors for several neuronal splicing regulators: RBFOX, NOVA, PTBP, KHDRBS and ELAVL. In an attempt to identify splicing factors specific to photoreceptors we examined the expression of 1039 known and potential splicing regulators in the panel of mouse retinal samples used in our splicing analysis (S6 Table). We were unable to identify a gene that is specifically expressed in the samples with high inclusion of photoreceptor-specific exons. We observed that several key regulators of alternative splicing in neurons, Rbfox, Nova and Elavl family members, Ptbp1, Khdrbs2, and Srrm4 are downregulated in the wild type retina compared to the Aipl1 knockout (Table 1 and S1 Table).
We used immunofluorescence staining to characterize the distribution of RBFOX, NOVA, PTBP, KHDRBS, and ELAVL proteins in the mouse retina ( Fig 4B). We were unable to test SRRM4 due to the lack of antibodies suitable for immunofluorescence. In agreement with the RNA-Seq data, RBFOX and NOVA1 proteins were not expressed in the photoreceptor cells`. The RBFOX and NOVA proteins act as splicing activators when bound downstream of alternative exons [37,41]. Thus, the lack of RBFOX and NOVA expression in photoreceptors is consistent with the enrichment of their binding sites downstream of the exons with lower inclusion levels in photoreceptors compared to inner neurons ( Fig 4A).
The PTBP proteins show the expected expression pattern with PTBP1 present in the nuclei of Mueller glia cells, while PTBP2 is expressed in the neurons and photoreceptors (Fig 4B). The absence of PTBP1 from the retinal neurons and photoreceptors releases the splicing of alternative exons carrying PTBP binding sites within the upstream intron. These exons can then be  included at different level depending on the cell type, explaining the enrichment of PTBP1 binding sites upstream of exons that can be either up-or down-regulated in photoreceptors compared to inner neurons. The ELAVL (Hu) proteins are expressed throughout the retina, with lower levels in photoreceptors, consistent with the expression differences determined by RNA-Seq (Fig 4B).
The KHDRBS family of RNA binding proteins includes the ubiquitously expressed KHDRBS1 (Sam68), and two orthologues, KHDRBS2 (Slm1) and KHDRBS3 (SLM2, T-STAR), which in the CNS are expressed in neurons [17,19,42]. Consistent with its ubiquitous expression, KHDRBS1 can be detected thought the retina. In contrast, the KHDRBS2 and KHDRBS3 proteins were expressed only in the neurons of the inner retina, but not in photoreceptors. The KHRDBS3 protein expression is most likely suppressed posttranscriptionally as the KHDRBS3 mRNA levels are uniform throughout the retina (Table 1). Accordingly, the 3'-UTR of KHDRBS3 contains conserved binding sites for microRNAs from the miR-96/miR-182/miR-183 cluster which is expressed in photoreceptors (S5 Fig) [43]. The enrichment of KHDRBS binding sites in the downregulated exons is consistent with at least one previous report showing that KHDRBS3 binds to exonic splicing enhancers to activate exon inclusion [44].
The Musashi proteins directly regulate splicing MSI binding site enrichment in the downstream intron is associated with increased inclusion levels of the alternative exons in photoreceptors. The MS1 and MSI2 proteins are expressed throughout the retina and consistent with previous reports show mostly cytoplasmic localization in the inner neuronal layers (Fig 4A) [45][46][47]. As an adaptation to low light environment, the heterochromatin of mouse rod photoreceptors is packed in the center of the nucleus and the nucleoplasm is pushed to the periphery [48,49]. This morphology makes DNA staining unsuitable for identifying the boundaries of the nucleus. Thus, to determine if the Musashi proteins are present in the nuclei of photoreceptors, where they can regulate splicing, we decorated the nuclear envelope with anti-Lamin antibody (Fig 5A and S6 Fig) To test if the Musashi proteins can promote inclusion of an alternative exon when bound downstream of it we used a splicing reporter that has two lambda phage BoxB RNA hairpins engineered downstream of an artificial alternative exon [50]. The loop of the BoxB hairpin is specifically bound by the lambda N-peptide. Consequently, proteins tagged with the lambda N-peptide are tethered to the BoxB elements on the reporter pre-mRNA. Cotransfection of the reporter with Musashi lambda-N fusions increased inclusion of the reporter exon (Fig 6  and S7A Fig). The effect of Musashi on splicing is completely abolished in a reporter containing G to A point mutations in the two BoxB elements that disrupt binding of the lambda-N peptide. To test if the effect of MSI1-lamda-N fusion is specific we cotransfected the BoxB reporter with lambda-N fusions for three well characterized splicing factors, PTBP1, SRSF4 and TRA2B. None of these splicing factors increased the inclusion level of the test exon when tethered downstream of it, demonstrating that this is an effect specific to the MSI1 protein (S7C Fig).

The Musashi proteins promote Ttc8 exon 2A inclusion in photoreceptors
To determine if the Musashi proteins regulate splicing in photoreceptors we turned to the photoreceptor specific exon 2A in the Ttc8 (Bbs8) gene. We previously mapped two 100nt sequence segments in the introns immediately adjacent to exon 2A that act in concert to promote the splicing of this exon in photoreceptors [23]. Deletion mutagenesis showed that these segments contain multiple redundant cis-acting sequences [23]. The D4 segment located immediately downstream of exon 2A carries two clusters of Musashi binding sites, each containing three UAG motifs (Fig 7A). Within 320nt of the downstream intron immediately adjacent to exon 2A we find two more clusters containing three and four UAG motifs, respectively. In contrast, the 350nt section of the intron immediately upstream of exon 2A contains four Musashi binding sites, approximately the number of UAG triples that would be expected in random sequence of this size. To determine if the Musashi proteins can bind specifically to the D4 segment we used biotinylated RNA corresponding to this element to pull-down RNA binding proteins from retinal extracts. We also performed the pull-down with the other regulatory element, D3, and with segment D2, which is not required for splicing of exon 2A in photoreceptors. The binding was competed with non-biotinylated RNA, either of the same sequence or different sequence of the same length. In this pulldown experiment the Musashi proteins bind specifically to segment D4 (Fig 7B). In contrast, segments D2 and D3, each of which contains a single UAG motif, had low affinity for the Musashi proteins and the binding was completely blocked by competitor RNA.
To determine how MSI1 binding downstream of exon 2A affects its inclusion levels in the retina we used a reporter minigene designed to produce GFP when the exon is skipped and RFP when the exon is included [23,51]. We mutated all 15 Musashi binding sites in the downstream intron of the minigene (Fig 7A). Both the wild type and mutant minigenes were cotransfected with MSI1 expression construct in N2A cells. MSI1 promoted the inclusion of the wild type exon 2A but had no effect on the mutant minigene (Fig 7C and S7A Fig). To determine if the MSI binding sites are required for splicing of Ttc8 exon 2A in photoreceptor cells, we electroporated the wild type and mutant minigenes in the retina of neonate mice. We allowed the photoreceptors to develop and analyzed the splicing of the minigene transcripts in the retina by RT-PCR and immunofluorescence at postnatal days 16 and 20, respectively. As we have shown previously, the wild type Ttc8 exon 2A is included at high levels (98%) in the photoreceptors and is excluded from the mature transcripts in the inner neurons ( Fig 7D and  S7B Fig) [23]. Consistent with a role for MSI1 in directing splicing in photoreceptors, the inclusion level of exon 2A in the transcripts of the mutant minigene was reduced to approximately 38%.

MSI1 promotes the inclusion of photoreceptor specific exons
The enrichment of Musashi binding sites downstream of exons with elevated inclusion levels in photoreceptors suggests that multiple alternative exons should be regulated by the Musashi proteins in addition to Ttc8 exon 2A. To test this prediction, we expressed flag-tagged MSI1 in N2A cells and analyzed by RT-PCR the splicing of eleven endogenous transcripts containing exons with elevated inclusion levels in photoreceptors. MSI1 caused statistically significant increase in the inclusion levels of seven of the eleven exons (Fig 8). Inclusion levels of five of these exons, including Ttc8 exon 2A of the endogenous Ttc8 gene, increased by at least 10% in response to MSI1 expression (Fig 8). The smaller amplitude of the effect of Msi1 transfection on the inclusion levels of the endogenous Ttc8 exon 2A compared to the minigene transcripts is likely due to the transfection efficiency, which was approximately 40% in these experiments. Among the exons coordinately regulated by the MSI1 are four "switch-like" exons in Ttc8, Cep290, Cc2d2a and Prom1. All four genes encode ubiquitously expressed proteins that are involved in ciliary biogenesis and function. Ttc8, Cep290, Cc2d2a and Prom1 are also required for the development and maintenance of the photoreceptor outer segments [52][53][54]. The minigene is designed to produce GFP when the alternative exon is skipped or RFP when the exon is included. The wild type minigene is shown on the right. High RFP and low GFP expression in the photoreceptors indicates that the exon is included in the mature transcripts from the minigene. The inner neurons, marked with yellow arrows, express almost exclusively GFP, an indication that the exon is skipped. Reduced RFP expression and increased GFP levels in the mutant minigene indicate that the exon is mostly skipped in photoreceptors, in agreement with the RT-PCR analysis.

Discussion
The lack of comprehensive gene expression profiles of defined neuronal subtypes is a major obstacle to understanding how the neuronal diversity of the vertebrate CNS is established. To delineate the gene expression and alternative splicing programs of a single neuronal subtype we turned to the vertebrate retina. The high abundance of rod photoreceptors in the mouse retina allowed us to isolate the characteristic features of their transcriptome by comparing the Aipl1 knockout model of retinal degeneration to wild type mice.
The deduced splicing profile of photoreceptor cells is related to the spicing profiles of retinal and CNS neurons (Fig 2 and S1 Fig). Alternative splicing in neurons is known to be regulated by SRRM4 and members of the PTBP, RBFOX, KHDRBS, NOVA and ELAVL families of RNA binding proteins [14,[16][17][18][19]. Strikingly, photoreceptors do not express RBFOX, NOVA, KHDRBS2 and KHDRBS3 proteins, express lower amounts of ELAVL proteins and have markedly lower Srrm4 transcript levels. In light of these data, the switch from PTBP1 to PTBP2 expression during development emerges as a major determinant of the alternative splicing program that is shared between neurons and photoreceptor cells [55,56].
In the absence of a "master" regulator of splicing specific to photoreceptors, the characteristic splicing program of these cells is likely determined by a unique combination of splicing factors with broader expression.
We show that Musashi 1 (MSI1) promotes the splicing of exons with elevated inclusion levels in photoreceptors. The Musashi proteins are notable for their expression in stem cells, where they are involved in stem cell maintenance and cell fate determination [57][58][59]. The best characterized function of Musashi is the regulation of mRNA stability and translation through binding to the 3'-UTR of the target transcripts [58]. In the retina, the subcellular localization of the Musashi proteins varies during development and in mature neurons the two proteins are confined to the cytoplasm (Fig 4) [45]. Here we show that MSI1 is present not only in the cytoplasm but is also abundant in the nuclei of photoreceptor cells, where it controls alternative pre-mRNA splicing. Our findings contrast recent studies of the Musashi activity in the central nervous system and glioblastoma, where the cytoplasmic localization of the Musashi proteins confines their function to control of mRNA stability and translation and limits their impact on splicing [60,61]. Control of the subcellular localization of MSI1 provides a mechanism that can produce the characteristic splicing program of photoreceptors in the absence of a photoreceptor-specific RNA binding protein. Similar mechanisms that involve redistribution of RNA binding proteins between the nucleus and the cytoplasm are known to control alternative splicing in response to external stimuli [62][63][64]. While playing a significant role in promoting inclusion of photoreceptor specific exons, MSI1 is clearly not the sole determinant of the photoreceptor specific splicing program. The absence from photoreceptors of the RBFOX, NOVA and KHDRBS2/3 proteins also contributes to the differences in alternative splicing between photoreceptors and neurons as indicated by our motif enrichment analysis. Our results suggest that the characteristic splicing program of the photoreceptor cells may be determined by a unique suite of splicing factors with broader expression that can combinatorically form different exon recognition complexes depending on the sequence of the underlying RNA substrate. Such combinatorial control of splicing is a well-established paradigm and was recently demonstrated to control "switch-like" splicing events during reprograming of primary fibroblasts into pluripotent stem cells [65,66]. Further research will be needed to directly demonstrate the combinatorial control of the inclusion of photoreceptor specific exons and identify the factors beyond MSI1 that are involved in this process.
At present it is unclear how most of the alternative exons we identified affect protein function or to what degree alternative splicing shapes the properties of the photoreceptor cells. One exception is the 14nt exon 8 in the Arl6 (BBS3) gene, which has high inclusion levels in photoreceptors (Fig 1). ARL6, a Ras family GTP-binding protein, is part of a network of proteins involved in the development and maintenance of primary cilia (Fig 8C). The exon 8 containing isoform of ARL6 is required for normal vision in zebra fish [20,21]. The vision phenotype of mice lacking Arl6 exon 8 has not been reported in detail, however gross histological examination and immunofluorescent staining of the retina show that the inner segments of photoreceptors are disorganized [21]. Several other components of the protein network that ARL6 is part of are also differentially spliced in photoreceptors. Four of these genes, Cep290, Cc2d2a, Ttc8 and Prom1 contain "switch-like" exons that produce isoforms highly specific to photoreceptors (Figs 8C and 1B) [22,23]. Strikingly, the splicing of these "switch-like" exons is coordinately regulated in development ( Fig 3B) and their inclusion is promoted by MSI1.
The photoreceptor splicing program is activated in the postmitotic progenitors, prior to the onset of outer segment development. A transcription factor cascade starting from Crx homeobox protein that is critical for photoreceptor morphogenesis is also activated during the same developmental time frame [31,67,68]. Interestingly, alternative splicing in photoreceptors is not affected in the Crx knockout animals and in the Crx dominant negative mutant [29]. Thus, the developmental switch to photoreceptor specific splicing is independent of the established transcriptional mechanism that activates the expression of photoreceptor specific genes.
In summary, we demonstrate that photoreceptors express a characteristic splicing program that encompasses hundreds of alternative exons and affects the transcripts of multiple genes that are critical for vision.

Ethics statement
All procedures carried out on laboratory mice are in full compliance with all federal regulations and were approved by Institutional Animal Care and Use Committee at West Virginia University.

Clones and antibodies
Musashi 1 cDNA was amplified from mouse retinal cDNA using primers that introduced the flag-tag and cloned into pCDNA3.1 (See S10 Table for primer sequences). The PKC-neg-40B-2xBoxB-EGFP splicing reporter and pIBX-C-FF-(B)-NLS-λN expression vector were described previously [50,69]. The Ttc8 exon 2A minigene containing mutant Musashi 1 consensus binding motifs was created using Gibson Assembly (See S10 Table for the oligonucleotide sequences). Antibodies used in this work are listed in S8 Table. Mice All procedures carried out on laboratory mice were approved by Institutional Animal Care and Use Committee at West Virginia University (WVU). Subretinal injection, time course analyses, and immunofluorescence of sections were performed on CD-1 mice (Charles River). Subretinal injection and electroporation of DNA was carried out on newborn CD-1 pups as described previously [70]. Toluidine blue staining was performed by Excalibur Pathology Inc. on retina sections from p65 C57bl/6j and p60 C57bl/6j:Aipl1(-/-) mice.

RNA-Seq library preparation and sequencing
Total RNA was isolated from wild type C57bl/6j and C57bl/6j:Aipl1 (-/-) retinas at postnatal day 50 using Tri-reagent (Sigma). rRNA subtracted RNA-Seq libraries were generated using 1μg of total RNA per replicate using RiboZero and TruSeq kits (Illumina). Four replicates, each derived from different animal, were generated for each wild type and Aipl1(-/-) sample. The libraries were sequenced to a depth of 43 million reads (range 39 to 47 million reads) on Illumina Hi-Seq 15000. The reads produced by the RNA-Seq experiments are deposited at the NCBI SRA repository under accession number SRP068974.

Bioinformatics analysis
Reads from the retinal samples were mapped to the current mouse genome (GRCm38) using TopHat. Following the mapping, Cufflinks was used to carry out guided transcriptome assembly based on the ENSEMBL GRCm38 annotation (S1 Data file contains the updated annotation in GTF format). Additional RNA-Seq data sets for mouse tissues and retinal samples from genetically engineered mouse models (Nrl knockout, Crx knockout, Crx dominant negative and RD10 mutant) were downloaded from the NCBI sequence read archive and aligned using the updated annotation. The accession numbers of the data sets not generated by us are listed in S8 Table. Exon inclusion levels across all samples were calculated using rMATS version 3.08 [71]. We added to rMATS a basic capability to discover novel exons within annotated transcripts based on splice junction reads that are anchored on one end to a known exon (S8 Fig). rMATS was used to carry out differential splicing analysis of the wild type and Aipl1(-/-) retina samples. Differences in gene expression between the wild type and Aipl1(-/-) samples were identified using featureCounts and edgeR [72,73]. Gene Ontology analysis was carried out using Web-Gestalt [74].
Motif enrichment analysis was carried out in R/Bioconductor using the PWMEnriched package [75]. Position weight matrices for RNA binding proteins were described previously [35]. The matrices for RBFOX, PTBP, MBNL and MSI proteins were replaced with the matrices listed in S9 Table. Binding sites carrying at least 90% match to the scoring matrices were counted in the exons and in 200nt segments of the introns immediately adjacent to the exon. Binding sites for orthologues recognizing highly similar sequences, e.g. RBFOX1, 2 and 3 proteins, KHDRBS1, 2 and 3 proteins, etc. were pooled together. Binding that overlap by more than 50% were counted as a single site. Two single tailed hypergeometric tests were used to determine the significance of the binding site enrichment/depletion in each segment. The hypergeometric test p-values were corrected for multiple testing using Benjamini-Hohberg's procedure. To assess if there is an enrichment of clustered binding sites, the analysis for MSI, PTBP, ELAVL and NOVA was repeated after excluding the binding sites not located within a cluster. Clusters were defined by two parameters: minimum number of binding sites necessary to form a cluster, ranging from 2 to 5; and the maximum spacing between them, ranging from 0nt to 30nt. The enrichment analysis was carried out for each pair of minimum binding site count and maximum spacing parameters.

RNA isolation and RT-PCR from retina
RNA from post-natal day 16 retinas was isolated with TRI reagent (Sigma) according to manufacturer's guidelines and reverse transcribed using mixture of random hexamers and oligo-dT to prime the cDNA synthesis. Alternatively spliced regions were amplified using fluorescently labeled primers positioned in the flanking exons (See S10 Table for primer sequences). The amplified products were separated by gel electrophoresis under denaturing conditions and imaged on a Typhoon 9410 imager (GE).

Retinal tissue sections and fluorescence imaging
Retinal sections were prepared, stained and imaged as described previously [23]. Musashi, Lamin-B, and DAPI co-localization analysis was performed on 4μm sections using ImageJ software to plot signal intensities spanning a 10μm line perpendicular to the border of nuclei in the ONL, n = 53. Signals from individual nuclei were normalized to the maximal signal for each channel and each set of measurements were centered relative to the maximal Lamin-B signal before averaging and plotting data.

Cell line transfections and RNA isolation
Transient transfection of N2A cells were carried out using polyethyleneimine. After 48 hours total RNA was isolated with TRI reagent (Sigma) according to manufacturer's guidelines. To isolate total protein for western blot the cells were lysed in SDS sample buffer.
293T cells were transfected using Mirus Transit 293 reagent with 150ng PKC-neg-40B-2xBoxB-EGFP or PKC-neg-40B-2xBoxB(G1A)-EGFP and 50ng pIBX-C-FF-(B)-NLS-λN (empty vector, Msi1 or Msi2) per well of a 24 well plate in triplicate. RNA and protein were isolated 48 hours later using Trizol reagent and RIPA buffer, respectively. RT-PCR of the minigene was carried out using primers in the flanking exons of the 40nt test exon with the reverse primer being FAM labeled.

Western blotting
The protein samples were resolved in 10% SDS-PAGE gel electrophoresis before being transferred to an Immobilon FL membrane (Millipore.) Membranes were blocked in Tris buffered saline solution containing 1% Tween-20 and 0.25% bovine skin gelatin. The membranes were incubated with primary antibodies overnight at in the blocking solution. After removing the primary antibody and washing the membrane in the blocking solution, the secondary antibodies were applied in the blocking solution for 1 hour. Membranes were washed again in the blocking solution and imaged on a Typhoon 9410 imager (GE) after washing with PBS.

RNA pull-down
RNA probes were synthesized with the Hi-Scribe T7 RNA Synthesis kit (NEB) using 0.5μg of PCR amplified template DNA (See S10 Table for primer sequences). 100pmol of RNA probes were then biotinylated using the Pierce RNA 3' end biotinylation kit (Thermo Fisher) and purified according to manufacturer's instructions. Biotinylated RNA probes were re-suspended in 100μl of high salt buffer (0.5M NaCl, 10mM Hepes pH7.9). Approximately 0.4mg of streptavidin magnetic beads (NEB) were washed in high salt buffer and incubated with biotinylated probes on ice for 1-2 hours with occasional mixing. The beads were then washed three times with wash buffer (0.1M KCl, 10mMHepes pH 7.9, 0.1% Triton-X100). Washed beads were incubated on ice with 100μg retinal extract and 6μg competitor RNA in binding buffer (0.1M KCl, 10mMHepes pH 7.9, 5μg/μl heparin, 0.1% Triton-X100 and 20U RNAse Inhibitor) for four hours with occasional mixing. Beads were then washed three times with wash buffer and the bound proteins were eluted in wash buffer containing 20ng RNAse A. The Musashi proteins in the eluates were detected by western blotting using an antibody that reacts with both MSI1 and MSI2 (S8 Table).

Flow sorting of rod photoreceptor cells
Label-free isolation of rod photoreceptor cells by flow cytometry was carried out as described before (S2A Fig) [79]. The identity of the sorted cells was confirmed by quantitative RT-PCR using markers for rod photoreceptors, different inner neuron types and glial cells (S2B Fig) [80]. The expression levels were normalized to the geometric average of four reference genes: Gapdh, Hprt, Sdha and Pgk1. The primers used in the qPCR assays are listed in S10 Table. Supporting Information Immunofluorescence detection of MSI2 in the outer nuclear layer of wild type mouse retina (6μm sections). The nuclear envelope is stained with Lamin-B antibody (red). MSI2 staining is shown in gray. The nuclear DNA is stained with DAPI (blue). (B and C) Immunofluorescence detection of MSI1 and MSI2, respectively, in the outer nuclear layer of NRL(-/-) mouse retina (6μm sections). The nuclear envelope is stained with Lamin-B antibody (red). Musashi protein staining is shown in gray. The nuclear DNA is stained with DAPI (blue). The Musashi proteins were visualized on the same section using rat -anti-MSI1 and rabbit anti-MSI2 antibodies in combination with anti-rat AF647 and anti-rabbit AF655 secondary antibodies. Junction reads that map on one end of an exon in annotated transcripts are used to identify novel exons. A novel exon is defined by two sets of junction reads of at least 10 reads per set, one anchored on the left and a second one anchored on the right to a known exon, that map within a predefined distance (300nt) from each other. (TIF) S1