Zebrafish rbm8a and magoh mutants reveal EJC developmental functions and new 3′UTR intron-containing NMD targets

Many post-transcriptional mechanisms operate via mRNA 3′UTRs to regulate protein expression, and such controls are crucial for development. We show that homozygous mutations in two zebrafish exon junction complex (EJC) core genes rbm8a and magoh leads to muscle disorganization, neural cell death, and motor neuron outgrowth defects, as well as dysregulation of mRNAs subjected to nonsense-mediated mRNA decay (NMD) due to translation termination ≥ 50 nts upstream of the last exon-exon junction. Intriguingly, we find that EJC-dependent NMD also regulates a subset of transcripts that contain 3′UTR introns (3′UI) < 50 nts downstream of a stop codon. Some transcripts containing such stop codon-proximal 3′UI are also NMD-sensitive in cultured human cells and mouse embryonic stem cells. We identify 167 genes that contain a conserved proximal 3′UI in zebrafish, mouse and humans. foxo3b is one such proximal 3′UI-containing gene that is upregulated in zebrafish EJC mutant embryos, at both mRNA and protein levels, and loss of foxo3b function in EJC mutant embryos significantly rescues motor axon growth defects. These data are consistent with EJC-dependent NMD regulating foxo3b mRNA to control protein expression during zebrafish development. Our work shows that the EJC is critical for normal zebrafish development and suggests that proximal 3′UIs may serve gene regulatory function in vertebrates.


Introduction
Post-transcriptional control of messenger RNA (mRNA) expression is critical to regulate location, amount, and duration of protein expression. To achieve optimal protein expression in eukaryotes, many regulatory signals reside in mRNA 3 0 -untranslated regions (3 0 UTRs) [1]. Recognition of 3 0 UTR-embedded signals by RNA binding proteins and miRNAs alter the 3 0 UTR ribonucleoprotein (RNP) composition and regulate mRNA localization, translation, and stability [1][2][3]. Nuclear RNA processing steps such as alternative polyadenylation can further impact 3 0 UTR RNP composition by altering 3 0 UTR length, and hence the repertoire of 3 0 UTR regulatory signals [4,5]. Mechanisms dictating 3 0 UTR RNP composition are thus important for cellular function and organismal development [1,6,7].
Pre-mRNA splicing also greatly impacts RNP composition by imprinting several proteins including the exon junction complex (EJC) on spliced exons [8][9][10]. The EJC is comprised of three core proteins, Eif4a3, Rbm8a (Y14), and Magoh, which assemble~24 nt upstream of exon-exon junctions and regulate many post-transcriptional steps including pre-mRNA splicing, mRNA export, localization, translation and nonsense-mediated mRNA decay (NMD) [11][12][13]. As introns primarily occur in open reading frames and rarely in 3 0 UTRs, the EJC mainly decorates the translated portion of mRNAs [14], from where they are removed by the first translating ribosome [15,16]. However, if a ribosome terminates translation � 50 nucleotides (nts) upstream of an exon-exon junction, one or more EJCs that remain on the mRNA are now located within the 3 0 UTR. Such EJCs that occur downstream of a terminated ribosome can engage components of the NMD pathway leading to activation of the central NMD factor UPF1 and rapid mRNA turnover [17,18]. In this way, the EJC can induce destruction of aberrant transcripts bearing premature termination codons to suppress expression of truncated polypeptides. When combined with regulated alternative splicing, such EJC-induced NMD can also suppress expression of particular protein isoforms to regulate cellular homeostasis and developmental decisions [19][20][21]. Normal mRNAs that contain features such as 3 0 UTR introns (3 0 UIs) can also acquire EJCs located in the 3 0 UTRs (and hence within the 3 0 UTR extracts. We find that both Eif4a3 and Magoh, but not a negative control RNA-binding protein HuC, specifically co-immunopreciptate with Rbm8a ( Fig 1A). We and others have previously shown that the EJC primarily binds 24 nts upstream of exon-exon junctions in cultured human cells and adult Drosophila [42][43][44][45][46]. To test if the EJC binds at a similar position on zebrafish spliced RNAs, we first optimized RNA-immunoprecipitation (RIP) from RNase-treated zebrafish embryo lysates using an Rbm8a antibody (S1B and S1C Fig). Using optimized conditions, we obtained three well-correlated Rbm8a RIP-Seq biological replicates of Rbm8a-associated RNA fragments from zebrafish embryo lysates (S1D Fig, S1 Table). Rbm8a footprint read densities are significantly higher in exonic regions as compared to intronic regions (Fig 1B), and on exons from multi-exon genes as compared to those from intron-less genes (Fig 1C). A meta-exon analysis shows that, like the human EJC, zebrafish Rbm8a footprints cluster around the canonical EJC position 24 nts upstream of exon 3 0 ends ( Fig 1D). As expected, 5 0 and 3 0 ends of RIP-Seq reads accumulate upstream and downstream of the -24 nt position, respectively ( Fig 1E). A dramatic reduction in 5 0 and 3 0 end read counts is seen in a~10 nt region around the -24 position, revealing the RNA segment that is protected from RNase digestion by Rbm8a-containing EJCs ( Fig 1E). The predominant Rbm8a-occupancy position close to exonic 3 0 ends is also evident from the RIP-Seq read distribution on individual exons ( Fig 1F  and S1E Fig). Qualitatively, many canonical EJC sites from highly expressed multi-exon genes show variable Rbm8a binding ( Fig 1F and S1E Fig). These profiles also show that zebrafish Rbm8a also associates with non-canonical positions away from the -24 position (Fig 1C, Fig  1F and S1E Fig), as observed previously in human cells [42,43]. Thus, like in human cells, the EJC in zebrafish embryos is also detected at non-canonical positions. Taken together, we conclude that pre-mRNA splicing shapes zebrafish mRNP composition through EJC deposition at exon-exon junctions and beyond.

rbm8a and magoh mutant embryos show defects in motility, muscle organization, and motor axon outgrowth
To identify the molecular functions of the EJC during embryonic development, we generated zebrafish rbm8a and magoh mutant embryos. Using a CRISPR/Cas9-based approach [47], we created frame-shifting deletions early in the protein coding sequence to generate null alleles (Fig 2A). Fish heterozygous for rbm8a oz36 or magoh oz37 alleles displayed no obvious phenotypes and were fully viable. Homozygous mutant rbm8a or magoh embryos (hereafter collectively referred to as EJC mutant embryos), obtained by intercrossing rbm8a or magoh heterozygotes, initially appear morphologically normal except for head necrosis and tail curvature prior to 24 hpf (hours post fertilization) (Fig 2B). A closer examination revealed that head necrosis is readily detected by acridine orange staining at 19  We hypothesized that EJC mutant embryos are initially sustained by maternally-deposited rbm8a and magoh transcripts [48] and Rbm8a and Magoh protein, and that developmental defects appearing at 19-21 hpf coincide with declining maternal stores. Consistent with maternal deposition of EJC transcript and/or protein, we detect Rbm8a and Magoh protein in 2-4 cell stage embryos (0.75 hpf) (Fig 2C and 2D). By 21 hpf, both Rbm8a and Magoh levels in EJC mutant embryos decrease to~25% of their respective levels in wild-type siblings, and levels continue to drop over the next six hours (Fig 2C and 2D). As previously observed in  mammalian cells [49], reduction of either protein of the Rbm8a:Magoh heterodimer leads to a concomitant depletion of the other protein (Fig 2C and 2D).
Although EJC mutant embryos are morphologically indistinguishable from wild-type siblings at 18 hpf, we find that they are paralyzed (Fig 3A). It is unlikely that the lack of spontaneous contractions is due to developmental delay as EJC mutant embryos progressively worsen and thus, never become motile (S2D Fig). To further characterize the paralysis phenotype, we assessed muscle and motor neuron morphology, as these cell types are required for motility. Myosin heavy chain immunostaining reveals that EJC mutant embryos have disorganized myofibers and have U-shaped instead of chevron-shaped myotomes (Fig 3B-3D), with muscle defects in magoh mutant embryos consistently more severe than in rbm8a mutant embryos. Co-labeling of motor axons (using anti-SV2) and neuromuscular junctions (using Alexa Fluor-conjugated α-Bungarotoxin), shows that motor axon length and neuromuscular junction number are reduced in EJC mutant embryos (Fig 3E-3H). Thus, as expected of genes that encode proteins that function as a complex, homozygous rbm8a and magoh mutant embryos show phenotypically similar muscle organization and motor axon outgrowth defects.

Analysis of gene expression in rbm8a and magoh mutant embryos
To identify EJC-regulated genes during zebrafish embryonic development, we performed RNA-Seq from the EJC mutant embryos and their wild-type siblings at two developmental time points, 21 hpf and 27 hpf. The 21 hpf timepoint is when EJC mutant embryos begin to show visible phenotypes and reduced Rbm8a and Magoh protein levels ( Fig 2C and 2D), but do not yet display extensive cell death. By 27 hpf, EJC mutant embryos have reliably low Rbm8a and Magoh protein levels as well as motor axon defects. However, because magoh mutant embryos display extensive necrosis at 27 hpf (S2C Fig), we only focused on RNA-Seq from the less necrotic rbm8a mutant embryos at this later time point. We generated three biological replicates of total RNA-Seq from each mutant (rbm8a mutant at 21 hpf and 27 hpf, and magoh mutant at 21 hpf, S1 Table) and their wild-type siblings, a mixture of wild-type and heterozygous embryos. Although the latter two genotypic classes may have differences in their gene expression profiles, we have combined them since heterozygous animals are phenotypically indistinguishable from homozygous wild-type siblings and are fully viable and fertile. A differential gene expression analysis using DESeq2 identified gene-level expression changes in the two mutant embryos compared to their wild-type siblings. As expected, rbm8a and magoh transcripts are downregulated in the respective mutant embryos (Fig 4A-4C). We compared genes that are significantly altered (fold-change > 1.5, false discovery rate (FDR) < 0.05) among the different mutant embryos and time points. A significant number of genes are upor down-regulated (Fig 4D, 103 upregulated and 29 downregulated) in both 21 hpf rbm8a and magoh mutant embryos. Similarly, a significant number of genes are upregulated between rbm8a mutant embryos at 21 and 27 hpf (Fig 4E). The modest overlaps observed between differentially expressed genes in EJC mutant embryos could be due to differences in developmental timing or due to variable degree of protein depletion (see discussion).
We next determined if genes with shared functions are enriched among differentiallyexpressed genes in EJC mutant embryos. Except for a handful of cell death regulators and effectors, none of the upregulated genes in either EJC mutant identify a functionally-related class of genes, suggesting that the proteins they encode perform a variety of functions. The upregulated cell death genes include tp53, tp53-inp1, and casp8 (the latter upregulated only in rbm8a mutant embryos), which is consistent with cell death observed in mutant embryos ( Fig  2B and S2 Fig). In contrast to upregulated genes, downregulated genes in each EJC mutant are enriched in specific GO terms. In rbm8a mutant embryos, downregulated genes at both 21 and 27 hpf are significantly enriched for genes encoding proteins with G-protein coupled receptor (GPCR) or nucleic acid binding activities ( Fig 4F). In magoh mutant embryos at 21 hpf, downregulated genes are significantly enriched for the retinoid binding GO term, which includes several GPCRs. Another functionally-related group of genes downregulated in magoh mutant embryos at 21 hpf are genes encoding structural constituents of the ribosome (Fig 4F). This latter class is also downregulated in mouse magoh heterozygotes [31], highlighting the importance of magoh in ribosomal gene expression during development. Finally, we observe that genes encoding muscle-specific myosins (e.g. mylpfb, myl10, myhz2) and several neuralspecific genes (e.g. rgs17, st8sia5, lnx2b, camk4) are downregulated in EJC mutants, which may result from altered development and/or loss/reduction of muscle and neuronal cell types (Fig 2  and S2 Fig).
Despite reliable quantification of gene-level differences in expression in the two mutants, surprisingly, we observed only a few changes in splicing patterns in rbm8a and magoh mutants using the DEX-Seq approach [50], with no overlapping changes between the mutants to report. Possibly, short read lengths (median length~35 bp) in our RNA-seq data precludes robust quantification of exon-junctions in mutant versus wild-type animals. Therefore, we could not   [86] gene ontology (GO) term overrepresentation analysis of genes downregulated in rbm8a and magoh mutant embryos at indicated times. All significant terms (Benjamini-Hochberg corrected p-value < 0.05) are shown for each set. The number of genes in each term is indicated at the right of each bar. assess whether the EJC functions during pre-mRNA splicing in zebrafish embryos, a role conserved in human, mouse and flies [31,[51][52][53].

rbm8a and magoh mutant embryos have defects in NMD
Because translation termination upstream of exon-exon junctions was previously shown to trigger NMD in zebrafish embryos [54], one expected group of upregulated transcripts in EJC mutant embryos are mRNAs containing premature termination codons (PTC) or "natural" NMD targets containing a 3 0 UTR intron (3 0 UI) or an upstream open reading frame (uORF). To identify whether NMD targets are enriched among upregulated genes in EJC mutant embryos, we first compared genes upregulated in EJC mutant embryos to those upregulated in zebrafish upf1 morphants at 24 hpf [55]. A statistically-significant number of genes are shared between 24 hpf upf1 morphants and magoh mutant embryos at 21 hpf (39 out of 707, pvalue = 1.3 x 10 −2 ), and rbm8a mutant embryos at 21 hpf (45 out of 1103, p-value < 10 −4 ) and at 27 hpf (44 out of 499, p-value = 2.1 x 10 −7 ) (S3A Fig). Importantly, NMD targets previously identified in zebrafish upf1 morphants (e.g. isg15, atxn1b, bbc3) [55] and mRNAs predicted to undergo NMD (e.g. upb1, contains a 3 0 UI) are among these shared genes. We also generated an independent dataset of Upf1-regulated transcripts from zebrafish morphants at an earlier timepoint (12 hpf) using RNA-Seq (in duplicate, S3B Fig, S1 Table) to avoid secondary targets upregulated due to extensive cell death in upf1 morphants [54]. We find that upregulated genes in 12 hpf upf1 morphants show a modest but significant overlap with upregulated genes in the previously published 24 hpf upf1 morphant dataset (S3C Fig); the overlap also includes three of the five NMD targets previously identified by Longman et al. (isg15, atxn1b and bbc3) [55]. Statistically significant overlap is also observed among upregulated genes in 12 hpf upf1 morphants and 21 hpf magoh mutant embryos (32 out of 707, p-value = 1.6 x 10 −9 , Fig Table); overlap with 21 hpf rbm8a mutant embryos is smaller and insignificant. Because the observed changes in rbm8a mutant embryos at 21 hpf were similar to but more modest than in magoh and rbm8a mutant embryos at 21 hpf and 27 hpf, respectively, we focused all subsequent analyses on the latter two EJC mutant datasets. At least one-third of all genes upregulated >1.5 fold upon upf1 knockdown (FDR < 0.05) also show a >1.5-fold increase (FDR < 0.05) in either 21 hpf magoh or 27 hpf rbm8a mutant embryos with 14 genes being significantly upregulated in all three datasets ( Fig 5A, S2 Table). Globally, genes upregulated >1.5 fold in EJC mutant embryos (FDR < 0.05), as compared to unchanged genes, also show a positive fold-change in upf1 morphants at both 12 hpf and 24 hpf (Fig 5B and 5C; S3D-S3G Fig). This observation suggests that a much larger shared set of genes show an increase in abundance upon depletion of the EJC or Upf1 even though only a small set is significantly affected.
To independently validate that predicted EJC-dependent NMD targets are indeed affected in EJC mutant embryos, we quantified relative levels of select transcripts that are orthologous to previously validated human NMD targets (eif4a2, srsf3a, srsf7a, and gadd45aa), or are upregulated in zebrafish upf1 morphants [55] (e.g. gtpbp1l, atxn1b). All of these transcripts are robustly upregulated in at least one of the EJC mutant backgrounds compared to wild-type siblings, and some (eif4a2, gadd45aa, gtpbp1l, atxn1b) are upregulated in both EJC mutant backgrounds ( Fig 5D). These data further confirm that the EJC is required for Upf1-mediated downregulation of NMD targets in zebrafish embryos.
We next tested if known classes of NMD targets (e.g. PTC-, 3 0 UI-, uORF-containing mRNAs) are upregulated in EJC mutant embryos. In the Ensembl database, transcripts that contain at least one exon-exon junction > 50 nts downstream of stop codon are flagged as 'NMD biotype', and these PTC-containing mRNAs are expected to undergo EJC-dependent  Fig 7D) is plotted as an increasing step function along the y-axis with a jump of 1/N at each value of gene number equal to an observed value of fold change (mutant/ WT) (x-axis). Thus, the group of genes that show a rightward shift along the x-axis when compared to control group of genes is a measure of their upregulation in mutants compared to WT siblings (x-axis label). Kolmogorov-Smirnov (KS) test p-value for differences in fold changes between the two groups is indicated on the bottom right. C. CDF plot as in B for genes upregulated in rbm8a mutant embryos at 27 hpf (red) compared to unchanged genes (black). D. Quantitative RT-PCR (qRT-PCR) analysis showing fold change of select NMD target transcripts (x-axis) compared to control (mob4) transcript in magoh mutant embryos at 21 hpf compared to wild-type siblings (dark gray bars) and in rbm8a mutant embryos at 27 hpf compared to wildtype siblings (light gray bars). The selected genes either contain a 3 0 UTR intron (eif4a2, srsf3a and srsf7a) and/or have orthologs that are known NMD NMD. Of the genes encoding transcripts annotated as NMD biotype, 566 genes are detected in our datasets and are upregulated as a group in EJC mutant embryos as compared to a control group of intron-less protein-coding genes (Fig 5E and 5F). We next evaluated features known to induce EJC-dependent NMD of transcripts encoding full length proteins (e.g. 3 0 UI, uORFs). We limited our analysis to transcripts that are well-supported to encode functional proteins as per the APPRIS database [56]. This genome annotation resource provides manually-curated transcript annotations for zebrafish (and other organisms) where transcripts are classified as principal or alternative isoforms based on conservation, structure and function of each transcript and their encoded proteins. We identified 582 genes that encode APPRISannotated transcript isoforms with 3 0 UIs > 50 nts from stop codons (14 of these are also labeled as 'NMD biotype' in Ensembl). Of these, 532 genes that are detected in our datasets are upregulated as a group in EJC mutant embryos when compared to intron-less protein-coding genes ( Fig 5E and 5F). Consistent with direct regulation of 3 0 UI-containing transcripts via EJC-dependent NMD, a greater number of APPRIS-supported transcripts containing this feature show a positive fold change in the two rbm8a mutant datasets (S3H Fig). uORFs are another feature that subject mRNAs to NMD to regulate protein expression, and this regulation is sensitive to levels of EJC-associated factors [57]. We identified 620 zebrafish genes encoding APPRIS-annotated transcripts that contain uORFs where ribosome footprints can be detected [58,59]. When compared with a control set of intron-less protein-coding genes, a group of 612 detectably expressed genes containing ribosome-occupied uORFs show a significant positive fold change in EJC mutant embryos (Fig 5E and 5F). Additionally, APPRIS-supported uORF-containing transcripts are enriched within genes showing statistically significant fold change > 1 (log 2 FC > 0) in all EJC mutant and upf1 morphant datasets (S3I Fig). Overall, we conclude that all major modes to trigger EJC-dependent NMD (i.e. PTCs, 3 0 UIs and uORFs) are active during zebrafish development.

Some transcripts with stop codon-proximal 3 0 UTR introns are upregulated upon loss of EJC and Upf1 function
Surprisingly, we noticed that among the 14 transcripts that are upregulated in rbm8a and magoh mutant embryos, and upf1 morphants (Fig 5A), three (foxo3b, phlda3 and nupr1a) are encoded by genes that contain a 3 0 UI where the distance between the stop codon and the intron is less than 50 nts. For foxo3b and nupr1a, the human orthologs also contain a 3 0 UI < 50 nts downstream of the stop codon. This observation raises an intriguing possibility that some mRNAs with a proximal 3 0 UI (< 50 nts distance between intron and upstream stop codon; Fig 6A) may be regulated by EJC-dependent NMD. Using APPRIS transcript annotations available within Ensembl GRCz10 database, we identified 861 zebrafish genes that encode protein-coding transcripts with proximal 3 0 UIs; as noted above, 582 genes encode transcripts with distal 3 0 UIs (3 0 UIs � 50 nts downstream of the stop codon) ( Fig 6A and S4A Fig,  S3 Table). Interestingly, proximal 3 0 UI-containing genes encode proteins which are enriched for mRNA binding and mRNA splicing factor functions (S4B Fig), two functional groups that are well-recognized to be regulated by EJC-dependent NMD [19,60]. We find that, of all proximal 3 0 UI-containing genes detectable in our datasets, a small percentage (3.5-8%, 70/854 in 27 targets (gadd45aa) or were previously shown to be zebrafish Upf1 targets (gtpbp1l, atxn1b) [55]. Red dots: the value of each individual replicate. Error bars: standard error of means. Horizontal black dashed line: fold change = 1. Welch's t-test p-values are indicated by asterisks ( �� p-value < 0.05). E. Empirical CDF plot, as plotted in B, of fold changes in 21 hpf magoh mutant embryos for genes that contain 3 0 UTR introns (APPRIS 3 0 UTR intron, mauve), uORF (orange), defined in Ensembl as NMD-biotype (green) compared to intron-less genes (black). KS test p-value for differences in distribution of fold changes between intron-less genes and each of the particular groups is indicated on the bottom right. F. CDF plot as in E showing the fold changes in rbm8a mutant embryos at 27 hpf.
https://doi.org/10.1371/journal.pgen.1008830.g005 To further confirm that some proximal 3 0 UI-containing genes are indeed regulated by NMD, we focused on a subset (foxo3b, cdkn1ba, and phlda2) that is upregulated in upf1 morphants and at least one EJC mutant, and where the existence of a proximal 3 0 UI is conserved in several other vertebrates including humans. Importantly, these genes show no evidence of additional splicing events in their 3 0 UTR (S4E Fig). After treating embryos with the NMD inhibitor NMDI14 [61], we found that foxo3b, cdkn1ba, and phlda2 transcripts are 2-to-8 fold upregulated, just like eif4a2, a distal 3 0 UI-containing transcript, and atxn1b, a previously validated [55] NMD target ( Fig 6D). Thus, mRNAs encoded by a subset of genes with a 3 0 UI in a stop codon-proximal position appear to be regulated in an EJC-and Upf1-dependent fashion. Interestingly, we did not observe any correlation between the distance of the 3 0 UI from the stop codon and the degree of fold change observed in EJC mutant or upf1 morphants (

A subset of human and mouse proximal 3 0 UI-containing genes may also be regulated by NMD
We surveyed human and mouse genomes for prevalence and conservation of proximal 3 0 UIs. Like in zebrafish, APPRIS-annotated proximal 3 0 UI-containing genes outnumber distal 3 0 UIcontaining genes in human (S5A Fig and S3 Table, 1239 proximal 3 0 UI-containing genes, 489 distal 3 0 UI-containing genes) and mouse (S5B Fig and S3 Table, 921 proximal 3 0 UI-containing genes, 649 distal 3 0 UI-containing genes). GO terms of human and mouse genes encoding APPRIS-annotated proximal 3 0 UI-containing transcripts are also enriched for mRNA binding function (S5E and S5F Fig). A cross-comparison of zebrafish, mouse, and human proximal 3 0 UI-containing genes identified 167 genes where the proximal position of 3 0 UI is conserved in all three organisms suggesting that proximal 3 0 UIs could serve regulatory functions. These genes show a significant interaction network amongst themselves (Fig 7A, p-value = 0.02), and are enriched for genes encoding proteins with RNA recognition motifs and with roles in neural development and disease (Fig 7B).
To test if some human and mouse proximal 3 0 UI-containing genes are also regulated by NMD, we analyzed publicly available RNA-seq datasets of human and mouse cell lines depleted of key NMD factors. We find that a subset of transcripts encoded by proximal and between the stop codon and the 3 0 UI is equal to or greater than 50 nts. Such 3 0 UI are classified as distal. Bottom: Schematic illustrating genes with 3 0 UI where the distance between the stop codon and 3 0 UI is less than 50 nts. Such 3 0 UI are classified as proximal. The ribosome (brown), direction of translation (black arrow), stop codon ('UAA' in white), EJC (green), exon-exon junction (EEJ), coding region of mRNA (black) and 3 0 UI of mRNA (gray) are labeled in the top panel. B. A scatter plot showing gene-level fold change (FC) for transcripts with proximal 3 0 UI (dark blue: FC > 1.5 and light blue: FC < 1.5) and distal 3 0 UI (black: FC > 1.5 and gray: FC < 1.5) in magoh mutant embryos at 21 hpf compared to wild-type siblings. Dots encircled in red represent genes that also contain an upstream open reading frame (see methods). Genes labelled on the plot also contain a proximal 3 0 UI in mouse and human, and are upregulated in both rbm8a mutant datasets (Fig 6C and S4C Fig) [63]. Furthermore, transcript stability of proximal 3 0 UI-containing genes grouped based on increasing distance from the stop codon (20-50 nts, 30-50 nts, and 36-50 nts) progressively increases upon UPF1 knockdown in HEK293 cells [57] (Fig 7D). Notably, the proximal 3 0 UIcontaining genes where the intron is � 36 nts from the stop codon are the most significantly stabilized. To further test the UPF1 and EJC dependence of proximal 3 0 UI-containing NMD targets in human cells, we knocked down UPF1 or EIF4A3 in a human colorectal carcinoma cell line (HCT116), and assessed levels of a subset of 3 0 UI-containing transcripts. This subset consists of human orthologs of all three proximal 3 0 UI-containing genes validated in zebrafish (FOXO3 CDKN1B, and PHLDA2). We also chose three proximal 3 0 UI-containing genes that show the highest change in stability upon UPF1 knockdown in Fig 7D (STX3, ULBP1 and  RBM3). We also included HNRNPD, an RNA-binding protein-encoding transcript whose principal APPRIS isoform contains a proximal 3 0 UI. In these transcripts, the proximal position of the 3 0 UI is conserved (with the exception of STX3;. Importantly, primer pairs used for detection of these transcripts unambiguously amplify only the proximal 3 0 UI-containing isoforms (S4 Table). We find that, similar to transcripts encoded by distal 3 0 UI-containing genes (ARC and SRSF4), CDKN1B and ULBP1 are significantly upregulated upon EIF4A3 and UPF1 knockdown (Fig 7E). STX3 and FOXO3 are significantly upregulated either upon UPF1 or EIF4A3 knockdown but not under both conditions. The proximal 3 0 UI-containing isoform of HNRNPD, on the other hand, remains unchanged in HCT116 cells upon UPF1 or EIF4A3 knockdown (Fig 7E). PHLDA2 and RBM3 were below detection limits in HCT116 cells and therefore could not be tested. As further validation of the specificity of the proximal 3 0 UIs to induce NMD, we find that UPF1 knockdown in HEK293 cells [57] specifically increases levels of the proximal 3 0 UI-containing isoforms as compared to 3 0 UI-lacking isoforms produced from the same gene (Fig 7F), similar to the effect observed for transcript isoforms with or without distal 3 0 UI ( Fig 7G). Further, proximal 3 0 UI-containing isoforms of ALG8 and SLC30A7, two transcripts that show the highest fold-upregulation in Fig 7F, are also specifically upregulated as compared to their 3 0 UI-lacking isoforms upon UPF1 depletion in HCT116 cells ( Fig  7H). While these data collectively suggest that the presence of a proximal 3 0 UI may sensitize certain transcripts for NMD, it remains to be tested if a proximal 3 0 UI is sufficient to induce NMD or if it acts in concert with/via other NMD signals (see Discussion).

Loss of function of foxo3b, a proximal 3 0 UI-containing gene upregulated in EJC mutant embryos, partially rescues motor axon outgrowth
The zebrafish foxo3b gene encodes a transcript that is significantly upregulated in both EJC mutants and upf1 morphants (Fig 6B and 6C, S4C and S4D Fig and S6A Fig). These data, as well as the conservation of proximal intron position in other vertebrate foxo3b homologs ( Fig  8A) and the NMD susceptibility of human FOXO3 (Fig 7E), suggests that foxo3b expression is regulated by EJC-dependent NMD. Like foxo3b transcript, we find that Foxo3b protein is upregulated in 21 hpf magoh mutant embryos (2.8-fold; Fig 8B) and in 27 hpf rbm8a mutant embryos (1.3-fold; S6B Fig) compared to wild-type siblings. Furthermore, five known Foxo3b transcriptional target genes [64] are also upregulated in magoh and rbm8a mutant embryos (S6C Fig). To test if Foxo3b upregulation contributes to EJC mutant phenotypes, we obtained a previously described foxo3b null allele, foxo3b ihb404 [65,66], generated magoh; foxo3b and rbm8a; foxo3b doubly heterozygous adults, and examined muscle and motor neuron development in single, double, and compound mutant embryos. As expected, embryo morphology, motility and motor axon outgrowth of foxo3b mutant embryos is indistinguishable from wild-  [57]. Dots encircled in red are transcripts that also contain an uORF as determined previously in [57]. Same analysis for Ensembl transcript annotations is shown in Fig S5G. D. Empirical CDF plot showing change in mRNA stability for different classes of NMD targets and intron-less genes upon UPF1 knockdown in HEK293 cells (data from [57]). The gene classes are as follows: proximal 3 0 UI-containing genes where distance is 20-50 nts (light green), 30-50 nts (olive green) and 36-50 nts (dark blue), Ensembl-annotated NMD-biotype genes (red) and intron-less genes (black). CDF function is plotted as in Fig 5B. KS test p-value for comparison of NMD targets to intron-less genes is indicated in the same color. E. qRT-PCR analysis showing fold changes for proximal 3 0 UI-containing genes CDKN1B, FOXO3 (upregulated in zebrafish); ULBP1; STX3 (highest change in stability upon UPF1 KD in HEK293 cells [57]); HNRNPD (encodes RRM-containing protein) and distal 3 0 UI-containing genes (ARC and SRSF4) upon UPF1 (left) and EIF4A3 (right) knockdown in HCT116 cells. The distance between stop codon and 3 0 UI for every 3 0 UI-containing gene is indicated below each bar. TBP is the normalizing gene used for qRT-PCR analysis. Welch's t-test p-values are indicated using asterisks ( �� p-value < 0.05 and � p-value < 0.1). F. Empirical CDF plot of fold changes in levels of 3 0 UI-containing isoforms (dark blue) as compared to 3 0 UI-lacking isoforms (sky blue) encoded from same genes in UPF1-depleted HEK293 cells. KS test p-value for differences in the two distributions is indicated on the bottom right. G. CDF plot as in F showing the fold changes in levels of distal 3 0 UI-containing (black) versus 3 0 UI-lacking isoforms (gray). H. qRT-PCR analysis showing fold changes for 3 0 UI-containing and 3 0 UI-lacking isoforms encoded by proximal 3 0 UI-containing genes ALG8 and SLC30A7 and by a distal 3 0 UI-containing gene SRSF4 upon UPF1 knockdown in HCT116 cells. The distance of 3 0 UI when present, or its absence (-) is indicated below each bar. TBP is the normalizing gene used for qRT-PCR analysis. Welch's t-test p-values for the three biological replicates of each transcript are < 0.05. The Welch's paired t-test p-values for the comparison between fold changes of 3 0 UI-containing and 3 0 UI-lacking isoforms is indicated using red asterisks ( �� p-value < 0.05).
https://doi.org/10.1371/journal.pgen.1008830.g007 type embryos; hence these mutant embryos are sorted into the wild-type sibling pool. In contrast, as noted above, motor axons barely extend beyond the horizontal myoseptum in EJC mutant embryos (Fig 3E-3G, Fig 8G and 8H and S6H and S6I Fig). Strikingly, we find that heterozygous and homozygous loss of foxo3b in EJC mutant embryos leads to significantly longer motor axons that extend well beyond the horizontal myoseptum (Fig 8I-8K and S6J-S6L Fig), but not as far as in wild-type sibling embryos. Despite significant rescue of motor axon outgrowth, neuromuscular junction formation (Fig 8I and 8J and S6J and S6K Fig) and myofiber organization (Fig 8E and 8F and S6F and S6G Fig) are not restored in magoh; foxo3b or rbm8a; foxo3b double mutant embryos. Thus, we conclude that Foxo3b repression via EJCdependent NMD is important for motor axon outgrowth. Additionally, we predict that regulation of other mRNA targets is required for proper muscle development.

Loss of EJC causes tissue-specific defects and embryonic lethality in zebrafish
In zebrafish rbm8a and magoh single mutant embryos, both Rbm8a and Magoh proteins are co-depleted (Fig 2C and Fig 2D). This simultaneous reduction of rbm8a and magoh function likely impairs EJC function leading to rapid emergence of developmental defects, which progressively worsen and lead to embryonic death by 2 days post-fertilization. Remarkably, the defects in EJC mutants initially arise in specific tissues. Head necrosis is morphologically apparent in EJC mutants by 19-21 hpf, with onset occurring earlier in magoh mutants than in rbm8a mutants. This neural cell death phenotype is readily detected in both mutants at 19 hpf with the more sensitive acridine orange dye (S2A Fig). The neural cell death phenotype is similar to that seen in mouse heterozygous EJC mutant embryos [30,31], and is consistent with the microcephaly phenotype in human patients heterozygous for hypomorphic RBM8A and EIF4A3 mutations [28,29]. Thus, across vertebrates, certain tissues appear more sensitive to loss of EJC function. The emergence of defects in EJC mutant embryos in discrete lineages such as neural and muscle cells may result from tissue-specific differences in EJC protein functions, activity of EJC regulators, or decay rates of maternally-provided EJC transcript/protein. Future investigation into these possibilities in zebrafish embryos may explain why loss of a ubiquitously-expressed entity like the EJC leads to tissue-specific phenotypes, as are also observed in human EJC-linked syndromes [28,29]. Notably, unlike haploinsufficiency of EJC core components in mouse and human [28][29][30][31], heterozygous loss of rbm8a or magoh in zebrafish does not have any apparent phenotypic consequences, indicating that the threshold dose of EJC may differ between zebrafish and mammals.
Despite similar phenotypic defects in rbm8a and magoh mutant embryos, gene expression changes in the two mutants show only a modest (albeit statistically significant) overlap (Fig 4). Multiple factors are likely to contribute to this observation. Foremost, it is likely that at 21 hpf, when gene expression is first compared between rbm8a and magoh mutants (Fig 4D), the two mutants are at different stages of developing EJC-related defects even though they are at the same developmental time point. In support of this idea, magoh mutants show head necrosis at 19 hpf while this phenotype is not readily seen in rbm8a mutants until 21 hpf. This temporal difference in appearance of defects in the two mutants may be driven by the timing/rate of depletion of maternal mRNA/protein stores (Fig 2C and 2D). When comparing RNA-seq data sets from rbm8a mutants at 21 hpf and 27 hpf (Fig 4E), the six-hour age difference is likely a major factor in the observed gene expression differences. The comparison of 27 hpf rbm8a to 21 hpf magoh mutant gene expression signatures (Fig 5A) is thus likely to compound the two factors described above resulting in smaller overlapping changes. It is also noteworthy that developing zebrafish embryos of the same genotype from the same clutch can show a 2-fold or more change in~12% of genes [67], and embryos of the same genotype derived from different mothers show distinct mother-specific transcriptome signatures [67]. Such issues can further increase variation in gene expression estimates within and across biological replicates of the same genotype, amplifying differences among the two EJC mutants. Finally, some differences in gene expression profiles of rbm8a and magoh mutants could arise due to their EJC-independent functions (e.g. human RBM8A can bind to mRNA cap structure and may regulate decapping, [68,69]). In the future, the shared impact of the EJC core proteins in developing embryos can be better evaluated by comparing gene expression in specific cell types or tissues of zebrafish rbm8a and magoh mutants at the same developmental stage.

The EJC is a critical component of NMD in zebrafish
Our finding that a significant fraction of genes upregulated in EJC mutant embryos are also upregulated in upf1 knockdown embryos (Fig 5) suggests that EJC-dependent NMD is compromised in both rbm8a and magoh mutant embryos. The overlap between upregulated genes in EJC mutant and upf1 morphants is small (Fig 5A and S3A Fig), likely due to differences in developmental timing or due to NMD-independent functions of Upf1 and EJC proteins. However, the genes within these overlaps include previously validated zebrafish NMD targets and orthologs of known NMD targets in mammals (Fig 5D and S2 Table). Furthermore, several known classes of NMD targets such as PTC-, uORF-, and 3 0 UI-containing transcripts are significantly upregulated in rbm8a and magoh mutant embryos (Fig 5E and 5F). A moderate enrichment of these classes of transcripts among the upregulated genes in EJC mutants (S3H and S3I Fig) suggests that many of the genes with these features are directly regulated by EJCdependent NMD. Therefore, the EJC is important for the quality control function (i.e. suppression of aberrant PTC-containing transcripts) and the gene regulatory activity of the zebrafish NMD pathway, which further underscores the importance of EJC-dependent NMD for developmental and tissue-specific gene regulation [20,21,24,25]. An important future goal will be to expand on how EJC-dependent NMD regulates specific genes in particular cell types and tissues to control development.

Proximal 3 0 UTR introns as a probable NMD-inducing feature
Several of our observations raise a possibility that proximal 3 0 UIs can induce NMD. Nearly 10% of all detectable zebrafish proximal 3 0 UI-containing genes, like distal 3 0 UI-containing genes, are � 1.5 fold upregulated in EJC mutant and upf1 morphant datasets (Fig 6B and 6C,  S4D Fig), and a subset of these are upregulated in zebrafish embryos treated with the NMD inhibitor NMDI14 (Fig 6D). Further, a subset of proximal 3 0 UI-containing genes is also upregulated in mouse and human NMD-and EJC-compromised cells (Fig 7 and S5 Fig). Importantly, NMD inhibition specifically upregulates transcript isoforms that contain a proximal (or distal) 3 0 UI but not the 3 0 UI-lacking isoforms produced from the same genes (Fig 7F-7H). These observations are also consistent with previous reports of NMD susceptibility of a T cell receptor-β (TCR-β) reporter RNA where stop codon is only 10 nt upstream of the last exonexon junction [70], and of triose phosphate isomerase reporter RNA with stop codon 40 nt upstream of the last exon-exon junction [71].
How can a proximal 3 0 UI lead to NMD? The prevalent 50-nt rule is presumed to account for the minimum distance required to accommodate a terminated ribosome at stop codon so that it does not interfere with the downstream EJC. Based on estimates that ribosome footprints at stop codons extend about 9 nts into the 3 0 UTR [e.g. see 58] and that the EJC 5 0 boundary lies about 27 nts upstream of the exon-exon junction (Fig 1E), EJC deposited by a 3 0 UI located at least 36 nts downstream of a stop codon may not always be displaced by the terminating ribosome and could induce NMD via the currently accepted mechanism (Fig 9A). How introns within the first 35 nts of a 3 0 UTR might induce NMD is more perplexing. One possibility is that introns within 35 nts of the stop may trigger NMD via non-canonical EJCs present downstream in the 3 0 UTR [42,43] (Fig 9A). Additionally, EJC-interacting factors such as SR proteins deposited on 3 0 UTR sequences after 3 0 UI splicing may also recruit NMD-activating factors [42,43,72,73] (Fig 9A). Our data do not exclude additional possibilities for proximal 3 0 UIs to activate NMD in concert with other mechanisms that are intron-dependent (e.g. EJCdependent translation enhancement or yet-unknown distal 3 0 UIs) or intron-independent (e.g. 3 0 UTR length or NMD-promoting sequences) (Fig 9A). Interestingly, in T cells in mice, PTCcontaining TCR-β shows very strong intron-dependent NMD but is also downregulated >3-fold by an intron-independent mechanism [74]. Therefore, further studies are necessary to rigorously test if proximal 3 0 UIs, like distal 3 0 UIs, can act as an independent NMD-inducing feature.
We also observe that proximal 3 0 UI-containing genes are variably susceptible to reduced EJC/NMD function. For example, zebrafish foxo3b is significantly upregulated in EJC mutant and upf1 morphants whereas human FOXO3 is only mildly sensitive to reduced UPF1 levels in cell lines (Fig 6, S4 Fig, Fig 7 and S5 Fig) but shows robust upregulation upon EIF4A3 knockdown in HCT116 cells (Fig 7E). Furthermore, many vertebrate genes with proximal 3 0 UIs are not upregulated, or are even downregulated, upon diminished EJC/UPF1 function (Fig 6 and  Fig 7). These observations, and previous findings of Wittkopp et al (2009) that a PTC introduced 6 nt upstream of the last exon-exon junction of a reporter RNA does not elicit NMD in zebrafish cultured cells [54], show that many proximal 3 0 UI-containing transcripts may not be subject to NMD at all. Interestingly, similar to proximal 3 0 UI-containing genes, distal 3 0 UIcontaining genes also show a variable susceptibility to EJC/NMD-deficiency (Fig 6B and 6C,  Fig 7C). The variability and/or non-responsiveness of some 3 0 UI-containing genes to EJC/ NMD manipulations could be due to their low expression at developmental stage/cell type investigated or due to their variable sensitivity to EJC/NMD protein levels. Further, downregulation of proximal 3 0 UI-containing genes could result from indirect effects of compromised EJC/NMD function. It is also possible that some 3 0 UI-containing genes actively evade NMD via 3 0 UTR-bound proteins [75]. Thus, regulation of mRNA stability by 3 0 UIs is likely to be a net outcome of a combinatorial control of NMD by multiple determinants of 3 0 UTR RNP composition, a model that requires further validation on gene-by-gene basis.

EJC-dependent NMD of foxo3b is critical for zebrafish motor axon outgrowth
Certain genes maintain proximal 3 0 UIs across vertebrate evolution (Fig 7 and Fig 8) despite much faster rates of intron loss from 3 0 UTRs compared to coding region [76,77], suggesting that proximal 3 0 UIs may play an important role in gene regulation and cellular function. foxo3b is one such example that is regulated via EJC-dependent NMD in zebrafish embryos (Fig 6) and cultured human cells (Fig 7), and such control is critical for motor axon outgrowth (Fig 8 and Fig 9B). Therefore, of the hundreds of NMD targets identified in multiple cell types and organisms, foxo3b is among the handful of genes such as Robo3.2 in mouse commissural axons [20] and Smad7 in cultured embryonic stem cells [62], whose regulation by NMD impacts cell fate and differentiation in vivo. foxo3b encodes a forkhead box transcription factor that acts as a hub for integration of several stress stimuli, and functions in processes such as cell cycle, apoptosis, and autophagy [78,79]. In zebrafish, Foxo3b has been implicated in survival under hypoxic stress [66], inhibition of antiviral responses [65], and canonical wnt signaling inhibition [80]. FOXO3, the mammalian ortholog of Foxo3b, physically interacts with p53, and both act synergistically to induce apoptosis in response to stress [81]. In addition, several pro-apoptotic genes (e.g. bim, bbc3, gadd45a) are direct FOXO3 transcriptional targets (S6C Fig) [82,83]. Thus, the regulation of foxo3b by EJC-dependent NMD (Fig 6 and  Fig 8) can directly impact cell survival. The elevated levels of Foxo3b in EJC mutant embryos may cause motor axon growth defects due to reduced Wnt signaling [80,84], increased neural cell death, and/or other cell-autonomous or non-cell-autonomous reasons. Following the loss of foxo3b function in EJC mutant embryos, the partial reversal of motor axon length (Fig 8) parallels the rescue of neural apoptosis in mouse EJC mutant embryos upon brain-specific p53 ablation [30,31], and the reversal of cell death in NMD-defective flies and human cell lines upon reduced activity of GADD45A [85]. Therefore, foxo3b is a critical NMD target in zebrafish motor neurons, and may serve as an example for how 3 0 UIs can modulate 3 0 UTR RNP composition, mRNA stability and protein production during development (Fig 9).

Ethics statement
Animal experiments were performed in accordance with institutional and national guidelines and regulations and were approved by the Ohio State University Animal Care and Use Committee (Protocol number: 2012A00000113-R2).

Animal stocks, lines, and husbandry
Adult zebrafish (Danio rerio) were housed at 28.5˚C on a 14 hour light/10 hour dark cycle and embryos were obtained by natural spawning or in vitro fertilization. Embryos were raised at both 25˚C and 28.5˚C and were staged according to Kimmel et al. (1995) [87]. rbm8a oz36 and magoh oz37 lines generated using CRISPR/Cas9 mutagenesis (described below) in the AB strain. The foxo3b ihb404 line [65] was obtained from the Xiao lab at the Chinese Academy of Sciences, Wuhan, China.
rbm8a gRNA target site: (5 0 -GGGAGGCGAAGACTTTCCTA-3 0 ) magoh gRNA target site: (5 0 -GGTACTATGTGGGGCATAA-3 0 ) Injected embryos were raised to 24 hpf at which time embryos were individually screened by high-resolution melting analysis (HRMA) to assess target site mutation efficiency in somatic cells. Remaining embryos were raised and crossed to AB wild-type adults; F1 adults were screened for germline transmission of CRISPR-induced mutations using HRMA. HRMA revealed unique rbm8a and magoh mutant alleles transmitted by multiple F0 founders. We recovered the rbm8a oz36 and magoh oz37 alleles and outcrossed the heterozygotes to the AB wild-type strain for two generations before intercrossing for phenotypic analyses. Sequences of primers used for genotyping are listed in S4 Table. EJC mutant and foxo3b ihb404 mutant embryo and adult genotyping strategy Individual embryos and adult fin tissue were lysed in 50 μl 1M NaOH for 15 mins at 95˚C followed by incubation on ice for 5 minutes at 4˚C, and then neutralized with 5 μl of 1M Tris-HCl pH 8. For genotyping fixed embryos, heads were removed into ThermoPol buffer (20 μl) and treated with 2 mg/ml ProK at 55˚C for 3 hours to extract DNA. 1 μl of DNA extract was used as a template in a 20 μl PCR with Taq polymerase according to the manufacturer's protocol (NEB). For genotyping rbm8a oz36 and foxo3b ihb404 mutant embryos, PCR products were digested with 20 units of XmnI and XcmI respectively (NEB) to distinguish cleavable mutant from un-cleavable wild-type amplicons. Digested products were analyzed on a 1% agarose gel stained with Gel Red (Biotium). For genotyping magoh oz37 mutant embryos, PCR products were analyzed by separation of mutant and wild-type alleles on a 2% agarose gel stained with Gel Red (Biotium). Primer sequences are listed in S4 Table.

Microscopy and Imaging
Immuno-stained embryos were dissected and mounted in Fluoromount-G (SouthernBiotech) and imaged at 40X magnification using MetaMorph software (Molecular Devices) on an Andor SpinningDisc Confocal Microscope (Oxford Instruments) with Nikon Neo camera. Live images of EJC mutant and wild-type sibling embryos were taken by mounting embryos in 3% methylcellulose and imaging with a AxioCam camera on a Zeiss upright AxioPlan2 microscope.

Zebrafish NMDI14 inhibitor treatment
NMDI14 (Sigma) stock solution was made in DMSO as per manufacturer's instructions. AB wild-type zebrafish embryos were dechorionated on agarose-coated 10 cm plates. At 3 hpf, NMDI14 was added to a final concentration of 4.8 μM. At 24 hpf, embryos (20/treatment) were rinsed with fresh fish water and added to 500 μl of Trizol (Thermo Fisher Scientific) for RNA preparation.

Immunoblot analysis
SDS-PAGE gels and western blots were performed using the standard mini-PROTEAN tetra system (Bio-Rad). All western blots were stained using infrared fluorophore-conjugated secondary antibodies and were scanned on a LI-COR Odyssey CLx imager. Protein quantification was performed using Image Studio software (v5.2.5).

Quantification of paralysis and motor axon length
At 24 hpf, EJC mutant and wild-type sibling embryo movements were scored under the dissecting microscope by counting the number of tail contractions per minute. For motor neuron axon quantification, immunofluorescence images of 26 hpf EJC mutant and wild-type sibling embryos were stained with anti-SV2 as described above. Images were imported into Fiji (Ima-geJ v2) and motor axon length was quantified using the Simple Neurite Tracer plugin [91].

RNA-Immunoprecipitation-Seq and RNA-Seq sample collection
At 24 hpf, zebrafish embryos (n = 800 embryos/IP) were triturated using a 200 μl pipette and washed to remove yolks as previously described [92], followed by flash freezing the tissue in liquid nitrogen. Whole embryo tissue was lysed and sonicated in 800 μl of hypotonic lysis buffer (HLB) [20 mM Tris-HCl pH 7.5, 15 mM NaCl, 10 mM EDTA, 0.5% NP-40, 0.1% Triton X-100, 1 mM Aprotinin, 1 mM Leupeptin, 1 mM Pepstatin, 1 mM PMSF]. Lysates were sonicated using a microtip for 7 seconds, NaCl was increased to 150 mM, and RNase I was added to 100 μg/ml. Following a 5-minute incubation on ice, cell lysates were cleared by centrifugation at 15,000 × g. The sample was split into 2 tubes (400 ul each) and the volume was increased to 2 mL by addition of isotonic lysis buffer. Complexes were captured on Protein G Dynabeads (Thermo Fisher) conjugated to IgG or α-Rbm8a for 2 hours at 4˚C. Complexes were washed in isotonic wash buffer (IsoWB) [20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% NP-40] and eluted in clear sample buffer [100 mM Tris-HCl pH 6.8, 4% SDS, 10 mM EDTA, 100 mM DTT]. The proteins were eluted in 20 μl of clear sample buffer [100 mM Tris-Hcl pH 6.8, 4% SDS, 10mM EDTA, 100 mM DTT] and 10 μl of the sample was used to separate the proteins via SDS-PAGE and analyze by western blotting. The remaining 10 μl of the sample was used for RNA extraction using Phenol-Chloroform-Isoamyl alcohol precipitation. RNA was resuspended in 10 μl of RNase-free water. 1 μl of the RNA was end-labeled with γ-32 P-ATP and then run on a denaturing 20% urea PAGE gel to assess quality while the remainder was used for RNA-Seq library preparation.
For RNA-Seq sample collection, EJC mutant and wild-type sibling embryos (N = 25) were harvested at 21 and/or 27 hpf and lysed in 500 μl Trizol (Thermo Fisher Scientific). Heterozygote and wild-type embryos (wild type sibling embryos) were mixed together for the control RNA-Seq dataset because heterozygotes are morphologically indistinguishable from wild-type embryos, they can only be distinguished by the DNA genotyping strategies described above. RNA was extracted following manufacturer standard procedures.

RIP-Seq and RNA-Seq library preparation
For RIP-Seq, RNA extracted from~90% of RIP eluate was used to generate strand-specific libraries. For RNA-Seq libraries, 5 μg of total cellular RNA was depleted of ribosomal RNA (RiboZero kit, Illumina), and subjected to base hydrolysis. RNA fragments were then used to generate strand-specific libraries using a custom library preparation method [93]. Briefly, a pre-adenylated miR-Cat33 DNA adapter was ligated to RNA 3 0 -ends and used as a primer binding site for reverse-transcription (RT) using a special RT primer. This RT primer contains two sequences linked via a flexible PEG spacer. The DNA with a free 3 0 -end contains sequence complementary to a DNA adapter as well as Illumina PE 2.0 primers. The DNA with a free 5 0end contains Illumina PE 1.0 primer sequences followed by a random pentamer, a 5 nt barcode sequence, and ends in GG at the 5 0 -end. Following RT, the extended RT primer was gel purified, circularized using CircLigase (Illumina), and used for PCR amplification using Illumina PE 1.0 and PE 2.0 primers. All DNA libraries were quantified using an Agilent Bioanalyzer to determine DNA length and a Qubit Fluorometer to quantify DNA amount. Libraries were sequenced on an Illumina HiSeq 2500 platform in the single-end format (50 nt read lengths). For each RNA-Seq experiment consisting of an EJC mutant and its WT sibling at a given timepoint, three biological replicates were sequenced per genotype.

Zebrafish EJC mutant embryo RIP-seq and RNA-Seq data analysis
Adapter trimming and PCR duplicate removal. After demultiplexing, fastq files containing unmapped reads were first trimmed using Cutadapt (v2.3). A 12 nt sequence on read 5 0end consisting of a 5 nt random barcode sequence, 5 nt identifying barcode, and a CC was removed. The random barcode sequence associated with each read was saved for identifying PCR duplicates down the line. Next, as much of the 3 0 -adapter (miR-Cat22) sequence TGGAATTCTCGGGTGCCAAGG was removed from the 3 0 -end as possible. Any reads less than 20 nts in length after trimming were discarded.
Alignment and removal of multi-mapping reads. For RIP-Seq, following adapter trimming, reads were aligned with HISAT2 v2.1.0 (Kim et al., 2015) using 24 threads to zebrafish GRCz10. After alignment, reads with a HISAT2 mapping score less than 60 were removed, i.e. all multi-mapped reads were discarded. Finally, all reads mapping to identical regions were compared for their random barcode sequence; if the random sequences matched, such reads were inferred as PCR duplicates and only one such read was kept.
RIP-Seq data downstream analyses. First, by comparing aligned reads to a GRCz10 exon annotation obtained through Ensembl BioMart we determined the 5 0 and 3 0 end distribution of RIP-Seq reads and the meta-exon distributions of RIP-Seq reads. The primary reference transcriptome was obtained from Ensembl BioMart. For each gene, the principal transcript isoform (transcript with an APPRIS P1 annotation) was selected for all analyses concerning the specificity of the RIP-Seq replicates. These analyses include calculation of RPKMs for the major APPRIS P1 isoform and comparison of intronic RPKMs to exonic RPKMs as well as intron-less transcript RPKMs to multi-exon transcript RPKMs. RNA-Seq differential expression analysis. Differential expression analysis using EJC mutant embryos and wild-type RNA-Seq data was conducted based on the RNA-Seq workflow published by Love et al. 2018 [95].
First, to create count-matrices for each RNA-Seq experiment, the GenomicAlignments and SummarizedExperiment software [96,97] were used to count reads mapping per gene for each RNA-Seq bio-replicate. The count matrix was filtered to remove all genes with zero counts in all samples before differential expression (DE) analysis using DESeq2 [98]. At least one, if not all of the biological replicates for each RNA-Seq experiment were sequenced during a separate deepsequencing run. These differences in sequencing runs introduced some variability among the replicates. To account for the variability among our RNA-Seq bio-replicates during differential expression analysis we used the RUV-seq R package [99]. Usual methods of normalization only account for sequencing depth but RUV-seq methods can be used to normalize libraries for library preparation and other technical effects. We used the RUVs method of the RUV-seq package which utilizes the centered counts (the counts of genes unaffected by our covariates of interest such as the sample genotype) to determine a normalization factor for each library. The count matrix was imported into DESeq2, and the RUVs normalization factors and genotype were used in the design formula to construct the DESeq dataset for gene-level differential expression analysis. We used the LRT test with all default DESeq2 settings to identify genes differentially expressed between mutant and wild-type samples. To correct for multiple testing in the DE analysis we used Benjamini-Hochberg (BH) adjustment with independentFiltering set to false. We decided to set independentFiltering to false because we are interested in studying NMD targets which are most likely to have low read counts in wild-type embryos. In the case of the rbm8a -/-21 hpf and 27 hpf datasets the histogram of all p-values showed a hill-shaped distribution. In order to account for this distribution, as per the suggestion made in RNA-Seq workflow [95], we used fdrtool [100] for multiple testing and determined adjusted p-values using default fdrtool settings.

Gene Ontology enrichment analysis
The PANTHER14.0 [86] tool was used to identify significantly enriched biological process GO terms in genes that are found to be significantly differentially expressed in EJC mutant embryos by DESeq2. The PANTHER tool was also used to identify significantly enriched biological process GO terms in proximal 3 0 UI genes in zebrafish and humans. For all analyses the PANTHER's Benjamini-Hochberg correction was used to calculate adjusted p-values.

Overlap analysis
The universal and test sets were chosen to be the set of Ensembl gene IDs which were assigned an adjusted p-value post-DESeq2 analysis. After determining a universal dataset for each RNA-seq dataset, the smallest universal set for the comparison in question was chosen. The significance of overlap was calculated using a hypergeometric test using the R statistical software.

STRING network analysis
The STRING database [101] was used to identify connections between proteins encoded by proximal 3 0 UI genes with default high confidence settings (minimum required interaction score = 0.7). The clusters shown were created after the Markov Cluster Algorithm (MCL) inflation parameter was set to 3 clusters.

Human and mouse RNA-Seq data analysis
SRA files were downloaded from sources specified in [62,63]. Fastq files generated from the SRA files were mapped using TopHat2 (version 2.1.1) using the same settings described above for zebrafish alignment. Count matrices were generated using the GenomicAlignments and SummarizedExperiment packages. The count matrices were imported into DESeq2 for differential expression analysis using the LRT test, BH adjustment and with independentFiltering set to false.

Identification of uORF genes in zebrafish, human and mouse
We selected for uORFs which were categorized as "functional uORFs" in Johnstone et al. 2016 [59] based on RNA-Seq and ribosome profiling. The following filters were applied to select for uORFs that show evidence of translation at 24 hpf: 5 0 UTR RPF RPKM � 5, RNA-Seq RPKM� 5, RPF RPKM � 5, ORF translation efficiency at 24 hpf >1.

Identification of 3 0 UTR intron containing genes in zebrafish, mouse, and human
A table describing exon starts, exon ends, CDS start, CDS end, strand and APPRIS annotation was downloaded from the Ensembl database for all transcripts in zebrafish (GRCz10), human (GRCh38) and mouse (GRCm38). All transcripts with any level of APPRIS annotation [56] were included. We then identified transcripts that contain introns in the 3 0 UTRs by subtracting exon start coordinates from the CDS end coordinates in a strand specific manner. We then determined the distance of the farthest 3 0 UTR intron to the stop codon; based on the distance (< or � 50 nts) as well as the number of 3 0 UTR introns the transcripts were classified into proximal and distal categories. Proximal transcripts were defined by the presence of only one 3 0 UTR intron which is within 50 nts of the normal stop codon. Distal transcripts were defined by the presence of one 3 0 UTR intron which is more than 50 nts away from the stop codon or by the presence of more than one 3 0 UTR intron irrespective of the distance of the nearest 3 0 UTR intron to the stop (i.e.If a gene encodes transcript with both proximal and distal 3 0 UIs, it is classified in the latter group.). For all fold-change analyses, we defined distal/proximal 3 0 UI-containing genes as those that encode one or more distal/proximal 3 0 UI+ transcripts. The distal 3 0 UI-containing genes that did not have an APPRIS annotation but were annotated with an 'NMD biotype' in the Ensembl database were included as a separate group in the analyses included in Fig 5 and the group was named as NMD biotype.

Analysis of NMD-sensitive and NMD-insensitive isoforms of 3UIcontaining genes
The transcript level quantification data were obtained from S4 Table of Baird et al. (57). After removing transcripts that lack q-values in UPF1 knockdown compared to the control, transcripts were separated based on the presence or absence of a PTC as indicated in the table. Among the PTC-containing transcripts, those that contain a 3 0 UI and are the only detectable 3 0 UI-containing transcript from the corresponding gene were selected as distal 3 0 UI-containing isoforms based on our classification described above. Among the PTC-lacking transcripts, those that contain a proximal 3 0 UI and are the only detectable 3 0 UI-containing transcript of the corresponding gene were selected as proximal 3 0 UI-containing isoforms based on our classification. Finally, we selected the most abundant (highest TPM value in the control dataset) 3 0 UI-lacking isoform of the selected proximal and distal 3 0 UI-containing genes as control groups for analysis in Fig 7F and 7G.

Mammalian cell culture knockdown experiments
HCT116 cells were seeded into 12 well plates (10 5 cells/ well) in McCoy's 5A media and 15 pmol siRNA was reverse transfected using 1.6 μl lipofectamine RNAiMAX reagent (Thermo Fisher Scientific) per well. Knockdown was carried out for 48 hours with a media change after 24 hours. Cells were harvested in hypotonic lysis buffer (described above), 30% of the cell lysate was saved to check the efficiency of knockdown (S5J Fig) while

Zebrafish upf1 knockdown experiment and RNA-Seq
For conducting upf1 knockdown in zebrafish, 2 ng of a splice blocking morpholino (MO) diluted in 0.2 M KCl with 0.1% phenol red was injected into 1-cell stage embryos. The upf1 MO used was previously published and named upf1 MO2 [54] with a sequence of 5 0 -TTTTGGGAGTTTATACCTGGTTGTC-3'. Morpholino was synthesized by Gene Tools, LLC. Uninjected wild-type control embryos (n = 30) and injected morphants (n = 30) were raised for 12 hours at 28.5˚C and lysed in 500 μl Trizol following manufacturer's procedures (Thermo Fisher Scientific). After RNA-extraction, double-stranded cDNA was synthesized following Illumina's TruSeq protocol per manufacturer's instructions. Briefly, mRNA was purified from 1 μg total RNA using Dynabeads oligo(dT) 25 magnetic beads (Thermo Fisher Scientific) followed by clean-up with AMPure XP SPRI beads (Beckman Coulter). mRNA was then fragmented for 5 minutes at 70˚C using Ambion's RNA Fragmentation Reagent (AM8740) followed by an additional clean-up step using AMPure XP SPRI beads. Fragmented RNA was reverse transcribed using random primers and SuperScript III (Thermo Fisher Scientific). After second strand synthesis, end repair, and 3' end adenylation per the TruSeq protocol (Illumina, Inc.), libraries were constructed using an Apollo 324 automated library system. After Illumina adapter ligation and amplification, all DNA libraries were quantified using an Agilent Bioanalyzer instrument to determine DNA length and a Qubit Fluorometer to quantify DNA amount. 10 cycles of amplification was performed prior to sequencing on an Illumina HiSeq 2000 system in the paired-end format (100 nt read lengths). All experiments were performed in biological duplicate.

RNA-seq data analysis.
Trimmed reads were obtained from the sequencing core and then mapped to the GRCz10 genome assembly using TopHat2 as described above. Differential gene expression analysis was also performed as described above using DESeq2. RUV-Seq and fdrtool corrections were not required.

Quantitative RT-PCR
Zebrafish embryos and mammalian cells were harvested in Trizol. RNA was isolated using standard Trizol procedures, followed by DNase treatment, purification with Phenol:Chloroform: Isoamyl alcohol (25:24:1, pH 4.5) and resuspension in RNase-free water. 1.5 μg of RNA was reverse transcribed using oligo-dT and Superscript III (Invitrogen). After reverse transcription of RNA, the samples were treated with RNase H (Promega) for 30 min at 37˚C. For each qRT-PCR 30 ng of cDNA was mixed with 5 μl of 2X SYBR Green Master Mix (ABS), 0.2 μl of a 10 mM forward and reverse primer each (defrosted once) in a 10 μl reaction. The qRT-PCRs were performed in triplicate (technical) using primers described in S4 Table. Reference genes for relative quantification were mob4 in zebrafish [102] and TATA-binding protein (TBP) in human cells. Fold-change calculations were performed by the ΔΔC t method. Foldchanges from three biological replicates were used to determine the standard error of means. The p-values were calculated using Welch t-test in the R statistical computing software.

Quantification and statistical analysis
All western blots were performed using infrared fluorophore conjugated secondary antibodies and were scanned on a LI-COR Odyssey CLx imager. Protein quantification was performed using Image Studio software (v5.2.5). Northern blot autoradiograms were scanned using Fuji FLA imager and quantified using ImageQuant TL software (v7.0). Average and standard error of means in the observed signal was determined for data from at least three biological replicates. A. Histogram depicting the frequency of all zebrafish 3 0 UI transcripts in Ensembl GRCz10 (with APPRIS annotation) as a measure of the distance of the 3 0 UI from the stop codon. Data are shown in 5 nts bins and bins beyond 500 nts are not shown. Bins of proximal 3 0 UI genes are in blue and distal 3 0 UI bins are in gray. Inset: Histogram of all zebrafish proximal 3 0 UI transcripts binned by 1 nt. B. PANTHER14.0 [87] gene ontology (GO) term enrichment analysis of proximal 3 0 UI-containing genes (top, shades of blue) and all 3 0 UIcontaining genes (bottom, shades of gray). All significant terms (Benjamini-Hochberg corrected p-value < 0.05) are shown for each set. C. A scatter plot showing gene-level fold change (FC) for transcripts with proximal 3 0 UI (dark blue: FC > 1.5 and light blue: FC < 1.5) and distal 3 0 UI (black: FC > 1.5 and gray: FC < 1.5) in rbm8a mutant embryos at 21 hpf compared to wild-type siblings. Genes encircled in red also contain a uORF as determined from a previously published dataset (see Materials and Methods). D. A scatter plot as in C showing fold changes of 3 0 UI-containing genes for 12 hpf upf1 morphants compared to wild-type control embryos. E. Integrated genome browser (IGV) screenshots of Sashimi plots showing RNA-seq reads observed for foxo3b, phlda2 and cdkn1ba in four zebrafish 24 hpf whole embryo RNAseq datasets (as labeled on figure in different colors) obtained from the DanioCode consortium. Range of the number of reads mapping to the genes are indicated to the left of each track in black. Number of junction reads are indicated at the spliced junction in the color corresponding to the specific track. In case of foxo3b, due to the length of the second intron the screenshots of the first two and the last two exons are shown separately. F. Empirical CDF plot showing the fold changes in magoh mutants (21 hpf) of genes upregulated that contain a proximal (green) or distal (light purple) 3 0 UI or intron-less genes (black). Genes that also contained an uORF were excluded from analysis. Kolmogorov-Smirnov (KS) test p-value for differences in fold changes between the two groups is indicated on the bottom right. G. CDF plot as in F showing the fold changes in rbm8a mutants (27 hpf) of genes upregulated that contain a proximal (green) or distal (light purple) 3 0 UI or intron-less genes (black). Kolmogorov-Smirnov (KS) test p-value for differences in fold changes between the two groups is indicated on the bottom right. (TIF) S5 Fig. Related to Fig 7. A. Histogram showing the frequency of all APPRIS 3 0 UI-containing transcripts in human GRCh38 as a measure of the distance of the 3 0 UI from the stop codon. Data are grouped in 5 nt bins from 1-500 nts. Proximal 3 0 UI-containing gene bins are indicated in blue; distal 3 0 UI-containing gene bins are indicated in gray. Red dotted line indicates distance from stop codon to farthest 3 0 UI = 50 nts. Histogram of transcripts from Ensembl annotation is shown in C. B. Histogram as in A of mouse proximal and distal 3 0 UI-containing transcripts in mouse GRCm38. Histogram of transcripts from Ensembl annotation is shown in D. C. Histogram as in A of proximal and distal 3 0 UI-containing human transcripts from Ensembl annotation of GRCh38. D. Histogram as in A of proximal and distal 3 0 UI-containing mouse transcripts from Ensembl annotation of GRCm38. E. PANTHER14.0 [86] gene ontology (GO) term enrichment analysis of human APPRIS proximal 3 0 UI-containing genes (shades of blue) and all human APPRIS 3 0 UI-containing genes (shades of gray). All significant terms (Benjamini-Hochberg corrected p-value < 0.05) are shown for each set. F. GO term enrichment analysis as in E of mouse APPRIS 3 0 UI-containing genes. G. A scatter plot showing gene-level fold changes for all Ensembl-annotated transcripts with proximal 3 0 UI (dark blue: FC > 1.5 and light blue: FC < 1.5) and distal 3 0 UI (black: FC > 1.5 and gray: FC < 1.5) in UPF1 knockdown human embryonic kidney cells (HEK293) compared to control cells using previously published data [57]. Genes encircled in red also contain a uORF as determined from a previously published dataset (see Methods). H. A scatter plot showing gene-level fold changes for all APPRIS-annotated transcripts with proximal 3 0 UI (dark blue: FC > 1.5 and light blue: FC < 1.5) and distal 3 0 UI (black: FC > 1.5 and gray: FC < 1.5) in UPF1 knockdown human embryonic stem cells (hESCs) compared to control cells using previously published data [62]. Genes encircled on red also contain a uORF as determined from a previously published dataset (see Materials and Methods). I. Scatter plot as in E showing gene-level fold changes of APPRIS-annotated mouse 3 0 UI-containing transcripts in Smg6 -/knockout mouse embryonic stem cells (mESCs) compared to wild-type cells [63]. J. Western blots showing representative protein knockdown in one replicate for Fig 7E. (TIF) Patton, Kiel T. Tietz, Natalie C. Deans, Ralf Bundschuh, Sharon L. Amacher, Guramrit Singh.