Heat shock in C. elegans induces downstream of gene transcription and accumulation of double-stranded RNA

We observed that heat shock of Caenorhabditis elegans leads to the formation of nuclear double-stranded RNA (dsRNA) foci, detectable with a dsRNA-specific monoclonal antibody. These foci significantly overlap with nuclear HSF-1 granules. To investigate the molecular mechanism(s) underlying dsRNA foci formation, we used RNA-seq to globally characterize total RNA and immunoprecipitated dsRNA from control and heat shocked worms. We find a subset of both sense and antisense transcripts enriched in the dsRNA pool by heat shock overlap with dsRNA transcripts enriched by deletion of tdp-1, which encodes the C. elegans ortholog of TDP-43. Interestingly, transcripts involved in translation are over-represented in the dsRNAs induced by either heat shock or deletion of tdp-1. Also enriched in the dsRNA transcripts are sequences downstream of annotated genes (DoGs), which we globally quantified with a new algorithm. To validate these observations, we used fluorescence in situ hybridization (FISH) to confirm both antisense and downstream of gene transcription for eif-3.B, one of the affected loci we identified.

(e) cowplot v0.9.3 (https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html) 22 (f) magick v1.9 (https://cran.r-project.org/web/packages/magick/index.html) 23 (g) here v0.1 (https://cran.r-project.org/web/packages/here/index.html) 24 Dogcatcher is available at...   Input RNA was rRNA depleted before sequencing. However, due to low RNA yields, J2 immunoprecipitated RNA was 58 not rRNA depleted before sequencing. To account for differences in Input-RNA rRNA depletion efficiency, we used the 59 following subtraction procedure. Reads were mapped to a rRNA WS258.fasta genome created in biomart 60 (http://uswest.ensembl.org/biomart/martview/). Genes and DoGs were normalized using the Ribosomal subtraction ratio 61 derived from the equation below rRNA subtracted Library size for a sample (L) was calculated by taking input reads (I) minus the sum of the number of 63 uniquely mapped reads to the rRNA fasta (U) and the number of multi-mapped reads to the rRNA fasta (M). The 64 average library size (µ) is the sum of all the rRNA subtracted libraries divided by the amount of samples (n). The per 65 sample ribosomal subtraction ratio (RSR) is calculated by dividing L by (µ). The rRNA normalized gene or DoG (G) is 66 found by dividing the gene or DoG (g) by the RSR. Finally, we round g up to get an integer input for DESeq2. It is 67 important to note that this equation was applied separately for Heatshock vs WT and tdp-1(ok803) vs WT, as well as 68 within these groups separating input and J2 samples.
We used the likelihood ratio test (LRT) option in DESeq2 for all analysis of J2 enrichment from input RNA. The LRT is 71 used to find the difference between a full and reduced model. 72 full model = assay + condition + assay:condition 73 reduced = assay + condition 74 using the heat shock analysis as an example, the interaction term 'assay:condition' represents the ratio of the ratios: The model will then show which genes are J2 enriched in heat shock compared to no heat shock. We did the same for 79 tdp-1(ok803).

80
In this study, all enrichment differential expression was done with the following commands where the assay is enrichment 81 and condition is heat shock, tdp-1(ok803) , or wild type worms. Four samples were used in the analysis. Two heat shock (SRX1932621, SRX1932622) and two with no heat shock 89 (SRX1932620, SRX1932619). All samples were processed similarly to our samples except for the DESeq2 step. Since there 90 was no enrichment procedure we used the standard Dogcatcher algorithm and additional differential expression pipeline 91 using the DESeq2 commands below. Dogcatcher uses a sliding window approach to calculate coverage. Bedgraph sense or antisense reads are assigned to these 97 coverage windows and if coverage is above a set percentage threshold they will continue (meta read-through) or stop at 98 the next gene on the same strand (local read-through). Downstream of gene transcripts (DoGs) use sense reads to the 99 gene and slide 5' to 3' starting from the 3' annotated end. Antisense downstream of gene transcripts (ADoGs) use 100 antisense reads to the gene and slide 5' to 3' starting from the 3' annotated end. Previous of gene transcripts (PoGs) use 101 sense reads to the gene and slide 3' to 5' starting from the 5' annotated end. Antisense previous of gene transcripts 102 (APoGs) use antisense reads to the gene and slide 3' to 5' starting from the 5' annotated end. PoGs are removed if they 103 overlap a DoG. APoGs are removed if they overlap any ADoGs. ADoGs and APoGs are removed if they overlap DoGs, 104 PoGs, or genes on the opposite strand. For improved normalization in DESeq2, non-significant genes are added when 105 calculating differential expression. These non-significant genes are then removed directly after.

106
The J2 enriched version of Dogcatcher performs the Ribosomal Subtraction Ratio normalization on counts for RSubread 107 as well as the LRT method from DESeq2 on normalized counts.

108
Antisense ratio for treatment over wildtype in input RNA(Fig3) 109 Counts were generated using RSubread. All genes were filtered to have a base mean over 20 in sense and antisense counts 110 and log transformed. An antisense over sense ratio was then made for each condition. After this, a ratio was made of 111 treatment over wild type. The HYPGEOM.DIST equation was used in excel with the following parameters and the cumulative distribution set to 114 false. Base mean, FDR, and log2 fold change is from the output of DESeq2 from sense and antisense for treatment over 115 wild type. Operons were obtained from wormbase WS258.gff3 by writing out lines that contained "operon" and not 125 "deprecated operon". 1388 operons were then matched to the longest DOGs and DOGs were removed if they had any 126 overlap with operons on the same strand. Overlap was found by comparing the start or end coordinates of operons to the 127 start or end of the longest DOG and removing if they were on the same strand. This filtering removed 67 and 36 Runon 128 or operon overlaps from Heatshock and tdp-1(ok803) vs Wildtype comparisons. Similarly, ADOGs were removed if they 129 had an operon overlapping on the opposite strand, none of the ADOGs in this study had operons on the opposite strand. 130

131
Terminal inverted repeats (TIR) were obtained from the WS258 tandem and inverted repeats gff. We first counted the 132 amount of TIRs with any overlap to a DoG on the positive strand and divided by the length of the DoG. Next, we 133 generated a background of downstream intergenic sequences for every gene on the positive strand. For the size of our 134 random downstream intervals, we sampled without replacement from the length distribution of our significantly J2 135 enriched or depleted heat shock DoGs. If a random interval was larger than the distance to the next downstream gene the 136 random interval was stopped at the next gene. We next sampled 50 of our random intervals and computed the mean 137 10,000 times. Finally, we used a student's T-test and found significant differences (Padj < 0.016) comparing J2 enriched, 138 J2 depleted, and non-significant percent coverage to random downstream intervals. We also found the T-test significant 139 when comparing non-significant DoGs to J2 enriched and J2 depleted DoGs.

141
The python package mygene was used to get entrez id's for each gene or DoG for input into goatools. For obtaining genes 142 inside of significant (fdr <0.05) GO terms related to translation we first obtained every significant GO term that 143 contained the word "translation", "ribosomal", and "ribosome". We then took all of the significant genes and removed 144 duplicates that belonged to multiple GO terms. Since goatools does not provide a background gene list for worms, one was 145 created by downloading all genes with the taxid 6239 and creating a dictionary similar to the ones made in the goatools 146 test data folder. You can download the list from NCBI with the link in genes NCBI 6239 marko ALL.py and it will create 147 the dictionary when imported into another python script. Genes with overlap from DoGs on the opposite strand were 148 obtained from the "combined DOG with biotypes UNPACKED.csv" output from Dogcatcher. These were then combined 149 with ADoGs to find all genes that could have significant antisense transcription starting from outside of the gene region. 150 All genes, DoGs, PoGs, ADoGs, and APoGs, were filtered for a significance of padj < 0.05 and were considered "up" in 151 treatment vs control if they had a LFC > 0 and "down" in treatment vs control if they had a LFC < 0.