Natural variation of piRNA expression affects immunity to transposable elements

In the Drosophila germline, transposable elements (TEs) are silenced by PIWI-interacting RNA (piRNA) that originate from distinct genomic regions termed piRNA clusters and are processed by PIWI-subfamily Argonaute proteins. Here, we explore the variation in the ability to restrain an alien TE in different Drosophila strains. The I-element is a retrotransposon involved in the phenomenon of I-R hybrid dysgenesis in Drosophila melanogaster. Genomes of R strains do not contain active I-elements, but harbour remnants of ancestral I-related elements. The permissivity to I-element activity of R females, called reactivity, varies considerably in natural R populations, indicating the existence of a strong natural polymorphism in defense systems targeting transposons. To reveal the nature of such polymorphisms, we compared ovarian small RNAs between R strains with low and high reactivity and show that reactivity negatively correlates with the ancestral I-element-specific piRNA content. Analysis of piRNA clusters containing remnants of I-elements shows increased expression of the piRNA precursors and enrichment by the Heterochromatin Protein 1 homolog, Rhino, in weak R strains, which is in accordance with stronger piRNA expression by these regions. To explore the nature of the differences in piRNA production, we focused on two R strains, weak and strong, and showed that the efficiency of maternal inheritance of piRNAs as well as the I-element copy number are very similar in both strains. At the same time, germline and somatic uni-strand piRNA clusters generate more piRNAs in strains with low reactivity, suggesting the relationship between the efficiency of primary piRNA production and variable response to TE invasions. The strength of adaptive genome defense is likely driven by naturally occurring polymorphisms in the rapidly evolving piRNA pathway proteins. We hypothesize that hyper-efficient piRNA production is contributing to elimination of a telomeric retrotransposon HeT-A, which we have observed in one particular transposon-resistant R strain.


Analysis of small RNA data
The differential expression of piRNAs corresponding to the TE consensus sequencies (http://www.fruitfly.org/p_disrupt/TE.html) was performed by using the edgeR package (v. 3.16.5) [6] from R/Bioconductor (v. 3.3.2) [7]. The pairs of strong (Misy and w K ) and weak (Paris and Zola) R-strains were used as pseudo-replicates. FDR < 0.1 was chosen as a cutoff value.
The sequences of the longest 3'UTRs of known genic piRNA clusters [9] were extracted by using TxDb (version 3.2.2) and GenomicFeatures (version 1.24.5) packages of R/Biocoductor software [10]. The number of reads uniquely mapped to the retrieved 3'UTRs in strains R and RAL were RPM-normalized and used to create heatmaps. Nonparametric bootstrapping procedure for validation the strains' clustering was performed by using clusterboot method (clustermethod=hclustCBI, method="ward", B=1000, bootmethod="boot") implemented in fpc package (version 2.1-10) of R software.
The ping-pong signature was detected according to [11]. The global comparison of the piRNA ping-pong profiles in R strains with strain iso-1 was done by determination of the individual Spearman correlation coefficients of z-scored ping-pong signature for each TE family.
The obtained correlation coefficients were used for the construction of boxplots and testing the significance of difference of ping-pong profiles between R strains by Wilcoxon rank sum test.
To compare small RNAseq from ovaries and embryos we calculated TPM (transcripts per kilobase per million) values as the most stable unit used for comparison of expression across different samples.

Analysis of DNA-seq data
To compare the copy number of TEs in genomes of R-strains the reads from paired-end and mate-paired DNA-seq libraries after the removing of the adapter sequences were trimmed up to 50 nt and collapsed. Then the files containing reverse reads of paired-end or mate-pair libraries were reverse complemented and combined with the forward reads and the resulted reads were aligned to the canonical sequences of TEs (http://www.fruitfly.org/p_disrupt/TE.html) by bowtie with allowing of up to 3 mismatches. To construct the coverage of TEs by DNA-seq reads, the number of DNA-seq reads were counted in 100 bp bin intervals with 200 bp smoothing windows; the obtained RPM-normalized profiles were additionally smoothed by applying the cubic spline interpolation. To statistically test the differences of global TE copy abundances between genomes of R-strains or between paired-end and mate-pair datasets, the number of RPM-normalized and log 2 -transformed reads in bins were compared by the using the two-sided Wilcoxon rank sum test. The P-value = 10 -5 was chosen as the cut-off value.
The assembled genomic sequences of RAL strains from the DGRP project [12] were retrieved from the Drosophila Genome Nexus web-site (http://www.johnpool.net/genomes.html).
The assembling of draft genomes of Paris and Misy strains was performed with mate-pair DNAseq data by velvet assembler (-k 64) [13]. The N50s of the resulted contigs were 10738 bp for the Paris strain and 35726 bp for the Misy strain. The allelic variants were identified with Scipio software [14] by querying the largest isoform of piRNA pathway factors obtained from UniProt against assembled genomes of RAL and R strains.
The sequences of I-element fragments within piRNA clusters were determined by sequencing of PCR products obtained with cluster-specific primers (S4 Table, S2 and S3 Fig).
To obtain the genomic sequences of 3'UTR piRNA clusters, paired-end reads were mapped to dm3 genome by using bwa [15] followed by the removal of duplicates, indel realignment and quality score recalibration of aligned reads using the GATK software package [16]. The extraction of consensus genomic sequences from read pileups was performed by applying samtools [17] and bcftools software packages.The genomic sequence alignments of 3'UTR piRNA clusters and piRNA cluster fragments containing I-elements in Paris, Misy, Zola and cn bw; e strains was performed by using the MUSCLE software [18] and visualized as dotplots by using dotPlot function of seqinr R package (wsize = 5, wstep = 5, nmatch = 5) (v. 3.3.3) [10,19].