A Diversity of Conserved and Novel Ovarian MicroRNAs in the Speckled Wood (Pararge aegeria)

microRNAs (miRNAs) are important regulators of animal development and other processes, and impart robustness to living systems through post-transcriptional regulation of specific mRNA transcripts. It is postulated that newly emergent miRNAs are generally expressed at low levels and with spatiotemporally restricted expression domains, thus minimising effects of spurious targeting on animal transcriptomes. Here we present ovarian miRNA transcriptome data for two geographically distinct populations of the Speckled Wood butterfly (Pararge aegeria). A total of 74 miRNAs were identified, including 11 newly discovered and evolutionarily-young miRNAs, bringing the total of miRNA genes known from P. aegeria up to 150. We find a positive correlation between miRNA age and expression level. A common set of 55 miRNAs are expressed in both populations. From this set, we identify seven that are consistently either ovary-specific or highly upregulated in ovaries relative to other tissues. This ‘ovary set’ includes miRNAs with known contributions to ovarian function in other insect species with similar ovaries and mode of oogenesis, including miR-989 and miR-2763, plus new candidates for ovarian function. We also note that conserved miRNAs are overrepresented in the ovary relative to the whole body.


Introduction
MicroRNAs (miRNAs) are short regulatory RNAs with important roles in post-transcriptional regulation [1]. They are of significance at all stages during an animal's life cycle from embryonic development [2][3][4][5] to post-embryonic development [6,7] and reproduction [5,8]. Understanding the evolution of miRNAs is important to understanding the evolution of growth, reproduction and development.
It has been postulated that newly emergent animal miRNAs are generally expressed at low levels and with spatiotemporally restricted expression domains [9,10]. For example, in a study of miRNA expression in human and chimpanzee brains, more recently acquired miRNAs show a tendency for weaker expression than older miRNAs [10,11]. In mammals, a comprehensive study across multiple tissues demonstrates a positive correlation between miRNA age and expression levels [12].
The Speckled Wood butterfly (Pararge aegeria) is an ideal insect species with which we can address similar questions concerning miRNA evolution due to the availability of genome [13], transcriptome [14] and miRNA sequencing [15] data. Here we focus on miRNAs expressed during oogenesis, which are likely to include regulators of egg production and ovary function. Oogenesis in insects must generate a viable egg capable of initiating and sustaining embryonic development after fertilisation [16]. In addition to sustaining the energetic requirements for embryogenesis (17), the egg must also be pre-patterned with a polarity, which in insects occurs largely in the form of localised maternal determinants that drive zygotic spatiotemporal gene expression patterns essential for establishment of body axes [17]. Much is known about the identity and function of maternally provisioned mRNA molecules in insect eggs, but far less about other factors that influence the activity of such mRNAs, such as miRNAs. Furthermore, most of these studies have been performed in drosophilids [18], mosquitoes [19], cockroaches [20] and beetles [21]. Far less is known about the molecular control of oogenesis in butterflies and moths, despite Lepidoptera being one of the largest insect orders. Characterisation of the butterfly miRNA transcriptome would complement the current understanding of both genetic control involved in ovarian function [14] as well as the conservation of such mechanisms across insects.

Sample collection and sequencing
We sequenced the ovarian small RNA and mRNA transcriptomes of two females from each of two P. aegeria populations kept as lab stocks: (1) Zonza on Corsica, France, and (2) St Hubert, from the south of Belgium. The St Hubert population is the same population for which phenotypic data has previously been collected on maternal effects and egg production [22] as well as ovarian transcriptomic data [14] and for which a draft genome has been characterised [13]. The Zonza stock was newly established in 2014; the St Hubert stock has been in the lab since 2007 (see Carter et al. [14] for details regarding rearing P. aegeria in general and for St Hubert specifically).
Fully eclosed females were allowed to mate, and were placed individually in netted cages (0.5 m 3 ) along with a potted Poa trivialis grass plant for oviposition and an artificial flower containing a 10% honey solution. Females (kept at 21±0.3°C, LD 16:8) were sacrificed 6 days after mating, when they reached the peak of egg production [22]. Ovaries were dissected and cleared of fat and surrounding tissue (cf. Carter et al. [14]). Total RNA was extracted using TRIzol (Life Technologies). Total RNA samples and analysed by Illumina HiSeq 2000 sequencing at the Beijing Genomics Institute. Each sample was used to generate both a mRNA transcriptome library as well as a small RNA library. The mRNA transcriptome sequencing used paired-end reads of 2 x 100 nucleotides from a 160 bp insert library.

miRNA identification
Small RNA data processing was carried out as previously described [15], except that instead of annotating known miRNAs by comparison to miRBase, precursors were initially scanned by BLASTn against the set of previously reported P. aegeria miRNAs [15]. The dataset used in the previous study [15] was generated by pooling RNA in equal amounts from various life-cycle stages of the Speckled Wood butterfly, thereby eliminating potential bias from any one tissue. Sequences without hits against this dataset were subject to additional processing to remove hits against repetitive elements and non-coding RNAs, as well as precursors with reads mapping only to one arm and those with <95% of reads on either arm with the same 5' starting position. Only sequences with >50 dominant arm reads in Zonza or >18 dominant arms in St Hubert were retained for annotation. The introduction of a minimal read count is because in this study we are focussing on miRNAs with putative ovarian function, whereas in Quah et al. [15] we were searching for all miRNAs; the minimal read counts chosen for the two samples are equivalent after compensating for relative miRNA sequencing depth. All shortlisted candidates were then compared against miRBase 20 to identify known miRNAs which were not have been found in the original P. aegeria miRNA survey [15]. The same cutoff values described above were used to determine if a known miRNA was expressed in both ovarian samples.
Read counts for each mature product, as generated by miRDeep2 [23], were cross-checked by running the quantifier.pl script in the miRDeep2 package, using known P. aegeria miRNA precursors for read mapping. Both approaches generated similar read count data. Raw read counts for the Zonza and St Hubert samples, as well as for the original pooled dataset, were normalised against the total number of miRNA reads in each sample.

Transcriptome assembly
To assemble a reference ovarian mRNA transcriptome, raw RNAseq reads were processed at source for removal of low quality reads and adaptor trimming. Read quality was then validated by checking FASTQ data using FASTQC (http://www.bioinformatics.babraham.ac.uk/ projects/fastqc) and the first 16 bases trimmed off all reads. Reads from Zonza and St Hubert were pooled and assembled using the Trinity assembler (version: trinityrnaseq_r20140717) [24] to generate a combined P. aegeria ovarian transcriptome assembly containing 54,861 contigs. Contig FPKM (Fragments Per Kilobase of exon per Million of fragments mapped) values for each sample were determined using RSEM [25]. In order to retain only biologically interpretable contigs, we employed a multi-step approach to annotation. Contigs were first screened by BLASTn against the nr database on NCBI to reduce the number of misassembled contigs in the dataset. Remaining contigs were then compared by BLASTn against the P. aegeria ovarian transcriptome reported by Carter et al. [14]. A referenced list of proteins with known roles in ovarian and egg function [14] was also used to screen the combined transcriptome assembly by tBLASTn. All BLAST searches were subject to an E-value cutoff of 1e-05. Only contigs with hits to one or more of these datasets were retained, leaving 17,636 annotated contigs (S1 Table) which were further characterised using Blast2GO [26]. This involved screening contigs by BLASTx followed by mapping and annotation. InterProScan and ANNEX were used to augment annotation of GO terms. GO terms were condensed using the generic GO Slim subset.
Raw data for both miRNA and total RNA sequencing are available via the Gene Expression Omnibus (accession: GSE72090) along with the combined ovarian transcriptome assembly.

Statistical analyses
Total read counts (S2 Table) for miRNAs in each of the datasets were used for statistical analysis. miRNAs were designated 'lineage-specific' if found only in P. aegeria and 'conserved' if present in any other species. A two-tailed Mann-Whitney U test was conducted to determine if expression levels differed significantly between conserved and lineage-specific miRNAs in any of the datasets. We also conducted a Spearman's rho analysis to examine correlation between miRNA age and expression level.

Small RNA sequencing and data processing
We constructed and sequenced small RNA libraries from ovaries of the Speckled Wood butterfly P. aegeria to identify miRNAs expressed in the lepidopteran ovary. To allow for possible differences dependent on geography, climate or diet, we analysed females from two distinct populations from different parts of Europe: St Hubert in Belgium and Zonza in Corsica. Total RNA was extracted from ovaries and used to construct small RNA libraries for each sample. A total of 46,481,949 Illumina HiSeq reads were obtained (22,604, 869 for Corsica and 23,877,080 for Belgium). After filtering reads to remove possible degradation products and non-miRNAs, these were compared to a published dataset of P. aegeria miRNAs [15].
Of the 139 previously known miRNAs in P. aegeria [15] we obtained reads for 80 and 76 miRNAs in the ovary-specific samples for St Hubert and Zonza respectively (S2 Table). Some of these were expressed at low levels and should not be considered as candidates for ovarian function; using read-count cut-off thresholds, we reduced this number to 58 and 57 in St Hubert and Zonza respectively. Of these, 52 known miRNAs are expressed above cutoff threshold in both populations (Fig 1, S2 Table). In addition, a total of 11 miRNAs that had not been previously described in P. aegeria were found; 2 were present in the Zonza sample only, 6 in the St Hubert sample only, and 3 in both (B-0070/C-0072, B-0210/C-0160 and B-0401/C-0173; Table 1). The total number of miRNAs identified as expressed in P. aegeria ovary is therefore 74 (63 previously described miRNAs and 11 newly discovered), with a total of 55 miRNAs expressed in the ovary in both populations.

Effect of miRNA age on expression level
We categorised the miRNAs present in the updated P. aegeria dataset by age, defining 'conserved' miRNAs as those present in P. aegeria and at least one other species and 'lineage-specific' miRNAs as those not found (or predicted in genome sequence) outside of the Speckled Wood. To test if the 11 newly discovered miRNAs are encoded by lineage-specific genes, or are evolutionarily older, each was investigated by using BLASTn against all known miRNA precursors on miRBase and their inferred pre-miRNA sequences were compared by BLASTn against the available genomes of other Lepidoptera (Heliconius melpomene, Danaus plexippus, Manduca sexta, Bombyx mori, Plutella xylostella and Cameraria ohridella) as well as that of the caddisfly (Glyphotaelius pellucidus). One sequence, B-0401/C-0173, was orthologous to miR-2763 (S1 Fig), which likely originated in the common lepidopteran ancestor [15], but was not  Table 1. miRNAs identified in this study which were not found in a whole body pooled transcriptome reported earlier [15]. previously identified in a whole body small RNA library for P. aegeria pooled across multiple developmental stages including adult females [15]; the other 10 sequences are thus far specific to P. aegeria and inferred to have arisen more recently. Combining these data with the 139 known P. aegeria miRNAs and their times of origin gives a total of 83 conserved and 67 lineage-specific miRNAs in this species.
The total number of reads mapped to each miRNA precursor was taken as a measure of expression level and compared between conserved and lineage-specific miRNAs in St Hubert, Zonza, and the pooled miRNA transcriptome reported in Quah et al [15]. There is a large spread of expression levels detected for both conserved and lineage-specific miRNAs in each of the three datasets. A significant difference in median expression levels was observed between conserved and lineage-specific miRNAs (two-tailed Mann-Whitney U test; U(148) = 802.5 (P << 0.001; pooled sample), U(148) = 2134 (P < 0.05; St Hubert), U(148) = 1798 (P < 0.05; Zonza)) in each of the transcriptomes (Fig 2). In addition, the miRNAs with the highest expression levels in each dataset are all members of deeply conserved miRNA gene families, and there is a statistically significant correlation between miRNA age and expression level (Correlation analyses, Spearman's rho = 0.582 (P << 0.001; pooled sample), 0.221 (P < 0.05; St Hubert), and 0.320 (P << 0.001; Zonza)).

Older miRNAs are enriched in the ovary
We studied relative proportions of conserved and lineage-specific miRNAs expressed in the ovary and compared these to the proportions observed in the pooled transcriptome (Fig 3). Ovarian expression of each miRNA was defined by its detection above cut-off levels in both populations. There is a higher proportion of conserved miRNAs (64%) expressed in the ovary compared to that in the pooled miRNA transcriptome (47%, chi-square P < 0.05).

Ovarian-specific and ovarian-upregulated miRNAs
The most abundant miRNAs identified in the ovary are (in decreasing order of abundance) miR-989, miR-263a and miR-1. This observation is consistent between St Hubert and Zonza. Other consistently highly-expressed ovarian miRNAs include miR-993 and miR-184 (Table 2). In the pooled whole-body miRNA transcriptome, miR-1 is the most abundantly expressed sequence, followed by miR-989 and let-7; miR-263a and miR-993 are relatively highly expressed in the pooled transcriptome.   The above analyses reveal which miRNA genes are highly expressed in ovary, but this list includes miRNAs expressed in a range of tissues, as well as ovarian-upregulated sequences. To gain insight into the role of miRNAs in ovarian function, we wished to identify the latter category of miRNAs. Mature and dominant arm read counts for all miRNAs were normalised against the total number of miRNA reads per sample, and then compared with normalised read counts from the pooled miRNA dataset for P. aegeria from Quah et al. [15]. miRNAs with dominant arm read counts below a cutoff value in the Belgian and Corsican samples were deemed to not be expressed and excluded from analysis. These analyses identified five miRNAs with > 4-fold upregulation in both libraries (Table 3): Pae-miR-989, Pae-miR-263a, Par-247, Par-340 and Par-341. An additional 13 miRNAs showed up-regulation in just one ovary sample, or were only detected in either ovarian dataset but not in the pooled miRNA transcriptome (Fig 4, S2 Table). As Par-340 and Par-341 share 3' arm reads, we group them together (Par-340/341) for the purpose of analysis. These four 'ovarian-upregulated' miRNA sequences can be combined with the three newly identified 'ovary-specific' miRNAs common to both samples (described above), to form a category of seven 'ovary set miRNAs' for further analysis.

Identification of putative ovarian miRNA targets
To identify possible mRNAs which may be regulated by the ovary-set miRNAs, we ran target predictions for these miRNAs against ovarian transcriptomes constructed from the same P. aegeria total RNA extraction. Reads from both samples were pooled to construct a combined transcriptome assembly ( Table 4, S1 Table).The PITA target prediction algorithm [27] was used to identify potential targets in the transcriptome for both 5' and 3' mature products for each of the 7 'ovary-set' miRNAs.
We then sought to identify specific miRNA-mRNA target pairs of interest within the dataset. Predicted target contigs corresponding to genes involved in ovarian and egg function [14] are listed in Fig 5. These contigs cover a variety of functions including regulation of egg chamber development, oocyte provisioning and defence against pathogens and the external environment. In many cases, multiple 'ovary set' miRNAs are predicted to target the same transcript, an example of which is neuroglian (nrg) involved in regulation of border cell migration in flies [28].

Diversity of miRNAs in Pararge aegeria
In a previous study, 139 miRNA genes were identified in the Speckled Wood butterfly, P. aegeria [15]. Through deep sequencing of small RNAs in adult ovaries from two geographically-distinct populations, we have now increased this number to 150 expressed miRNA genes. Of the 11 new miRNAs identified in the ovarian small RNA transcriptome, six are uniquely expressed in the St Hubert sample, two are unique to Zonza, and three were found in both populations.
A common set of 55 miRNAs are expressed in the ovaries of both St Hubert and Zonza. The most highly expressed of these include miRNAs also known to be highly expressed in other insects, such as miR-989 and miR-1. miR-1 is highly conserved across bilaterians [29] and has documented expression in muscle in vertebrates [30,31] and insects [32]. Its expression is also known to be highly abundant in the ovaries of the cockroach Blatella germanica [33]. miR-989 is known to be expressed in the ovary in mosquitoes [19] and flies [34].
Twenty-one of the 150 miRNA genes in the P. aegeria miRNA transcriptome are upregulated at least 4-fold in, or are specific to, the ovary in at least one sample. Seven of these are either upregulated or ovary-specific in both populations. We refer to these seven genes as the 'ovary set' miRNAs.

miRNA age, ovary expression and expression level
New miRNAs tend to be expressed at low levels and in restricted spatiotemporal domains [9,10]. These characteristics would avoid severe fitness penalties due to the minimal requirements for miRNA-target binding and the resultant large number of potential targets in the transcriptome. Over time, deleterious target sites would be purged from the genome and advantageous target sites preserved, enabling the miRNA to increase its expression level and relax its tissue specificity.
Here we utilise our characterisation of the butterfly ovarian miRNA transcriptome, combined with our previous analysis of a pooled whole body and developmental miRNA transcriptome, to examine expression levels of conserved and lineage-specific miRNAs, as well as their relative contribution to the ovarian miRNA transcriptome. We consider separately the two ways in which overall miRNA expression might be increased-breadth of expression domains and level of transcription. Based on the model described above, we would predict that conserved miRNAs exhibit overall higher expression levels in the pooled transcriptome due to a wider range of expression domains. If evolutionarily-younger miRNAs were more tissue-specific miRNAs, these might be predicted to be under-represented in the pooled transcriptome, and reads might even be missed due to dilution effects.
We observed a larger proportion of conserved miRNAs expressed in the ovary compared to the pooled whole-body transcriptome, somewhat at odds with the predictions made by the model. One possible explanation for this is that the whole-body transcriptome includes RNAs from embryonic stages. It has been suggested that the early embryo represents a permissive environment for newly acquired miRNAs to be expressed and exert their effects [35]. Increased representation of young miRNAs in a library comprised of pooled developmental stages may result from such miRNAs being expressed during embryogenesis. An alternative explanation is that the capacity of a particular tissue to tolerate experimentation with novel miRNA expression may vary between tissues; tissues with highly complex transcriptomes such as the primate brain, which provides strong evidence for the above model [11], may differ in their robustness and resultant response to new miRNAs compared to those with simpler transcriptomes. Thus, it is possible that while some tissues express a large number of lineage-specific (young) miR-NAs, new miRNAs arising under restricted transcriptional control in less permissive tissues are more likely to be eliminated by selection.
Conserved miRNAs are also predicted to display higher expression levels than lineage-specific miRNAs. A seminal study in mammals [12] reported a striking correlation between expression levels of miRNA families and age. In agreement with this report, deeply conserved miRNAs have the highest expression levels in all our datasets and a significant positive correlation exists between miRNA age and expression level. We also observe that both conserved and young miRNAs possess a wide range of expression levels, and that these ranges overlap substantially with each other. The observation that older lepidopteran miRNAs tend to exhibit higher expression levels is in line with the model of Chen & Rajewsky, and also replicates findings observed in mammals [12]. Our finding that some evolutionarily young miRNAs can also be expressed at high levels may also be reconciled with predictions of the model. Given the constant high turnover of young miRNAs in metazoan genomes [12,15,36], deep sequencing of a particular tissue presents a snapshot of the transcriptome at a given point in time. It is unlikely that there is any mechanistic bias behind miRNA birth to predispose newly-evolved miRNAs to emerge under the control of tissue-specific promoters with low expression levels, but it may be the case that such miRNAs are more likely to subsequently avoid elimination through natural selection.

Potential conserved ovarian functions for insect miRNAs
A combined transcriptome assembly for both populations was constructed and used for miRNA target prediction on mature 5' and 3' products of all seven 'ovary set' miRNAs. One of the newly identified miRNAs within the 'ovary set' is the Speckled Wood homologue of miR-2763, previously described in the moths B. mori and Manduca sexta [37]. Three potential targets were identified for Bmo-miR-2763 in the B. mori genome [38]. The first encodes the diapause hormone receptor (DHR), expressed in the silkmoth ovary [39], where it is involved in transduction of the diapause hormone signal and exerts regulation over embryonic diapause. The second predicted target, GRF, encodes an insect protein related to vertebrate germ cell nuclear factor (GCNF). [40]. GRF may function in ecdysteroid signalling in moths [40]. The third putative target, GATAβ, is implicated in choriogenesis [38]. Although these target predictions were not replicated in P. aegeria, this may be due to partial mRNA predictions in the ovary transcriptome data, so some conservation of function is still possible.
Another miRNA with possible conserved ovarian functions across a wide variety of insect species is miR-989: a likely insect-specific miRNA [41] and the most abundant miRNA in the butterfly ovary. We found expression of this mRNA to be upregulated > 5-fold in both butterfly ovary samples relative to the pooled sample, and similarly it is highly expressed in ovaries of several fly species [42]; it is also upregulated in mosquito ovaries in response to blood feeding [19]. The pupal phase, during which oogenesis occurs, coincides with increased expression miR-989 in B. mori [38], suggestive of a similar upregulation of this miRNA during egg production in the Lepidoptera. In Drosophila melanogaster [34], it was demonstrated that a loss of miR-989 function impairs border cell migration in the egg chamber, leading to sterility. miR-263a is the third 'ovary set' miRNA found in P. aegeria which has been characterised in a variety of other insect species. It has been hypothesised that miR-263a is involved in trans-generational immune priming in the greater wax moth Galleria mellonella [43]. It is strongly upregulated in G. mellonella pupae, and is also induced in the larval midgut, the rest of the body, and the eggs of females fed as larvae on diets contaminated with pathogenic bacteria. Our finding that miR-263a is highly expressed in the ovaries of P. aegeria lends support to the idea that a role in defence could be conserved across Lepidoptera, but more direct evidence is required to test this.

Candidate targets and their involvement in butterfly ovarian biology
Further insights into the role of each miRNA on ovarian biology can be made by bioinformatic analysis of a curated list of genes implicated in egg and ovary function [14]. In each case, we suggest that the miRNA of interest may be as fine-tuners of gene activity rather than on/off switches for gene expression. This would grant robustness to the system and buffer it against developmental perturbation [44,45].

Hormone signalling
The nuclear receptor E75 is an early response gene in the ecdysone signalling pathway and is involved in the ovarian response to hormone signalling in silkmoths [46] and flies [47]. We identified the E75 mRNA splice variant E75B as a potential target for two of our 'ovary set' miRNAs ( Fig 5). Both the predicted miR-989 and B-0210/C-0160 target sites are located at the start codon of this transcript.

Follicle cell function
Follicle cells perform several key functions during insect oogenesis including egg provisioning [48], secretion of maternal ecdysteroids [49], and production of the chorion and eggshell. Follicle cells in flies also provide spatial patterning information to the oocyte [50].
In Drosophila, a subset of follicle cells ('border cells') contribute to both anterior patterning as well as the formation of the micropyle through which sperm enter the egg at fertilization [51]. Several 'ovary set' miRNAs, including miR-989 as mentioned above, are predicted to target genes involved in follicle cell function and border cell migration. These include stardust, crumbs [52], neuroglian [28] and taiman (tai) [53].

Serosa and immune defence
We predict that miR-989 may target the mRNA of ShxD, a homeobox gene expressed maternally in ovarioles and zygotically in the serosa [13]. The Shx genes are implicated in specifying the location of the extra-embryonic serosal tissue domain in developing oocytes, as well as regulating its subsequent development. As noted above, miR-263a may also be involved in immune defence, through different targets.