miRNA Repertoires of Demosponges Stylissa carteri and Xestospongia testudinaria

MicroRNAs (miRNAs) are small regulatory RNAs that are involved in many biological process in eukaryotes. They play a crucial role in modulating genetic expression of their targets, which makes them integral components of transcriptional regulatory networks. As sponges (phylum Porifera) are commonly considered the most basal metazoan, the in-depth capture of miRNAs from these organisms provides additional clues to the evolution of miRNA families in metazoans. Here, we identified the core proteins involved in the biogenesis of miRNAs, and obtained evidence for bona fide miRNA sequences for two marine sponges Stylissa carteri and Xestospongia testudinaria (11 and 19 respectively). Our analysis identified several miRNAs that are conserved amongst demosponges, and revealed that all of the novel miRNAs identified in these two species are specific to the class Demospongiae.


Introduction
MicroRNAs (miRNAs) are small non-coding RNAs of about 22 nucleotides in length (Ha and Kim [1] details the biogenesis of these molecules). They play key roles in many physiological processes, such as development, cell proliferation and stress response, mainly through posttranscriptional degradation and translational repression of their mRNA targets [2,3]. While studies have mainly focused on identifying novel miRNAs and uncovering their biological function in model metazoans, the ubiquity of next-generation sequencing and the completion of several invertebrate genomes have resulted in miRNAs being identified in more basal metazoans, e.g. in the sponge Amphimedon queenslandica [4] the cnidarians Stylophora pistillata [5], Nematostella vectensis [4,6] and Hydra magnipapillata [7].
In this study, we identified the core RNAi proteins involved in the biogenesis of miRNAs in the draft proteome of the marine sponges Stylissa carteri and Xestospongia testudinaria from the Red Sea (Ryu et al., in review). The presence of a functional RNAi machinery lends more support to predicted miRNAs having important roles in both organisms. In order to identify bona fide miRNAs present in both sponges, we sequenced the small RNA fraction of both sponges and mapped the reads to the respective draft genomes (Ryu et al., in review). These miRNAs were subsequently compared to the previously described small RNAs set from Amphimedon queenslandica [4]. On top of identifying novel miRNAs for both S. carteri and X. testudinaria, by means of comparative analysis, we identified a subset of miRNAs that are conserved between the three studied poriferans. We believe our work would prove to be a valuable resource in understanding the role of miRNAs in the more basal branches of metazoans, and provide clues to why none of the poriferan miRNAs were conserved in other metazoans.

Ethics Statement
No special permissions were required for the collection of our sponges. Our research did not involve endangered or protected species.

Identification of Core RNAi Proteins
The sequences of six protein families related to the small RNA pathways (Argonaute, Dicer, Drosha, HEN1, Pasha, and Piwi) were retrieved from the UniRef database [8] for five organisms (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, and Schizosaccharomyces pombe). Sequences were further clustered using CD-HIT [9] at a threshold of 90% sequence identity with no length restrictions across species boundaries to remove fragmented protein sequences. The remaining representative sequences (46 in total) were queried against the predicted proteomes of S. carteri and X. testudinaria (Ryu et al., in review) to identify potential sponge homologues (BLASTP, e-values < 10 −10 ). These homologues were then searched for protein domains using InterProScan v5 [10]. The domains deemed essential for the RNA machineries were as follows: the PAZ (PF02170) and Piwi (PF02171) domains for Argonaute and Piwi; a pair of RNase III domains (PF00636) for Dicer and Drosha; a double-stranded RNA binding domain (PF00035) for Pasha; and a methyltransferase domain (PF13847) for HEN1. Candidate homologues lacking these domains were discarded. Additional support for the identities of the candidate homologues was obtained by carrying out reciprocal BLASTP searches of the translated candidates against all proteins in the Swiss-Prot database [11] (S1 File).
Phylogenies were constructed over entire sequences for each of the six protein families to provide additional support for the presence of the RNAi machinery in the sponges. For each protein family, candidate homologues from the sponges were aligned with selected metazoan sequences from [5] using MUSCLE [22]. Aligned regions of low quality were removed with TrimAl using the built-in "gappyout" parameter [23] (S2 File). ProtTest3 [24] was used to determine the best model for amino acid substitutions, while MEGA v6 [25] was used to construct maximum-likelihood phylogenetic trees (applying the substitution model of "LG+G" for Argonaute, Drosha, and HEN1; and "LG+I+G" for Piwi, Dicer, and Pasha). Support values were computed from 1,000 bootstrap replicates.

Searches for Other RNAi Proteins
We broadened our search to include four other RNAi proteins previously implicated in miRNA biogenesis in some metazoans: HYL1, Serrate, GW182, and RdRP (RNA-dependent RNA polymerase). Sequences for HYL1, Serrate and GW182 (13 in total) from four cnidarians (Acropora digitifera, Acropora millepora, Hydra vulgaris, and Nematostella vectensis) were obtained from Moran et al. [13]. For RdRP sequences (7 in total), they were retrieved from UniRef for the same five organisms detailed previously, analogous to how core RNAi protein sequences were obtained. These sequences were subsequently searched against the proteomes of S. carteri and X. testudinaria. As these RNAi proteins play less crucial roles, we did not impose the same strict requirements for the identification of homologues (e.g. presence of key domains, conservation of crucial amino acid residues).
Similarly, phylogenies were constructed by aligning and trimming low-quality sequences for Serrate and GW182 proteins (S3 File). The "JTT+G" and "LG+G" models respectively were used to construct the trees.  Table 1). The specimens were washed three times in calcium-magnesium-free artificial seawater (CMF-ASW) and subsequently flash frozen in dry ice. Small RNAs were extracted with the mirVana miRNA Isolation Kit (Ambion, Austin, TX). RNA quality was assessed using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). Small RNA libraries were prepared with TruSeq Small RNA protocol (Illumina, San Diego, CA) on targeted size ranges (18-40 and 18-70 bp). Low-quality ends (Phred score < 20), sequencing adapters, and short reads (< 16 bp) were removed using custom Java scripts.

Processing of Small RNA Reads
Due to the significant sequencing depth of the small RNA libraries obtained from both sponges (52.1 and 36.2 million reads for S. carteri and X. testudinaria, respectively), several filters were applied to remove sequences that might result from sequencing errors. First, reads that occurred less than five times in each library was removed. This step removed 8.8 million reads (17.0%) for S. carteri, and 11.5 million reads (31.7%) for X. testudinaria. A BLASTN search of the remaining reads was performed against Rfam v11 [26], a database of known noncoding RNA families. Reads that matched (e-value < 10 −5 ) a known noncoding RNA family-most commonly to tRNAs and rRNAs-were excluded from further analyses. The removed reads included 49 "miRNA-like" reads (9 similar to let-7, and 40 similar to miR-10); however, none mapped to a genomic sequence that folded into a hairpin structure, indicating that they could be disregarded. This step removed a further 6.2 million reads (11.8%) from S. carteri and 3.2 million reads (9.7%) from X. testudinaria.
miRNA Prediction and Filtering miRDeep2 [27] was used to identify miRNA genes by mapping the small RNA reads for each sponge to their respective genomes using Bowtie v1.0.1 [28]. Potential pre-miRNA precursor sequences were identified by their mapping patterns and validated using RNAfold to confirm that they have the canonical hairpin loop structure [27]. Each predicted pre-miRNA that had a miRDeep2 score > 5 was retained for further analysis and inspected manually. A Python script was written to produce additional information not found in the miRDeep2 output (i.e., the length of the 3' overhang, and the proportion of reads with a consistent 5' end). Based on the output, we further selected a set of bona fide miRNAs. Conserved miRNAs were identified using BLASTN against all previously identified pre-miRNA sequences in miRBase v20 [29].

Identification of Core RNAi Proteins
Six key proteins that are essential for miRNA processing and function were identified in both S. carteri and X. testudinaria-both sponges had homologues corresponding to one Argonaute, one Piwi, two Dicers, and one Drosha; S. carteri had one additional Pasha homologue, while X. testudinaria had one additional HEN1 homologue. These homologues were identified via an initial BLASTP search against known RNAi proteins, followed by the checking of protein domains crucial for catalytic activity, and lastly a reciprocal BLAST search against manually curated proteins in Swiss-Prot (S1 File).
The per-family alignments of candidate homologues against the known RNAi proteins showed that many of the functionally important amino acid residues within the key protein domains were conserved in the candidate homologues obtained from the two sponges. Examples include the strong conservation of charged amino acids (arginine, lysine, and glutamate) in the PAZ domain and the DDX triad in the Piwi domain of the Argonaute and Piwi homologues, as well as the aspartate and glutamate residues in the Dicer and Drosha homologues (S1, S2, S3, S4, S5 and S6 Figs). Maximum-likelihood phylogenetic trees for all six protein families (Fig 1A-1F) placed all of the candidate S. carteri and X. testudinaria homologues with those from either A. queenslandica or Ephydatia fluviatilis, strongly indicating that the RNAi machineries identified in both sponges are bona fide.
In addition to the core small RNA pathway proteins, we searched for homologues of HYL1, Serrate, GW182, and RdRP in both sponges. HYL1 and Serrate are known to be involved in converting pri-miRNAs to pre-miRNAs in plants [30], but full-length homologues of both proteins have also been reported in four cnidarians (A. digitifera, A. millepora, H. vulgaris, and N. vectensis) [13]. Homologues of GW182, which help Argonaute repress its targets [31], have also been identified in these four cnidarians. RdRPs use small RNAs as templates and contribute to amplifying their silencing effects by directing the production of secondary dsRNAs [32].
To date, functional RdRPs have been identified in plants [33,34], C. elegans [35], and N. vectensis [36], but not in mammals nor flies. In the present work, we identified three Serrate homologues in S. carteri, one Serrate homologue and two GW182 homologues in X. testudinaria, and one Serrate homologue and one GW182 homologue in A. queenslandica. With the exception of one GW182 homologue in X. tetsudinaria, all of these homologues clustered best with other sequences of sponge origin (S7 Fig). We failed to find any homologues for HYL1 or RdRP in the three sponges, nor did we find homologues of GW182 in S. carteri (BLASTP evalue < 10 −10 ; data not shown). The absence of HYL1 and RdRPs in sponges hints at the selective loss of these two protein families after the divergence of the phyla Porifera and Cnidaria.

Identification of Bona Fide miRNAs
Roughly half of the filtered reads were successfully mapped to the corresponding genome (20.5 million reads out of 43.3 million for S. carteri; and 13.0 million reads out of 24.7 million reads for X. testudinaria). Based on these reads, 13 miRNA genes were predicted for S. carteri, 11 of which passed our additional filtering criteria; and 46 miRNA genes were predicted for X. testudinaria, with 19 passing the additional criteria (Table 2; S1 and S2 Tables). Out of the combined 30 miRNA predictions, 8 from S. carteri and 5 from X. testudinaria were found to match miRNAs that had been previously identified in other demosponges (A. queenslandica [4]; Haliclona sp. [6]; and Dysidea camera, Clathria prolifera, and Suberites sp. [37]). Like the other demosponges, all of the predicted miRNAs in S. carteri and X. testudinaria were class-specific; there was also no overlap between our predictions and the recently-discovered miRNAs in Calcisponges [38].
Comparisons between the three demosponges with complete pre-miRNA sequences (i.e. both the mature and the star sequences) reveal the strong conservation of miRNA repertoires across the three sponges: seven of the eight miRNAs identified in A. queenslandica had homologues in either S. carteri or X. testudinaria (Fig 2). Of the seven, five (miR-2015, miR-2016, miR-2019, miR-2020, miR-2021) were present in both sponges; the other two (miR-2017, miR-2018) were present only in S. carteri but not in X. testudinaria. For these homologues, the mature sequences are almost always fully conserved, with at most a one-base mismatch between species; the star sequences have, as expected, more mismatches, which is in line with the current understanding that star miRNAs are less functionally constrained.
It is interesting to note that the co-expression pattern of the mature reads from the two arms of aqu-miR-2015 (~2,700 reads mapped to each arm of the miRNA hairpin [4]) differed between S. carteri and X. testudinaria. For S. carteri, the 5' and 3' arms were mapped by 34,026 and 2,283 reads, respectively, while for X. testudinaria, the 5' and 3' arms were mapped by 45,845 and 447 reads, respectively. This observation is corroborated by the complete conservation of the 5' arm across all three sponges, while there were several mismatches in the 3' arm. (Fig 2). This suggests that miR-2015-5p has similar biological function across all three sponges, while miR-2015-3p is likely to be only functional in A. queenslandica, but not in the other two sponges. For all of the other conserved miRNAs in S. carteri and X. testudinaria, the relative abundances of the miRNAs derived from both arms of the hairpin loop were similar to those observed in A. queenslandica.

Identification of Core RNAi Proteins
Previously, Grimson et al. [4] had identified the presence of a functional small RNA machinery in A. queenslandica. In this study, we identified candidate homologues of core RNAi proteins in the S. carteri and X. testudinaria proteomes, strongly indicating that small RNA regulation is universal even in the most basal metazoans. These homologues, when aligned against known RNAi proteins, revealed fairly strong conservation of key protein domains and residues crucial for protein function; also, maximum-likelihood phylogenetic trees of our candidate homologues with other known RNAi proteins provided additional support that the identified homologues are poriferan in nature. As expected, all of these candidate homologues clustered strongly with sequences from A. queenslandica or E. fluviatilis, producing branch scores that are close to 100.
miRNA Repertoires of S. carteri and X. testudinaria We predicted a combined set of 30 bona fide miRNAs for both sponges, including seven of the eight miRNAs previously identified in other demosponges [4,6,37]. Five of these conserved miRNAs were present in both S. carteri and X. testudinaria, while the other two is present in S. carteri. Consistent with previous observations by other groups [4,6,[37][38][39], all of our predicted miRNAs were specific to the class Demospongiae. Also, the novel miRNAs in S. carteri and X. testudinaria were unique to those species, indicating that miRNA genes continued to expand after the evolutionary divergence of both species. Based on the phylogenetic tree in Sperling et al. [37], as well as Northern blots in Wheeler et al. [6], there is good reason to believe that all eight miRNAs originally identified in A. queenslandica (i.e. miR-2014 to miR-2021) should also be present in S. carteri and X. testudinaria, as S. carteri is a democlavid while X. testudinaria is a haplosclerid. Consistent with this expectation, with the exception of miR-2021 in S. carteri, we found matching reads in S. carteri and X. testudinaria to all eight aforementioned miRNAs (Table 3). However, as our miRNA prediction pipeline is dependent on the short reads in the library mapping in a hairpin-like manner against the draft genome, it is possible that the current limitations of the genomes we used testudinaria. The miRNA sequences from the 5' and 3' arms are presented on the left and right, respectively. The bolded name denotes the arm that had more reads mapped to it (i.e., the mature arm for that miRNA). Bases are coloured to enable visualisation of the conservation level, as follows: dark blue, > 80%; blue, > 60%; light blue, > 40%; uncoloured, 40%). The abbreviations are as described in Fig 1. doi:10.1371/journal.pone.0149080.g002 miRNAs in Sponges contributed to the inability to predict more conserved miRNAs from both of our short read datasets. This was more evident for X. testudinaria: the mature and star sequences of miR-2014 maps perfectly in close proximity to the same X. testudinaria scaffold, but unfortunately, these loci were separated by a stretch of Ns; miR-2017-3p and miR-2018-5p did not have any matches to the genome. Additional PCR verification with miRNA sequences as primers succeeded in amplifying miR-2014 (S8 Fig); on the other hand, miR-2017 and miR-2018 did not amplify, in line with their absence in the genome. For S. carteri, the mature miR-2014 identified in A. queenslandica (aqu-miR-2014-3p) corresponds perfectly to the star sequence of sca-miRtemp-2 -the mature sequence of sca-miR-temp-2 is, intriguingly, a homologue of aqu-miR-2016-5p (S1 Table).
Generally speaking, our data is in line with the consensus that the larger eumetazoan miRNA repertoire-relative to sponges-contributes to the increased morphological complexity seen in eumetazoans [6,[39][40][41]. However, the absence of miR-2021 in S. carteri and the presence of new miRNAs in both S. carteri and X. testudinaria can be construed as the evidence that miRNAs have been gained and lost in sponges despite their relatively simple body structures and lifestyles.
In conclusion, on top of demonstrating the strong conservation of a subset of miRNAs across demosponges, we have also expanded the known set of high confidence demosponge miRNAs into the double digits. We provide further indication that miRNA expansions and losses continued even after the evolutionary divergence of the orders within demosponges.

Data Accessibility
The sequencing datasets for S. carteri and X. testudinaria are available under NCBI BioProject IDs of PRJNA254402 and PRJNA254412, respectively. Supporting Information S2 File. Sequences (in FASTA) that were used to construct the maximum-likelihood phylogenetic trees for core RNAi proteins. (FA) S3 File. Sequences (in FASTA) that were used to construct the maximum-likelihood phylogenetic trees for Serrate and GW182.
(FA) S1 Table. Additional criteria used to filter for bona fide miRNAs in S. carteri from default miRDeep2 output. (XLSX) S2 Table. Additional criteria used to filter for bona fide miRNAs in X. testudinaria from default miRDeep2 output. (XLSX)