This Reprint May Differ from the Original in Pagination and Typographic Detail. Selective Microrna-offset Rna Expression in Human Embryonic Stem Cells Selective Microrna-offset Rna Expression in Human Embryonic Stem Cells

All material supplied via JYX is protected by copyright and other intellectual property rights, and duplication or sale of all or part of any of the repository collections is not permitted, except that material may be duplicated by you for your research use or educational purposes in electronic or print form. You must obtain permission for any other use. Electronic or print copies may not be offered, whether for sale or otherwise to anyone who is not an authorised user. ☯ These authors contributed equally to this work. Abstract Small RNA molecules, including microRNAs (miRNAs), play critical roles in regulating pluri-potency, proliferation and differentiation of embryonic stem cells. miRNA-offset RNAs (moRNAs) are similar in length to miRNAs, align to miRNA precursor (pre-miRNA) loci and are therefore believed to derive from processing of the pre-miRNA hairpin sequence. Recent next generation sequencing (NGS) studies have reported the presence of moRNAs in human neurons and cancer cells and in several tissues in mouse, including pluripotent stem cells. In order to gain additional knowledge about human moRNAs and their putative development related expression, we applied NGS of small RNAs in human embryonic stem cells (hESCs) and fibroblasts. We found that certain moRNA isoforms are notably expressed in hESCs from loci coding for stem cell-selective or cancer-related miRNA clusters. In contrast , we observed only sparse moRNAs in fibroblasts. Consistent with earlier findings, most of the observed moRNAs derived from conserved loci and their expression did not appear to correlate with the expression of the adjacent miRNAs. We provide here the first report of moRNAs in hESCs, and their expression profile in comparison to fibroblasts. Moreover, we expand the repertoire of hESC miRNAs. These findings provide an expansion on the known repertoire of small non-coding RNA contents in hESCs.


Introduction
Human embryonic stem cells (hESC) are pluripotent cells derived from the inner cell mass of blastocyst stage embryos, which can be indefinitely maintained in culture [1][2][3]. The pluripotency, proliferation, and differentiation of hESCs are influenced by transcription factors that mediate their actions in concert with miRNAs, small endogenous RNAs processed by RNAse III endonucleases Dicer and Drosha [4][5][6][7][8]. With the ability of a single miRNA to regulate In each of the three samples, the majority of reads aligned to miRNA hairpin precursor sequences. This fraction dominated in HFF-1 in comparison with hESCs.
To extend small RNA characterization to additional hESC lines, we utilized data published by two small RNA deep sequencing projects: Morin et al. 2008 [35] (PRJNA79477) and Bar et al., 2008 [36] (GSE21722). In total, we analyzed four hESC libraries: HS401, HS181, H9 [35] and H1 [36], and three libraries of differentiated cells: HFF-1, embryoid bodies [35] and spontaneously differentiated hESCs [36]. To illustrate the read distribution through bioinformatics workflow in all libraries, the approximate read counts from the raw reads to miRNA sequence extraction are shown in Table 1. We observed relatively high abundance of miRNA-offset RNA (moRNA) loci mapping reads in the hESC libraries compared to cells with differentiated phenotype.

Expression of known miRNAs
To characterize dominating miRNAs in hESCs, miRNA expression was profiled from the three libraries prepared by us: HS401, HS181 and HFF-1. miRNA differential expression analysis between hESC and HFF-1 libraries resulted in 344 miRNAs significantly differentially expressed (S2 Table). Of these, 271 miRNAs were overexpressed in hESCs and 73 were overexpressed in HFF-1. The fifteen most significant miRNAs derived from eleven different hairpin precursors,  Table 1. Read counts and percentages compared to raw data in total of seven small RNA-seq libraries during bioinformatics workflow from raw reads to miRNA and moRNA sequences. all overexpressed in the hESC libraries, are shown in Table 2. Ten of these miRNAs belong to known hESC-specific miR-302/367 and miR-371/372/373 clusters. Known hESC miRNA cluster miR-106a-363, its paralog miR-17-92 and a large C19MC cluster located at close vicinity of miR-371/372/373 were also found to produce significantly overexpressed miRNAs [4][5][6][13][14][15][16][17]35] (S2 Table). Differential expression of three overexpressed and two underexpressed miR-NAs were verified by quantitative Real-Time PCR (qRT-PCR) (S1 Fig.).
In addition, we found eight novel 5p or 3p miRNAs from known hairpin precursors which previously contained the reference sequence for only one mature miRNA in miRBase ( Table 3). Three of the novel miRNAs were detected in both hESCs and HFF-1, while five were expressed specifically in hESCs. The average normalized read count of the most abundant iso-miR of the hESC-specific novel miRNAs ranged from 2.1 to 136.9 reads per million mapped reads (RPM), and their expression appeared to be consistent in the hESC lines. Moreover, from the set of reads not annotated to any RNA species, we found seven novel miRNA hairpins, two of which are conserved in mammals (Table 4). For all of the novel hairpins, mature miRNA expression was observed only from either 5p or 3p stem, and three of them could not be detected in fibroblasts.

miRNA-offset RNAs expression profile in hESCs
The identification of abundant offset sequences mapping at the hESC miRNA loci prompted us to perform differential expression analysis following the calculation of candidate moRNAs. In total, 350 moRNAs were identified in our data: 220 5' moRNAs (moR-5p) and 130 3' moR-NAs (moR-3p) expressed with at least one read from the vicinity of 273 miRNA hairpins ( Table 5, S3 Table). 326 moRNAs were found in hESCs, while only 65 were found in HFF-1 cells. The length of detected moRNA reads was between 15-36 nt (median = 20 nt), and similar to miRNAs, moRNAs were expressed with overlapping, variable length reads, called isomoRs ( Fig. 2 A,B). We measured the expression of each moRNA by counting the number of reads that aligned to the locus of its most abundant isomoR sequence allowing at most two mismatches and extension by two nucleotides both upstream and downstream. Of all the reads deriving from miRNA haipin loci, the proportion of moRNA reads was roughly five times larger in hESCs than in HFF-1 whereas the percentage of mature miRNA reads in hESCs was only about half of their fraction in HFF-1 (Table 1, S1 Table). Also the expression level of distinct moRNAs was higher in hESCs (median 0.5, average 3.7 RPM) than in HFF-1 (median 0.2, average 1.5 RPM). In hESCs, 58% and in HFF-1, 96% of the moRNA reads derived from the 5' arm of the hairpin which suggests a bias towards 3' moRNA expression in hESCs. Considering the total of 350 moRNAs found, 229 (65%) were derived from a conserved miRNA hairpin loci (conserved in mammals, mean PhastCons value > 0.5). The extended hairpins (moR-5p+hair-pin+moR-3p) were also conserved in 223 cases (64%). Consistent with earlier studies, we did not detect any clear correlation between the expression levels of mature miRNAs and moRNAs derived from the same miRNA hairpin arm in neither hESCs nor HFF-1 ( Fig. 3 A,B). In addition, seven of the 12 most significant 5' moRNAs shown in Table 5 are expressed from a miRNA hairpin opposite to the major miRNA stem (3p) and one of them, moR-421-5p, is expressed alone without expression of its adjacent miRNA miR-421. This phenomenon where the 5' moRNA is expressed from the same arm with the minor miRNA has been recently observed also by Gaffo and co-workers [37]. On the other hand, all the three most significant 3' moRNAs derive from precursors where also the major miRNA is expressed from the 3p arm. Also in line with previous studies, we observed that 5p moRNA reads exhibit a length distribution at 16-26 nt with a peak at 19 nt, whereas 3p moRNA reads were distributed more randomly between 16 and 29 nt (Fig. 3 C,D). However, the scattered distribution of 3' moRNAs can partly be explained by the low number of unique moR-3p sequences. There were 26 moR-NAs, derived from 24 miRNA hairpin loci, whose most abundant isomoR was detected with more than 5 RPM and 11 of them were expressed with more than 10 RPM (Table 5, S3 Table). On the other hand, up to 159 (45%) of the moRNA sequences were found only once in our data. The identified moRNAs were further searched from H9 data by Morin et al. 2008 [35] and from H1 data by Bar et al. 2008 [36] (S3 Table) from which we found 106 and 21 moRNAs in common with our data, respectively. While Morin et al. data contained moRNAs from both arms of 11 miRNA hairpins, we detected only 5' sequences from Bar et   ).)))))))))))...... )))))))))))))).)))......))))).... ).))...))))))))))))))))))))).. ).)))..))).....)))))).. ))))))))..))))...))))))))... )...)))))))))))))).))).
We found 57 moRNAs to be significantly differentially expressed between the hESC and HFF-1 libraries ( Table 6, S4 Table), and all of them were overexpressed in hESCs. Among them are seven of the ten moRNAs that derive from the pluripotency related miR-302/367 cluster. The other known hESC miRNA clusters detected highly represented in our study gave rise to several overexpressed moRNAs as well: moRNAs derived from both ends of mir-363 and miR-20b hairpins, moR-92a-2-5p from miR-106a-363 cluster and four moRNAs from the paralog miR-17-92 cluster. Also C19MC cluster was represented by 20 moRNAs, which all derived from miR-515 family of hairpins. Moreover, many of the differentially expressed moR-NAs found in our data sets were also present in H9 and H1 data. In conclusion, moRNAs appeared to arise from highly expressed and hESC-selective miRNA clusters with a couple of exceptions such as miR-371/372/373 cluster with high miRNA expression in hESCs but only few detected moRNAs. On the other hand, only few reads of moR-103a-2-3p ( Fig. 2) in HFF-1 made it overexpressed in hESCs, while its related miRNA miR-103a-3p, was highly expressed in all libraries.

Probing for the functionality of moRNAs
On average, 97% of the isomoR reads related to each 3'-moRNA had a consistent 5' end, which may refer to importance of the 5' part of the sequence in target recognition (for miRNAs, this fraction was *80% for both stems in our data). This observation led us to study the possibility of 'miRNA-like' function for moRNAs using moR-103a-2-3p which is one of the most abundantly expressed moRNAs in hESCs and derives from the locus of mir-103a-2, a conserved regulator of cancer metastasis, cell proliferation, insulin response control and adipogenesis [38][39][40][41]. Hence the function of mir-103a is well-known and, in addition, its hairpin structure was found to be associated with moRNAs in HS401, HS181 and H9 (Morin et al. data) and also in previously published human moRNA reports [23,26].
First, we ensured the hESC-related expression of moR-103a-2-3p using quantitative Real-Time PCR (qRT-PCR). The expression of all four small RNA derivatives from pre-miR-103a-2 hairpin precursor was measured in four different hESC lines including two independent hESC lines H9 and FES29, and in two human induced pluripotent stem cell (hiPSC) lines HEL23 and HEL41 in early passages. In addition, HEK293 cell line was analyzed as a reference (Fig. 2 C).
Overexpression of moR-103a-2-3p, but not overexpression of any other derivatives, was detected in all analyzed lines when compared with HFF-1. The results are in line with the moRNA differential expression analysis made from NGS data (Table 6).
Next, we studied the effect of transfection of HFF-1 cells with moR-103a-2-3p using transfection with miR-103a-3p as a positive control. The whole transcriptome microarray analysis for the Shown are moRNAs supported with min 10 reads in at least one of the libraries, no mismatches were allowed.
doi:10.1371/journal.pone.0116668.t005  Table). The transfection with the positive control, miR-103a-3p, caused regulation of about 700 genes at least 2-fold (216 up and 497 down, p: 0.05). A notable fraction (321 of 538/497) of the down-regulated genes was common to both moR-103a-2-3p and miR-103a-3p transfections. Thus moRNA transfection affected the mRNA expression levels in HFF-1 cells, and the effect seemed to be partly similar with the effect of miRNA transfection. To elucidate if moRNA has miRNA supporting function via modulation of its expression, we ran qPCR to quantitate mature miR-103a-3p after transfection of moR-103a-2-3p to HFF-1. However, the miR-103a-3p expression level remained unchanged (data not shown), indicating that moRNA-mediated modulation of miRNA expression does not explain the common down-regulated genes.
The large amount of genes down-regulated by both mimics could also be because moR-103a-2-3p acts directly as a post-transcriptional regulator and shares at least part of its targets with miR-103a-3p. Because moR-103a-2-3p sequence did not have any perfect matches to other places in the genome, and because 98% of its isomoRNAs shared the same 5'-part, we deduced that moR-103a-2-3p might work similarly with miRNAs and down-regulate genes with minimum of 7-mer seed match in their 3' UTRs. To test this possibility, we considered a gene with a perfect 7-mer seed (nucleotides 2-8 from the 5' end) match of miR-103-3p or moR-103a-2-3p in its 3' UTR as a putative target of that small RNA and predicted the targets for these two in the three distinct gene sets: genes down-regulated by both mimics, genes down-regulated by only miR-103a-3p and genes down-regulated by moR-103a-2-3p (Table 7, S7 Table). To be conservative, we conducted the target analysis only for those genes that had an annotated 3'UTR in GRCh37/hg19. The number of putative miR-103a-3p targets was largest in the gene group that was down-regulated by miR-103a-3p but not by moR-103a-2-3p (50 genes; 30% of all genes in this group), while in that group the number of moR-103a-2-3p 'seed' matches was smallest (9; 5%). On the other hand, the number of genes with moR-103a-2-3p 'seed' matches was largest (43; 27%) in the group of genes which was down-regulated only by moR-103a-2-3p but not by miR-103a-2-3p, while the number of putative miR targets in that group was smaller (30; 19%) than the number of moR 'targets'. Hence, there seemed to be a connection between the existence of a moR 'seed' match in the gene 3' UTR and its down-regulation by moRNA. In all groups, the number of genes that had matches for both miR and moR was rather low, being about 4%.
Next, we inspected the number of possible miR-103a-3p and moR-103a-2-3p binding sites in the 3' UTRs of the most significantly down-regulated genes in both of the transfection studies (Table 8 and Table 9). Eight of the ten genes most significantly down-regulated by miR-103a-3p had at least one seed match for miR-103a-3p in their 3' UTRs while one of them, C7orf55, contained a 7-mer seed match also for moR-103a-2-3p (Table 8). Because C7orf55 was down-regulated by the moR mimic where its fold change (FC = -3.5) was about the same as with miR-103a-3p (FC = -3.6), it could also be a target of moR-103a-2-3p. While the existence of miRNA seed matches in this gene group was expected, the absence of moR seed matches was also noteworthy. Of the top ten genes down-regulated by moR-103a-2-3p, five contained one or two moR seed matches in their 3' UTRs but only four of them did not contain any miR seed matches (Table 9, Fig. 4 A). Because five of the genes significantly down-regulated by moR-103a-2-3p did not contain any perfect seed matches for it, we predicted its minimum free energy (mfe) binding sites in their 3'UTRs with RNAhybrid (Fig. 4) [42]. Typically, reads are shown. b) The minimal free energy structure of the extended hairpin sequence of hsa-mir-103a-2 (mfe = -37.50 kcal/mol). c) qRT-PCR of pre-miR-103a-2 hairpin derived small RNAs. Bars indicate logarithmic fold change relative to HFF-1 fibroblasts with Standard Deviation (SD; number of replicates n = 3 for HS401, HS181, FES29, HEK293; n = 2 for iPSC p5, iPSC p8).
Further, in order to gain global view of the possible function of moR-103a-2-3p, we performed Gene Ontology (GO) analysis with DAVID Functional Annotation Clustering-tool [43] for genes that were down-regulated only with moR-103a-2-3p mimic, only with miR-103a-3p mimic, or with both mimics (S6 Table). The ten most significantly enriched GO terms in the gene set down-regulated only by moR-103a-2-3p were related to ribosome, translation or mitochondria, and were the same as the top ten GO terms enriched among the genes that were down-regulated by both mimics (Table 10). In the strongly enriched category "ribosome", most of the genes encoded either ribosomal pseudogenes or mitochondrial ribosomes (36 of 41 genes in gene set down-regulated by both mimics). The miR-103a-3p down-regulated genes were also enriched in nine top categories shown in Table 10. In addition, enriched only in this gene set were terms "nucleosome" (p = 2.5E-4), "protein-DNA complex" (p = 1.1E-3), "chromatin assembly" (p = 7.7E-4), "phosphatidylethanolamine (lipid) binding" (p < 3.8E-4) and "fatty acid catabolic process" (p = 0.036). The lipid and fatty acid-associated categories reflect to the known functions of miR-103a-3p, while the chromatin-associated down-regulated categories may indicate still unknown functions for it. In all, the microarray analysis showed that miR-103a-3p has functions that are not connected to the function of moR-103a-2-3p. On the other hand, it did not show any specific functions for moR-103a-2-3p, but instead suggested that it might function in concert with miR-103a-3p.

Discussion
Canonical miRNAs form only a fraction of the small RNAs in many cell types. For example, sperm and oocytes contain abundant piwi-interacting RNAs (piRNAs) to protect genome integrity from the activity of retrotransposons [44], mRNA targeting endogenous siRNAs [45] derived for example from pseudogene-gene-transcript pairs, and the highly abundant tRNAderived small RNA fragments (tRFs) [21,[46][47][48]. For unknown reason, miRNA function is suppressed in mouse oocytes and early embryos [49,50] while the siRNA/piRNA fraction predominates, whereas during differentiation, the miRNA fraction increases, and siRNAs and piR-NAs are few, or even lost [51]. Interestingly, pluripotent embryonic stem cells have been shown to express miRNAs, endo-siRNAs and a small fraction of piRNAs [13,14,21,51]. Also, up to *9% of the short reads in our hESC data matched tRNA sequences (Fig. 1) which suggests that also tRFs are highly abundant in hESCs. However, the diversity and significance of small RNA species other than miRNAs remain largely unknown. We sequenced small RNAs from in-house-derived hESC lines HS401 and HS181 and human foreskin fibroblast line HFF-1 to profile miRNAome and characterize for the first time the recently discovered miRNA relatives, microRNA-offset RNAs, with unknown function from these cells. We discovered 350 unique microRNA-offset-like enrichments from the vicinity of 273 miRNA hairpins, which we refer to as moRNAs [21][22][23][24][25][26][27]37]. Several common characteristics with previously published reports emerged from our analysis. First, the identified moRNAs map precisely to the human genome and were located adjacent to the mature miRNA reads, in only a few cases overlapping the mature miRNA with one or two nucleotides. Second, like miR-NAs, moRNAs were characterized by overlapping reads referred to as isomoRs. Third, the iso-moR reads derived from the same 5' arm of the miRNA hairpin were similar in their end, and isomoRs derived from 3' arm were similar in their start. Fourth, the expression level of moR-NAs was in most cases lower than the expression level of corresponding mature miRNA with few exceptions. Fifth, moRNAs derived most often from the vicinity of conserved miRNAs. Sixth, moRNA expression level did not correlate significantly with the expression of adjacent miRNA and abundant expression of an miRNA was not always accompanied with high expression of the adjacent moRNA, suggesting that the expression of these two related molecules is not necessarily interdependent. Further, in many hairpins, the prevalent moRNA was expressed from the arm of the minor miRNA as observed also in Gaffo et al., 2014 [37].
Interestingly, hESC data yielded the majority (326) of unique moRNAs while only 65 emerged from HFF-1 data. The difference is even higher in relation to the total amount of miRNA hairpin mapping reads which were notably fewer in hESCs than in HFF-1 sample. Out  [26], we detected 58 and 37 overlapping sequences, respectively. Therefore, we report here the largest collection of unique moRNA reads derived from a human small RNAseq experiment so far. In contrast to the almost exclusive detection of 5p moRNAs reported in both of the previous studies we found also abundant 3p forms in the hESC data. Duplex forming small RNA species such as miRNAs and tRFs are typically expressed in an asymmetric manner by favouring accumulation of either one strand derivative but not both, which indicates their non-random processing [52,53], and according to this and previous studies, moR-NAs do not form an exception to this rule. Because of the detection of 3p moRNAs in the hESC data, we were able to compare the 5p/ 3p moRNA lengths and their possibility to form a miRNA-like *20 nt duplex which appears at the miRNA maturation stage before their unfolding by effector complexes to target mRNA [54]. We observed a large variation in 3p moRNA lengths ranging from 16 to 29 nts when compared with relatively homogenous distribution of 5p moRNAs with a peak in 19 nt. Interestingly, similar length distribution was reported to tRFs and other terminal small RNA fragments which are processed from longer RNAs [47]. If opposing moRNAs form a short duplex, they would harbor various lengths of overhangs, distinct from 2 nt 3' overhang processed by RNAse III enzymes Dicer or Drosha/DGCR8 [7,8]. The observation may indicate that, while the hairpin loop-side end of the moRNA may be determined by the Drosha/DGCR8 [22,23] the variable end probably results from alternative mechanisms. Recent reports have indeed shown an expression of a range of miRNA hairpin precursor variants with differing lengths and three dimensional structures which modulate the efficiency of miRNA processing [55][56][57]. Also relatively long, free 5' overhangs [56] have been identified in some hairpins which may explain the accumulation of 5p moRNAs. However, 3' overhangs longer than few nucleotides have not been reported so far. Therefore, molecular characterization of miRNA precursors, their binding enzymes and the resulting small RNA sequences will be essential for the elucidation of moRNA synthesis.
Most miRNAs deriving from the well-known pluripotency related miR-302/367, miR-371/ 372/373 and C19MC clusters were significantly overexpressed in our hESC data, a finding that is well in line with the earlier observations. Seven of the significantly differentially expressed moRNAs were derived from the hESC-specific miR-302/367 cluster. None of these sequences are found in other human moRNA studies [23,25,26], thus suggesting that in addition to miRNA expression, also the moRNA expression from this cluster may be specific to hESCs. Similarly, several moRNAs from the C19MC cluster are significantly overexpressed in the hESCs, but were not detected in earlier studies.
Before this study, human moRNAs have been reported only from neurons and cancer cells [23,25,26]. Also in our hESC data, many abundant moRNAs were derived from the miRNA clusters expressed in cancer. For example, moRNAs deriving from the c-myc induced miR-17-92 cluster [57] and the oncogenic miR-374b-421 cluster [58] have been detected also previously in the study by Bortoluzzi et al. 2012 [26]. On the other hand, these moRNAs were not present in our HFF-1 library. The moR-21-5p of the oncomir miR-21 was detected both in the HFF-1 library and by Bortoluzzi et al. 2012, while its opposing moR-21-3p expression was found only in our hESC data. Similarly, metastasis-associated mir-103a-2 cluster derived most abundant moRNA (moR-103a-2-5p) in Bortoluzzi et al. while the opposing moR-103a-2-3p was detected as second most abundant moRNA in hESCs. miR-103a-2 is not expressed by a consecutive hairpin cluster but as a bidirectional transcript from opposite DNA strands in chromosome 5, and it has also a bidirectional hairpin homolog (miR-103a-1) in chromosome 20. The homologous gene producing mir-103a-1 hairpin yielded also moRNA-like sequences, but in lesser extent both in our study and in the data of Bortoluzzi et al. 2012 [26]. miR-103a is shown to promote cancer-like properties by down-regulating KLF-4 and DAPK [38], but has also an interesting role in the modulation of miRNA processing and function via down-regulation of Dicer [39] and miRNA binding Argonaute AGO1 [59]. Interestingly, many moRNA-deriving, cancer-associated hairpins are also expressed in oocytes such as mir-17-92 cluster, miR-20, miR-21, miR-15a/16 and miR-103 [50] whereas miR-421 from mir-374b-421 cluster has been reported to be up-regulated in ovarian teratomas [60]. Taken together, moRNA expression seems to associate not only with certain cell types, but also with specific processes such as cancer and metastasis.
So far, no results concerning the possible functionality of moRNAs have been reported. In this study, we took up this challenge by transfecting HFF-1 cells with moR-103a-2-3p mimic and measured the effect with whole transcriptome microarrays. The set of genes down-regulated by moR-103a-2-3p was about as large as the set of genes down-regulated with the control, miR-103a-3p, suggesting a role for this moRNA in regulation of gene expression. However, it is unclear how moR-103a-2-3p affects gene expression; one possible way could be its direct binding to the target mRNAs 3' UTR by the similar manner to canonical miRNAs. Consistent 5' end of the sequence, a typical cleavage by RNA endonucleases such as Dicer and Drosha, suggests that it could have a role in target recognition in a 5' seed based manner. Also, the observation, that moR-103a-2-3p 5' seed matches were often found from 3' UTRs of genes downregulated by moR transfection but which instead were absent from the 3' UTRs of genes that were down-regulated by miR transfection, may support the hypothesis. Even so, the approach taken in this study supports only the 3' moRNAs, and dissection of functions of 5' moRNAs with variable 5' ends will require development of alternative approaches. While it is unsure if the observed changes in gene expression levels were caused by direct action of moR-103a-2-3p, also the mechanism how it induced the changes needs further investigation.
Notable fraction, 60%, of the genes down-regulated by the moR-mimic were down-regulated also by miR-mimic and were related to ribosomal or mitochondrial functions; these were also the most significant GO terms among the genes down-regulated only by moR-103a-2-3p. Instead, the genes down-regulated only by miR-103a-3p were associated also with other functions, part of which are formerly known. As a conclusion, it seems that moR-103a-2-3p would not have a function of its own, but could act as a co-player to miR-103a-3p and enhance its function in hESCs. We observed that the sequences of moR-103a-2-3p and miR-103-3p are partly similar which could explain some of the common regulated genes (S2 Fig.). The similarity is more prevalent at the latter half of the sequences (continuous at nucleotides [14][15][16][17][18] which excludes the possibility of canonical targeting of the same mRNA by the seed area. However, it could indicate either a novel recognition manner of the target mRNA, effector Argonaute competition situation, or attraction of Argonautes by similar sequence motifs. It is also possible that the absence of enriched GO terms uniquely related to moR-103a-2-3p is because the moRNA mimic is not expressed in the right cellular context or with suitable chemical modifications, as it is designed to imitate mature miRNA. Either, we cannot rule out the possibility that moR-103a-2-3p is a nonfunctional product from the miR-103a synthesis. Small terminal RNAs have been recently reported emerging from most classes of longer RNAs such as tRNA (tRFs), rRNA, snoRNA and snRNA but not mRNA. Interestingly, moR-NAs arise similarly as end fragments from pre-miRNAs and even exhibit similar length distributions to tRFs, 5' sequences being about 19nt and 3' sequences distributed broadly between 16nt-30nt, which may indicate a processing by common enzymatic machineries. Some of tRFs and snoRNA-derived small RNAs have been shown to function as miRNAs but also the effectiveness of miRNA loading to Argonautes has been shown to be modulated by some tRFs. It will be of interest to elucidate if moRNAs can take part to the Argonaute associationmodulation.

Cell culture and RNA isolation
For Small RNA-seq, hESC lines HS401 and HS181 were cultured on irradiated HFF-1 feeder layer in KnockOut Dulbecco's Modified Eagle Medium (DMEM) supplemented with Knockout Serum Replacement (Life Technologies Ltd, UK) and 8ng/ml basic fibroblast growth factor (FGF-2; R&D Systems, Minneapolis, MN, US), and passaged enzymatically in the presence of ROCK inhibitor Y-27632 (Calbiochem, Merck KGaA, Darmstadt, Germany) [63]. For small RNA-seq, the HFF-1 line was cultured in cell culture plates without coating substrates in Iscove's Modified Dulbecco's Medium (IMDM) supplemented with 10% fetal bovine serum (Life Technologies). For qRT-PCR, hESC lines HS401 and HS181 were cultured in similar conditions as for miRNA-seq, and hESC lines H9 and FES29 in feeder-free conditions on Matrigel and STEMPRO hESC SFM (Life Technologies) supplemented with 8ng/ml of FGF-2 (Life Technologies).
Cells were harvested using TryplE (Life Technologies) and total RNA was extracted using TRIzol (Life Technologies) according to the manufacturer's protocol. The quality of total RNA was analyzed by the Nano 6000 (chip) Kit for Bioanalyzer (Agilent Technologies, Santa Clara, CA) and only samples with an RNA Integrity Number (RIN) greater than 9.0 were used for preparation of small RNA libraries.

Small RNA-seq library preparation
Small RNA libraries were prepared from lines HS401, HS181 and HFF-1 using Illumina Small RNA Sample Prep-kit v 1.0 (Illumina Inc, San Diego, CA) as described earlier [64]. Ten μg of total RNA from each cell line was size fractionated using a 15% Novex gel (Life Technologies) and fractions corresponding to 15-40 nucleotides (nt) were excised for further preparation. The purified small RNA fraction was ligated into 5'-and 3' end adapters, and the final product was reverse transcribed, PCR amplified 15 cycles, and sequenced with Genome analyzer IIX (Illumina). The raw data files are available at the Gene Expression Omnibus (GSE62501).

Small RNA data analysis
After initial pre-processing by standard modules used with the Illumina Genome Analyzer IIX which include Firecrest for image analysis, Bustard for base calling and GERALD for the first genome alignment, the reads were preprocessed by removing bad quality reads (mean base quality Q<20) and low quality ends. Subsequently, 3' adapters were trimmed using in-house tools, and adaptor dimers, homopolymers and too short reads (<14nt) were discarded. The trimmed reads were aligned to human genome hg19 using Bowtie 0.12.7 [65], allowing at most two mismatches and not more than 10 possible origins in the genome. Reads that aligned to the area of known human miRNA hairpins (miRBase v 17, April 2011, http://www.mirbase. org/) [66] were taken apart to create their own data set, and the number of reads in this set that aligned to annotated mature miRNA loci extended with two nucleotides both upstream and downstream was counted. Mature miRNAs which were detected with expression of at least 1 read per million mapped reads were included into the differential expression analysis between the hESC lines (HS401 and HS181) and HFF-1, which was performed using R/Bioconductor package DESeq [67]. DEseq makes the assumption of a negative binomial distribution and a locally linear relationship between over-dispersion and mean expression levels of the data. A miRNA was considered to be differentially expressed between hESC and HFF-1 libraries if the p-value was less than 0.0005. Because we did not have replicates of HFF-1 data, DESeq estimated the dispersion based only on the ES replicates. There are differences in gene expression between independently-derived hESC lines [68,69] and it has been shown that several pluripotency-related transcription factors are heterogeneously expressed in mouse ES cell lines [69][70][71][72]. One of those fluctuating TFs is NANOG [73], which is also involved in activation of ES cell miRNAs miR-290 and miR-302 [74]. Hence, we had a reason to believe that the variation of miRNA expression in hESCs is likely larger than the variation in HFF-1.
The reads that did not align to known miRNA hairpins were further searched for perfect matches with other small RNA sequences: snRNAs, snoRNAs, rRNAs, mitochondrial tRNAs, misc RNAs downloaded from ensembl (http://www.ensembl.org), piRNAs from piRNAbank [75] and Genbank (http://www.ncbi.nih.gov/genbank), tRNAs from UCSC (http://genome. ucsc.edu), and human genome repeats from Repbase September 2011 release [76]. The reads that could be aligned to distinct small RNA classes were filtered out from the data; the read count information during the filtering steps is shown in S1 Table. miRNA hairpin precursors associated to only one known mature miRNA product were searched for new minor 5p or 3p miRNA forms requiring that a candidate sequence has at least 10 occurrences, maps with no mismatches to the arm opposite of the known mature miRNA, and has strong base pairing (14 bp within the first 20 nt) with the known mature miRNA. Further, in order to find novel miRNAs, the reads not mapped to any small RNA species were studied with miRDeep2 [77]. Further criteria used to consider a predicted miRNA hairpin as a miRNA candidate were: (1) at least 10 exactly mapping reads for the main miRNA product of the hairpin, (2) at most 10 genomic copies, and (3) GC content in known miRNA range (15-90%). We divided the novel miRNA hairpins into conserved and nonconserved ones using the PhastCons data [78] available in UCSC (http://genome.ucsc.edu). PhastCons values show the probability that a nucleotide belongs to a conserved element, and here the limit for a hairpin to be conserved was set to mean PhastCons value 0.5 (conserved in mammals).
Candidate moRNAs were searched from the vicinity of known miRNA precursors. First, reads that mapped to the area of miRNA hairpin sequences (miRBase release 17) extended with 30 nt both upstream and downstream were gathered, and reads that mapped to the miRNA hairpin sequence area were excluded from this set. We considered those reads located at 5' side of miRNA hairpin as putative 5' moRNA sequences. Similarly, reads that mapped to 3' side area of the hairpin were considered as putative 3' moRNA sequences. Only reads mapping to the same strand with the miRNA precursor were counted as possible moRNAs. The differentially expressed moRNAs between ES and fibroblast libraries were searched the same way as the differentially expressed miRNAs. Due to large variability in unique moRNAs with low read counts and detection of common moRNAs with high read counts in hESC samples, shown also by moRNA relative expression analysis, only sequences with at least 5 reads per sample and 0.5 reads per million (RPM) were taken to size distribution analysis.

Microarray and Gene Ontology analysis
Human HT12 V4 whole transcript arrays (Illumina) were hybridized in Biomedicum Helsinki Functional Genomics Unit (FUGU). Microarray data analysis was performed using Gene-Spring, http://www.genomics.agilent.com. Correlation coefficients and Principal Component Analysis were used to asses sample quality. Poor quality signals were removed by filtering by percentile (lower cut-off 10% and upper cut-off 98%). Benjamini Hochberg False Discovery Rate corrected t-test was applied to identify differentially expressed genes between (two) conditions. DAVID Bioinformatics database [43] was used for functional annotation clustering and Gene Ontology analysis, http://david.abcc.ncifcrf.gov/home.jsp. Modified t-test, referred as EASE score, was applied to calculate p-value of enriched Gene Ontology categories.
Supporting Information S1