The Role of Transposable Elements in the Origin and Evolution of MicroRNAs in Human

MicroRNAs (miRNAs) are crucial post-transcriptional regulators of gene expression in eukaryotes that act by targeting the 3'-untranslated regions of genes. Transposable elements (TEs) are considered a natural origin of some miRNAs. However, which miRNAs are derived from TEs, and how these miRNAs originate and evolve, remains unclear. We identified 409 TE-derived miRNAs in human (386 that overlap with TEs and 23 that do not). This indicates that TEs play an important role in the origin of human miRNAs. In addition, we found that the proportion of miRNAs derived from TEs (MDTEs) is higher in human than in other vertebrates, especially non-mammalian vertebrates. Furthermore, we classified MDTEs into three types and found that TE head or tail sequences, together with adjacent genomic sequences, contribute to the generation of human miRNAs. Our study will improve the understanding of the origin and evolution of human miRNAs.


Introduction
Transposable elements (TEs), important components of many genomes, are able to mobilize and replicate within host genomes [1]. There are two kinds of these elements: retrotransposons and DNA transposons [2]. The retrotransposons can be further classified into three categories: long terminal repeat (LTR) elements, long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). TEs have been described as an evolutionary force and have been linked to epigenetic regulatory mechanisms [3][4][5]. In addition, TE sequences can provide transcription factor (TF) binding sites and thereby change the regulatory networks of gene expression [6,7].
The discovery of small RNAs opened new prospects for uncovering the functions of TEs. Various small RNAs have been discovered, such as microRNAs (miRNAs), short interfering RNAs and piwi-interacting RNAs [8][9][10]. MiRNAs were the first small RNAs to be discovered [10]. MiRNAs are a class of short non-coding RNAs (approximately 22 nt) that are cleaved from longer (approximately 70 to 90 nt) precursor miRNAs (pre-miRNAs) [11,12]. In animals, most miRNAs regulate gene expression by binding specific regions of mRNAs (known as miRNA target sites) through partial complementarity [13,14]. These target sites are mainly located in 3'-untranslated regions (3'-UTRs) and pair with miRNAs through sequences of approximately 7 nt at the 5' ends of the miRNAs (known as 'seed' regions) [15,16]. By recognizing their target sites, miRNAs regulate gene expression through mRNA degradation or translational repression [17]. Although many algorithms have been developed to predict the target sites of miRNAs, the mechanism by which miRNAs recognize their target sites is not fully understood [15][18][19][20]. Understanding the origin of miRNAs and their target sites will strengthen the rationale for algorithms that optimize the prediction of miRNA target genes.
TEs have been proposed to provide a natural mechanism for the origin of new miRNAs and of the targets of some miRNAs [21][22][23][24][25]. For instance, mir-28, mir-95 and mir-151 are derived from LINE-2 TEs, and the mir-548 family is derived from Made1 TEs [21][22][23]. Alu elements can be targeted by almost 30 human miRNAs [25]. hsa-mir-566 was found to be derived from Alu, and 80% of its predicted target sites were reported to be derived from TEs and related to Alu elements [23]. However, it remains largely unknown which human miRNAs originated from TEs and how they did so.
In the current study, we provide evidence that TEs are an important source of miRNAs in human. Our results uncover the evolution of miRNAs derived from TEs in human and provide insight into the mechanism by which miRNAs originate.

Materials and Methods
To identify the miRNAs derived from TEs (MDTEs), pre-miRNAs and their associated data were collected and analyzed in the following steps. First, 6845 pre-miRNAs with chromosomal locations from eight vertebrates (Danio rerio, Xenopus tropicalis, Gallus gallus, Bos taurus, Mus musculus, Macaca mulatta, Pan troglodytes and Homo sapiens) were obtained from miRBase v20 [26]. The pre-miRNAs and their adjacent 4,000 bp upstream and downstream sequences were downloaded using the BioMart tool from the Ensembl genome database (Release 68) [27].
Second, the pre-miRNAs and their adjacent sequences were used as query sequences to identify TEs. TEs located on the query sequences were identified using the RepeatMasker program with the RepeatMasker libraries 20140131 [28,29]. The WU-BLAST program was used as the search engine of RepeatMasker, and the -s parameter was set to improve the accuracy of identification. The locations of TEs on the query sequences were extracted from the ".out" file, which is one of the output files of RepeatMasker.
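The extraction of TE locations from the ".out" file can be sketched as follows. This is a minimal illustration rather than the script used in this study: it assumes the standard whitespace-separated RepeatMasker ".out" layout (score, divergence, deletions, insertions, query name, query begin, query end, bases left, strand, repeat name, repeat class/family, ...). The names RepeatHit and parse_repeatmasker_out, as well as the sample record, are hypothetical.

```python
from typing import List, NamedTuple


class RepeatHit(NamedTuple):
    query: str    # name of the query sequence
    start: int    # 1-based start of the repeat on the query
    end: int      # 1-based inclusive end of the repeat on the query
    strand: str   # "+" or "C" (complement)
    repeat: str   # matching repeat name
    family: str   # repeat class/family


def parse_repeatmasker_out(text: str) -> List[RepeatHit]:
    """Collect repeat annotations from the text of a RepeatMasker .out file."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        hits.append(RepeatHit(fields[4], int(fields[5]), int(fields[6]),
                              fields[8], fields[9], fields[10]))
    return hits


# Made-up record for illustration only; not taken from the study.
SAMPLE = """\
   SW   perc perc perc  query     position in query    matching  repeat        position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family  begin end (left)  ID

  463   1.3  0.6  1.7   query1    3900  4040  (4040)  C  L1PA13  LINE/L1  (10)  6100  5960  1
"""

hits = parse_repeatmasker_out(SAMPLE)
print(hits[0].query, hits[0].start, hits[0].end, hits[0].family)
```

Only the query coordinates and the repeat annotation are kept, since these are the fields needed to compare TE locations with pre-miRNA locations.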
Finally, the locations of TEs were compared with those of pre-miRNAs on the query sequences. If a pre-miRNA overlapped with a TE on the query sequence, the miRNA was defined as an MDTE. The proportion of each pre-miRNA that overlapped with TE sequences was calculated to classify MDTE types.
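The overlap comparison can be illustrated with a short sketch. It assumes 1-based, inclusive coordinates on the query sequence and a pre-miRNA that begins at position 4001, directly after the 4,000 bp upstream flank. The function names and the example coordinates are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def overlap_fraction(mirna: Interval, te: Interval) -> float:
    """Fraction of the pre-miRNA covered by a single TE annotation."""
    overlap = min(mirna[1], te[1]) - max(mirna[0], te[0]) + 1
    length = mirna[1] - mirna[0] + 1
    return max(overlap, 0) / length


def is_mdte(mirna: Interval, tes: Iterable[Interval]) -> bool:
    """A pre-miRNA counts as an MDTE if it overlaps at least one TE."""
    return any(overlap_fraction(mirna, te) > 0 for te in tes)


# Hypothetical 80 nt pre-miRNA placed after a 4,000 bp upstream flank.
pre_mirna = (4001, 4080)
te_hits = [(3900, 4040), (6000, 6300)]

print(is_mdte(pre_mirna, te_hits))               # True
print(overlap_fraction(pre_mirna, te_hits[0]))   # 0.5
```

In this example the first TE covers positions 4001 to 4040 of the pre-miRNA, which is 40 of 80 nt, so the pre-miRNA would be counted as an MDTE with a 50% overlap.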
To identify human MDTEs that have lost their TE sequence features, the homologous relationships of miRNAs between human and the seven other vertebrates were analyzed. A human miRNA was identified as an MDTE without TE sequence features if it did not overlap with a TE but had homologs in several other vertebrates that overlapped with the same TE sequence.
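This decision rule can be sketched as a small function. The text does not state how many species "several" means, so the threshold min_species=2 below is an assumption, as are the function name and the example inputs.

```python
from collections import Counter
from typing import Dict, Optional


def te_origin_without_features(human_overlaps_te: bool,
                               homolog_te: Dict[str, Optional[str]],
                               min_species: int = 2) -> Optional[str]:
    """Return the shared TE name if the human miRNA qualifies, else None.

    homolog_te maps a species name to the TE that overlaps the homologous
    miRNA in that species, or None when the homolog overlaps no TE.
    min_species is an assumed reading of "several other vertebrates".
    """
    if human_overlaps_te:
        return None  # already detected by direct overlap in human
    counts = Counter(te for te in homolog_te.values() if te is not None)
    if not counts:
        return None
    te, n_species = counts.most_common(1)[0]
    return te if n_species >= min_species else None


# Hypothetical homolog annotations, for illustration only.
homologs = {"Pan troglodytes": "TE_A", "Macaca mulatta": "TE_A",
            "Mus musculus": None}
print(te_origin_without_features(False, homologs))  # TE_A
```

A stricter or looser threshold would change how many miRNAs are recovered, so the value should be chosen to match the intended definition.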
To address whether different TE families contribute equally to the origin of MDTEs, the proportions of MDTEs generated from different TE families were calculated and compared with the proportions of those TE families in the human genome. The proportions of MDTEs generated from different TE families were calculated following the procedure described above. The proportions of TE families in the human genome were obtained from published genome sequencing data [30]. Pearson's chi-squared test was used to evaluate the significance of the differences, and P < 0.01 was taken to indicate that the contributions of different TE families to the origin of MDTEs differ significantly.
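A minimal sketch of such a comparison, framed as a goodness-of-fit test, is given below. The observed counts and genome shares are hypothetical placeholders and are not the values from this study; the function name is also an assumption. For four categories (3 degrees of freedom) the chi-squared critical value at P = 0.01 is 11.345, so a statistic above that value corresponds to P < 0.01.

```python
from typing import Sequence


def pearson_chi2(observed: Sequence[int],
                 expected_share: Sequence[float]) -> float:
    """Pearson's chi-squared statistic for observed counts vs expected shares."""
    total = sum(observed)
    share_sum = sum(expected_share)  # normalise shares so they sum to 1
    stat = 0.0
    for obs, share in zip(observed, expected_share):
        expected = total * share / share_sum
        stat += (obs - expected) ** 2 / expected
    return stat


# Critical value of the chi-squared distribution, df = 3, P = 0.01.
CRITICAL_DF3_P01 = 11.345

# Hypothetical MDTE counts per TE class (LINE, SINE, LTR, DNA).
observed = [100, 60, 40, 100]
# Hypothetical share of each class among all genomic TEs.
genome_share = [0.45, 0.30, 0.18, 0.07]

stat = pearson_chi2(observed, genome_share)
significant = stat > CRITICAL_DF3_P01
print(round(stat, 2), significant)
```

With real data, the number of categories determines the degrees of freedom, and the critical value would need to be adjusted accordingly.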

Identification of miRNAs derived from TEs in human and seven other vertebrates
MDTEs were identified in human and seven other vertebrates (Danio rerio, Xenopus tropicalis, Gallus gallus, Bos taurus, Mus musculus, Macaca mulatta and Pan troglodytes). Surprisingly, no MDTEs were found in Xenopus tropicalis. The proportion of MDTEs among miRNAs increased over the course of vertebrate evolution, and the proportion in human was higher than in the other vertebrates analyzed (Fig 1). Meanwhile, the proportion of MDTEs among miRNAs bore little relation to the proportion of TEs in the genome. For example, although TEs make up more than one-third of the genomes of Danio rerio and Xenopus tropicalis [31,32], the proportions of MDTEs among their miRNAs are less than 5%. In comparison, TE sequences constitute 9% of the Gallus gallus genome, but 6.98% of its miRNAs are MDTEs [33]. MDTEs account for 19.84% of miRNAs in Homo sapiens, and TE sequences make up 44.83% of its genome [30]. This observation might be due to significant differences between the TE composition of Danio rerio and Xenopus tropicalis and that of human and other mammals. This explanation is supported, at least in part, by the observation that the major TEs in Danio rerio and Xenopus tropicalis are DNA transposons, whereas those in mammals are retrotransposons [30][31][32][34]. Given that the contribution of TEs to miRNAs is negligible in Drosophila [35], MDTEs appear to be present mainly in the genomes of human and other mammals. Information on the MDTEs in human and the seven other vertebrates is summarized in Table 1.
When MDTEs were subjected to homology analysis among Danio rerio, Gallus gallus and mammals, no homologous MDTEs were found. Fourteen MDTEs are conserved among the five mammals and forty-seven MDTEs are conserved among primates. This finding implies that MDTEs are species-specific because TEs differ among species.

Analysis of MDTEs in human
To further investigate the pattern of MDTEs in human, 1872 miRNA gene sequences of Homo sapiens collected from miRBase v20 were mapped to the human genome and analyzed. In total, 386 MDTEs that completely or partly overlap with TEs were found, and each shows a unique relationship to its related TE. This can be seen in the origin of multi-copy MDTEs and MDTE families: each copy of a multi-copy MDTE, and each member of an MDTE family, was found to originate from the same TE. For example, the six copies of hsa-mir-3118 in the human genome are all partly derived from LINE/L1PA13, and a large miRNA group, hsa-mir-548, is derived from the DNA/MADE1 element. Taken together, our findings suggest that an MDTE and its homologs are derived from the same TE. S1 Table lists detailed information on all MDTEs.
When multi-copy MDTEs were excluded, 338 unique MDTEs (UMDTEs) were identified, which can be classified into three types (Fig 2A): Type I UMDTEs, which are derived from inverted TE sequences; Type II UMDTEs, whose sequences partly overlap with TE sequences that are not inverted; and Type III UMDTEs, whose sequences wholly overlap with TE sequences.
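The three-way classification can be sketched as a simple decision function. How an inverted TE configuration is detected is not specified here, so the sketch takes it as a precomputed flag; checking that flag before the overlap fraction is also an assumption. The function and parameter names are hypothetical.

```python
from typing import Optional


def classify_umdte(overlap_fraction: float,
                   from_inverted_tes: bool) -> Optional[str]:
    """Assign a UMDTE type from its TE overlap and inverted-TE status.

    overlap_fraction: share of the pre-miRNA covered by TE sequence (0 to 1).
    from_inverted_tes: True when the pre-miRNA is derived from inverted TE
        sequences (assumed to be determined elsewhere).
    """
    if overlap_fraction <= 0:
        return None          # no overlap: not detected as an MDTE this way
    if from_inverted_tes:
        return "Type I"      # derived from inverted TE sequences
    if overlap_fraction < 1.0:
        return "Type II"     # partial overlap, TE not inverted
    return "Type III"        # pre-miRNA lies wholly within TE sequence


print(classify_umdte(0.5, False))   # Type II
print(classify_umdte(1.0, False))   # Type III
print(classify_umdte(0.8, True))    # Type I
```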
MiRNAs have been identified in various organisms, and their number in databases is increasing rapidly. In human, excluding multi-copy miRNAs, approximately 19.84% (338/1704) of miRNAs overlap with TEs and are regarded as UMDTEs. Inverted TEs have been proposed as an important configuration for miRNA origin [21,22]. Consistent with this, 11.24% of UMDTEs were found to be derived from inverted TEs and 36.98% from whole TEs (Fig 2B). This might be due to the abundance of similar fragments and palindromic structures in TEs [4,36], which provide the potential to form the hairpin structure of miRNAs.

Type II MDTEs are derived from TEs via two patterns in human
Among UMDTEs, 51.78% are Type II MDTEs, which partly overlap with TEs. Type II MDTEs were found to be generated by two patterns: Pattern I, in which MDTEs derived from whole TE sequences have lost part of their TE sequence features, and Pattern II, in which part of the pre-miRNA is derived from the head or tail of a TE (Fig 4). Pattern I is evident among miRNA homologs and multi-copy miRNAs. The overlap between TEs and MDTEs in Pattern I is reduced from 100% to 30% or even less (Table 2).
In contrast to Pattern I MDTEs, which account for 77.14% of Type II MDTEs, Pattern II MDTEs have one arm of the pre-miRNA formed by a TE. In this case, the TE was inserted near a sequence that resembles the complement of the TE head or tail, so that the two together form the hairpin structure of the pre-miRNA, as in hsa-mir-326, hsa-mir-421 and hsa-mir-619 (S2 Table). For Pattern II MDTEs, the mature miRNAs are derived not only from the internal portion of TEs but also from non-TE sequences that are complementary to the head or tail of TEs.
Two origin mechanisms can be found across the three types of MDTEs. In Type II and Type III MDTEs, some miRNAs were carried into genomes by TEs and passed on to new species after speciation. In Type I and Type II MDTEs, some miRNAs were generated from TEs within the current genome.

Identification and characterization of human MDTEs without TE sequence features
About 19.84% of human miRNAs wholly or partly overlap with TE sequences, but the origin of the other miRNAs is less clear. To identify MDTEs that have lost their TE sequence features, the human miRNAs that do not overlap with TEs were analyzed and compared with their homologs in other vertebrates. Twenty-three miRNAs were identified as human MDTEs that do not overlap with TEs, while their homologs in other species either wholly or partly overlap with the same TE sequence (S3 Table). Although these MDTEs account for just 1.35% of all human miRNAs, this implies that more miRNAs than expected may be derived from TE sequences in vertebrates.
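The percentages quoted in this analysis can be checked with a few lines of arithmetic. The sketch below uses only counts stated in the text (338 UMDTEs, 1704 miRNAs after excluding multi-copy entries, 386 overlapping MDTEs and 23 non-overlapping MDTEs) and assumes that the 1.35% figure uses the same 1704 denominator. The variable and function names are hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to two decimals."""
    return round(100 * part / whole, 2)


unique_mirnas = 1704        # human miRNAs after excluding multi-copy entries
umdtes = 338                # unique MDTEs overlapping with TEs
overlapping_mdtes = 386     # all MDTEs overlapping with TEs
non_overlapping_mdtes = 23  # MDTEs that lost their TE sequence features

print(percent(umdtes, unique_mirnas))                 # 19.84
print(percent(non_overlapping_mdtes, unique_mirnas))  # 1.35
print(overlapping_mdtes + non_overlapping_mdtes)      # 409
```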

Conclusion
In summary, we found that TEs are an important source of human miRNAs. MiRNAs can be brought into genomes during the insertion of TEs or generated from TE sequences within the current genome through particular mechanisms. Once MDTEs are fixed in the genome, their TE sequence features may be lost during evolution. The observation that some MDTEs only partly overlap with TE sequences, and that others do not overlap with TEs at all, implies that there are more MDTEs in vertebrate genomes than previously believed. Our findings provide insight into the origin and evolution of miRNAs.
Supporting Information S1

Author Contributions
Conceived and designed the experiments: SQ FM. Analyzed the data: SQ PJ XZ. Wrote the paper: SQ LMC FM.