De novo characterization of microRNAs in oriental fruit moth Grapholita molesta and selection of reference genes for normalization of microRNA expression

MicroRNAs (miRNAs) are a group of endogenous non-coding small RNAs that have critical regulatory functions in almost all known biological processes at the post-transcriptional level in a variety of organisms. The oriental fruit moth Grapholita molesta is one of the most serious pests in orchards worldwide and threatens the production of Rosacea fruits. In this study, a de novo small RNA library constructed from mixed stages of G. molesta was sequenced through Illumina sequencing platform and a total of 536 mature miRNAs consisting of 291 conserved and 245 novel miRNAs were identified. Most of the conserved and novel miRNAs were detected with moderate abundance. The miRNAs in the same cluster normally showed correlated expressional profiles. A comparative analysis of the 79 conserved miRNA families within 31 arthropod species indicated that these miRNA families were more conserved among insects and within orders of closer phylogenetic relationships. The KEGG pathway analysis and network prediction of target genes indicated that the complex composed of miRNAs, clock genes and developmental regulation genes may play vital roles to regulate the developmental circadian rhythm of G. molesta. Furthermore, based on the sRNA library of G. molesta, suitable reference genes were selected and validated for study of miRNA transcriptional profile in G. molesta under two biotic and six abiotic experimental conditions. This study systematically documented the miRNA profile in G. molesta, which could lay a foundation for further understanding of the regulatory roles of miRNAs in the development and metabolism in this pest and might also suggest clues to the development of genetic-based techniques for agricultural pest control.


Introduction
MicroRNAs (miRNAs) are non-coding small RNAs and are usually 21-24 nucleotide (nt) in lengths [1,2]. Accumulative reports suggest that miRNAs function as important gene a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 coding RNA families (rRNA, tRNA, snRNA, snoRNA). Finally, the unique reads were screened out by eliminating the repeated sRNAs from the clean reads database with single sequence left.
To identify miRNAs, the unique sequence dataset with length of 18-26 bp were mapped in the miRBase (release 21, http://www.mirbase.org/) by BLAST search, and length variations at both 3' and 5' ends and one mismatch inside the sequence were allowed in the alignment. The unique reads that mapped to the known miRNAs and/or pre-miRNAs of specific species were determined to be conserved miRNAs. The remaining sequences were further mapped against the transcriptome of G. molesta (unpublished data) and genome of highly homological lepidopteran D. plexippus (downloaded from MonarchBase website: http://monarchbase.umassmed.edu). The reads that could map to the transcriptome and/or genome were further subjected to the secondary hairpin prediction using RNAfold software (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). The sequences of transcriptome/genome that 20 bp upstream and 60 bp downstream of the mapped sequences and the sequences 60 bp upstream and 20 bp downstream of the mapped sequences were both extracted as candidate precursors for secondary structure prediction. The criteria for secondary structure prediction were: (1) number of nucleotides in one bulge in stem ( 12); (2) number of base pairs in the stem region of the predicted hairpin (! 16); (3) cut of free energy (kCal/mol 15); (4) length of hairpin (up and down stems + terminal loop ! 50); (5) length of hairpin loop ( 20); (6) number of nucleotides in one bulge in mature region ( 8); (7) number of biased errors in one bulge in mature region ( 4); (8) number of biased bulges in mature region ( 2); (9) number of errors in mature region ( 7); (10) number of base pairs in the mature region of the predicted hairpin (! 12); (11) percent of mature in stem (! 80%). The reads with precursors that satisfied all of the above standards and with more than 10 reads account were determined as novel miRNAs. The secondary structure of novel miRNAs were then constructed using srna-workbench V2.5.0 (http://srna-workbench.cmp.uea.ac.uk/ downloadspage/) with the precursor sequence and the brief-code of the corresponding secondary structure of each novel miRNA.

Validation of the Illumina sequencing
To validate the Illumina sequencing, 10 conserved miRNAs that have been reported with relative stable expression in other organisms [45][46][47] and 10 novel identified miRNAs with high abundance in sRNA library of G. molesta (S1 Table) were selected and their expression level in the sample for construction of sRNA library were checked using qPCR. What's more, the 20 selected miRNAs were respectively cloned into pMD18-T (Takara) and were further validated with Sanger sequencing by Beijing Genomics Institute (BGI).

Cluster and conservation analysis
Cluster analysis was conducted among miRNAs at the distance of 3 kb, 5 kb, 10 kb and 50 kb within the genome of D. plexippus. One cluster was determined when at least two miRNAs could be found in a specific distance range.
The miRNA sequences of arthropods, C. elegans and Homo sapiens have been downloaded from miRBase (release 21). Conservation of miRNA families among G. molesta and other arthropods was compared through BLASTn analysis. Phylogenetic relationships among 27 insect species belonging to 6 orders in Hexapoda and 4 species in the other three phyla of arthropods have been analyzed. The topology tree was constructed according to previous reports [48,49].
G. molesta transcriptome. The target genes with context score percentile less than 50 were then eliminated by TargetScan. The ones with max energy greater than -10 were screened out by miRanda. Finally, the target genes simultaneously recommended by the two algorithms were defined as the final prediction.
Functional pathway determination of target genes was performed through the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis against the public database (http://www.genome.jp/kegg) [50]. Pathways with P-value of fisher's exact test 0.05 were determined to be significantly enriched pathways for metabolism or signal transduction. For elucidation of the interactions between miRNAs and target genes in metabolisms and signal transductions, networks of miRNA-KEGG in G. molesta were further analyzed with miRNA-mRNA target sequence analysis technique and were descripted with Cytoscape 2.8.3.

Sample preparation and candidate sRNA selection for reference gene selection
Eight experimental conditions had been considered for study on expressional stability of sRNAs in G. molesta, including two intrinsic biotic conditions (developmental stage and tissue) and six exogenous stress conditions (temperature, photoperiod, starvation, JH injection, dsRNA injection and insecticide).
For samples of different developmental stages, 300 eggs, 200 first instars, 30 second instars, 10 third instars, 10 fourth instars, 7 fifth instars, 10 prepupae, 10 female pupae, 10 male pupae, 10 female adults (3 d old post-emergence) and 10 male adults (3 d old post-emergence) of G. molesta have been collected respectively as one replicate. For sample preparation of different tissues, 20 fifth instars (2 d old post-molt) and 10 adults (3 d old post-emergence) were dissected under a binocular microscope in 10 mM cold phosphate buffered saline (PBS, pH 7.8) as one replicate; after rinsed in PBS, samples of each tissue, including 9 tissues of larvae (head, cuticle, foregut, midgut, hindgut, Malpighian tubule, fatbody and ventral nerve cord) and 4 tissues of adults (head, thorax, abdomen and leg) were obtained respectively from pooled dissections.
For sample preparation of G. molesta subjected to stresses of varying temperature, photoperiod and food deprivation, the third instars were treated under nine temperature conditions (4, 26 and 40˚C for 2, 12 and 24 h, respectively), eight photoperiod conditions (24 h L: 0 h D, 0 h L: 24 h D, 14 h L: 10 h D and 10 h L: 14 h D for 1 d and 2 d, respectively), and four starvation conditions (starved for 12 h, 24 h, 48 h and 48 h followed by 24 h refeeding). For insect samples pretreated with double strand RNA (dsRNA) and juvenile hormone (JH) injection, the fifth instars were respectively injected with artificially synthesized dsRNA of met (encoding putative JH receptor Methoprene-tolerant) and chemical reagent JH III (Sigma-Aldrich, Milwaukee, WI, USA). DsRNA of egfp (encoding Enhanced Green Fluorescent Protein) and chemical reagent dimethyl sulphoxide (DMSO, solvent of JH III, Sigma-Aldrich, Milwaukee, WI, USA) were injected as control treatments accordingly. DsRNAs of met and egfp were artificially prepared with MEGAscript RNAi kit (Ambion) with primer pairs listed in S2 Table. For stress induced by insecticide, the third instars were topically applied with LD 50 of 2.5% β-cypermethrin (1 μL of 1500 times dilution per larva, or 0.625 ng per larvae) calculated using PoloPlus TM software (LeOra software, Berkeley, CA, USA) and were then sampled, respectively, at 12 h, 24 h and 48 h post application. Ten insects pretreated under each exogenous stress condition were collected as one replicate. Three replicates were collected for all of the eight treatments.
All insect samples were snap frozen in liquid nitrogen and stored at -80˚C. Total RNA of each sample was then extracted with Trizol reagent as above and checked with Nanodrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA). Subsequently, cDNAs of sRNAs were synthesized after reverse transcription from the qualified RNA with A260/280 value of 1.8-2.3 and A260/230 value ! 2.0 using miScript II RT Kit (Qiangen, Dusseldorf, Germany). Universal reverse primer (GAATCGAGCACCAGTTACGC) was used for cDNA synthesis according to the manufacture's protocol. All of the resulting cDNA samples were stored at −20˚C for sRNA expressional analysis.
Quantitative PCR analysis qPCR with high sensitivity and specificity was adopted for validation of sRNA library sequencing and quantification of selected candidate reference genes through measurement of mature sRNA expression.
The forward primers for qPCR amplification of miRNAs were designed according to the specific sequences of miRNAs with some modifications at 5' ends for adjustment of GC contents (S1 Table). The reverse primer for qRT-PCR analysis was the universal primer used for cDNA synthesis above. All primers were synthesized commercially (Sangon Biotechnology, Shanghai, China) and diluted to 10 μM. qPCR was conducted on a Bio-Rad CFX Connect TM Real-Time PCR Detection System (Hercules, CA, US) using miScript SYBR1 Green PCR kit (Qiagen). The amplification was performed following the program as: 94˚C for 15 min for denaturation, 40 cycles of 95˚C for 15 s, 55˚C for 30 s, 72˚C for 30 s for collection of amplification signal and 70˚C for 30 s for reaction termination. The dissociation protocol was also carried out for melting curve analysis on the specific amplification. The amplification efficiency (E%) and correlation coefficient (R 2 ) for amplification of each sRNA were calculated according to the standard curve generated from the five 5-fold serial dilution points of cDNAs and their corresponding C q values [35,55].

Statistical analysis for selection and validation of reference genes
Global mean that was defined as the mean C q value of all expressed sRNAs in a given sample [34] was calculated. The Spearman's rank correlation coefficient r s between C q value of each candidate reference gene and the corresponding global mean was then respectively analyzed under each experimental condition [56]. The expressional stabilities of the candidate reference genes were preliminarily ranked according to their r s values and the sRNA with r s value closer to 1 was considered more stable.
The expressional stability of candidate sRNAs with r s value > 0 were further assessed using commonly used software geNorm [57]. The linear scale expression quantity of each candidate gene was calculated by the equation 2 (-ΔCq) (according to the handbook of geNorm TM kit with perfect probe, PrimerDesign Ltd.) and loaded in the Excel-based geNorm software. The optimal number of reference genes used under each experimental condition was also recommended by geNorm through pairwise variation analysis.
To validate the selected reference genes, transcriptional level of let-7 in G. molesta at different developmental stages were detected and compared with different normalizers, including the single one best recommendation hsa16 (NF1), the recommended combination hsa16 + U6 + bmo2b (NF(1-3)), one least stable miRNA bmo281 (NF8) ranked by geNorm, and single mse92a (NF12) with the lowest r s value. Relative expression levels of let-7 were calculated according to the ΔΔCq method. The geometric mean calculated from the C q values of the three sRNAs (hsa16, U6, bmo2b) were used as normalization factor for NF (1)(2)(3). For each specific developmental stage, the significant difference among the expression levels of let-7 calculated according to different normalization factors were statistically analyzed using one-way ANOVA with SPSS Statistics 17.0 (SPSS Inc., Chicago, IL, USA). Specifically, Tukey test was adopted for analysis of gene expression in G. molesta of fifth instar, male pupa, female pupa, and female adul. Games-Howell test was performed in analysis of gene expression in G. molesta of first instar, second instar, fourth instar and prepupa. For expressional analysis in G. molesta of third instar and male adult, Games-Howell test was also applied after square root transformation of the relative gene expression values.

Profile of sRNA library
In order to understand the miRNA profile, a sRNA library was constructed with a mixture of RNA extracted from G. molesta at varying developmental stages. A total of 16,305,575 raw reads were obtained, including 2,219,608 unique sequences. After trimming of the adaptor and junk reads, filtering out the Rfam reads (rRNA, tRNA, snoRNA, snRNA and other Rfam RNA), and removing the repeats, a total number of 5,998,101 unique reads were finally obtained for further analysis ( Table 1).
The length of sequenced sRNAs ranged from 18-26 nucleotides (nt). The length of the total clean reads mainly showed two peaks, with the highest one at 26 nt and the other at 22 nt ( Fig  1A). The number of the unique clean reads increased with the lengths of the reads (Fig 1A), and the highest number of miRNAs were found with length of 22 nt (Fig 1B).

Identification of conserved and novel miRNAs
After all of the unique clean reads were subjected to the alignment in the miRBase 21.0, transcriptome of G. molesta and the genome of D. plexippus, and were performed with the analysis  Table).
Among the 291 conserved miRNAs, only 40 miRNA showed high expression levels (> 2000 reads) ( Table 2), and miR-10, miR-8, miR-276, miR-14 and miR-281 were identified as the most abundantly expressed conserved miRNAs in G. molesta. Majority of the conserved miRNA showed middle (132 miRNAs with 10-1999 reads) to low (119 miRNAs with < 10 reads) expressional levels (S4 and S5 Tables). In order to detect the conservation of miRNAs among G. molesta and other arthropods, 79 miRNA families identified in the sRNA library of G. molesta were compared within 31 arthropods, and the phylogenetic relationship among these arthropods were constructed with C. elegans and H. sapiens as outgroup taxa (Fig 2). miRNAs were relatively conserved in the same order, and the miRNAs were more conserved in orders with closer phylogenetic relationships. Insects appeared having more expressed miR-NAs than those in other arthropods, Nematoda and Chordata, and lepidopterans seemed owing the most abundant miRNAs in Hexapoda. In the present study, miR-9 was conserved in all of the insect species, and miR-745 was conserved in Lepidoptera but absent in other species.
Most (82.04%) of the identified novel miRNAs showed low expressional levels (< 10 reads) (S4 and S5 Tables), 39 (15.92%) of the novel miRNAs showed middle abundance (10-1999 reads) ( Table 3), only 5 novel miRNAs exhibited with high abundance (> 2000 reads) ( Table 3). The 5 highly expressed novel miRNAs were gmo-miR-PC-5p-15_50867 (127,819 reads), gmo-miR-PC-3p-598_1629 (17,872 reads), gmo-miR-PC-3p-674_1493 (17,424 reads), gmo-miR-PC-5p-1258_867 (14,307 reads) and gmo-miR-PC-5p-1018_1028 (9,152 reads), all of which accounted about 96.94% of all the identified novel miRNA reads. The secondary hairpin structures of the 44 novel miRNAs with middle to high expressed levels in the sRNA library of G. molesta were all predicted (Fig 3). The 44 miRNAs could be divided into three types: (1) 10 miRNAs derived from either the 5' or 3' mature miRNAs of the 5 corresponding precursors, (2) 27 miRNAs only from the 5' mature miRNAs, and (3) 17 miRNAs only from the 3' mature miRNAs ( Table 3 and Fig 3). In order to verify the Illumina sequencing, the expressional quantity and sequence accuracy of the 10 conserved and 10 novel miRNA determined in the sRNA library of G. molesta were validated with the qPCR method and Sanger sequencing. The result of qPCR demonstrated that the abundance of most selected miRNAs showed concordant expressional profiles with the corresponding reads accounts in sRNA library, although bmo9a exhibited very low expressional level in the qPCR test (Fig 4). Sequence alignment illustrated that most of the miRNA sequences sequenced by Illumina platform were in accordance with those sequenced by Sanger method. The substitutions at the 5'end produced by the artificially added bases in the forward primer and the differences of one or two bases at the 3' end resulted from sequencing deviation all have no influence on annotation and target prediction of miRNAs (S7 Table). Therefore, Illumina sequencing is a reliable platform for the overall understanding of miRNA profile in G. molesta.
Cluster of miRNAs miRNAs that were transcribed from the same primary miRNA (pri-miRNA) frequently exhibit clustered distribution in the genome and were usually expressed as a polycistronic transcript with co-regulation activity in biological networks. The analysis of the miRNA cluster referred to the genome of D. plexippus ( Table 4) revealed that more miRNAs in G. molesta were linked within 10-50 kb genomic distances, with 28.92% of miRNAs clustered within 50 kb and 26.51% miRNAs closely linked within 10 kb. There were 22.89% and 21.69% miRNAs respectively clustered within 5 kb and 3 kb genomic distance. The mean number of miRNAs per cluster was 3.2, 2.9, 2.7 and 2.8 correspondingly in 50 kb, 10 kb, 5kb and 3 kb genomic distances. At the same genomic distance, a positive and correlated expression relationship has been normally detected among miRNAs within the same cluster ID, or at least two of which appeared with similar expressional profiles. However, the expression patterns (reads account) between distance-neighboring miRNAs of the same cluster varied among different clusters. Conservation analysis of the 79 miRNA families identified from the sRNA library of G. molesta was conducted among 33 species belonging to the three phyla, Arthropoda, Nematoda and Chordata. The miRNA information of the other 32 species was retrieved from the miRBase. Colored box indicated the presence of the conserved miRNA family and the same color indicated species belonging to the same order of insect in Hexapoda, the same class in Arthropoda except for the Hexapoda species, C. elegans in Nematoda, and Homo sampiens in Chordata. The phylogenetic tree was constructed according to previous reports [48,49]. doi:10.1371/journal.pone.0171120.g002

KEGG analysis of predicted target genes
KEGG is a database that helps understanding the molecular interaction and reaction networks in cells and organisms (KEGG PATHWAY Database: http://www.kegg.jp/kegg/pathway.html).
In this study, the KEGG analysis of predicted target genes of miRNAs was used to increase our understanding about the biological functions of the identified miRNAs in the metabolism and development of G. molesta. A total of 1,896 target genes of the miRNAs in the G. molesta were significantly enriched in the KEGG analysis, among which 410 genes were matched to 16 KEGG pathways, including 10 pathways of amino acid metabolism and protein processing  Table 4. miRNA cluster analysis in G. molesta.
doi:10.1371/journal.pone.0171120.t004 eight candidate reference sRNAs (bmo-miR-2b-3p_1ss22GC, pxy-mir-6497-p3_1ss7CT, hsa-miR-16-5p, bmo-miR-281-3p_L-2R+2, U6 snRNA, bmo-miR-279a_R+2, 5S rRNA and bmo-miR-9a-5p) were preliminarily identified by their positive correlation with the global mean expressions ( Table 5), and were further evaluated with geNorm software. geNorm software ranks the reference gene according to their M values (average expression stability) and provides optimal recommendation of the number of reference genes based on the pairwise variation (V) analysis. In this study, the expressional stabilities of the 8 candidate reference genes were ranked under eight different experimental conditions (S2 Fig). The candidate reference, bmo2b, was ranked with high stability under seven experimental conditions, except for temperature treatment; U6 and bmo9a were all ranked as stable references under four experimental conditions. In combination with the V value (S3 Fig), the recommendations of internal normalizers for analysis of miRNA expressional profiles of G. molesta using qRT-PCR were determined for each of the eight specific experimental conditions. For intrinsic biotic conditions, 3 sRNAs (hsa16, U6 and bmo2b) used together were recommended for study of different developmental stages. Combined usage of 6 sRNAs (hsa16, bmo2b, U6, 5S, bmo279 and bmo9a) was recommended for miRNA quantification in different tissues ( Table 6). For different exogenous stresses, it is recommended 2 sRNAs (bmo281 and U6) for temperature changes, 3 miRNAs (bmo2b, bmo279 and hsa16) for photoperiod treatment, 2 miRNAs (bmo279 and bmo9a) for starvation, 2 miRNAs (bmo2b and bmo9a) for JH exposure, 2 miRNAs (bmo2b and bmo281) for dsRNA treatment, and 2 sRNAs (bmo281 and U6) for insecticide application ( Table 6).

Validation of reference gene selection
To evaluate the performance of selected reference genes, the expression level of let-7, which participates in development regulation in insects [58,59] and showed dynamic expressional profiles in different developmental stages of Bombyx mori [60] was examined in different developmental stages of G. molesta. Similar expression profiles of let-7 in different developmental stages of G. molesta were detected when single best recommended reference NF(1) (hsa16) and the optimal recommendation NF(1-3) (hsa16 + U6 + bmo2b) were used as normalization factors, only with different expression patterns in the second vs third instars and fifth instar vs male pupa ( Table 7). However, the use of single least stable reference NF (8) and NF (12), respectively, showed significant down-regulation and up-regulation of let-7 within all tested developmental stages of G. molesta ( Table 7).

Discussion
The miRNA information of several Lepidopteran insects has been identified with reference to their genomic information [20,61] or genome sequences of the model insect silkworm (B. mori) [62,63]. In the present study, a pooled sRNA library of G. molesta was prepared from mixed developmental stages of oriental fruit moth, and was further analyzed in reference to its own transcriptome and the genome of D. plexippus which was found with the highest similarity to the transcripts of G. molesta in the phylogenetic conservation analysis (unpublished data). A total of 536 mature miRNAs composed of 291 previously reported and 245 novel ones were finally identified in G. molesta. In previous studies of Lepidopteran insects, 55 conserved and 202 novel miRNAs were found in B. mori [61], 163 conserved and 13 novel miRNAs were identified in M. sexta [20], and 97, 91 and 69 conserved together with 1, 8 and 383 novel ones were respectively identified in the miRNA analyses of Helicoverpa armigera, Spodoptera litura and P. xylostella [62,63]. With the cumulative availability of miRNA data and the increase of sequencing depth, more information of mature miRNA would be uncovered and more novel miRNA could be identified in lepidopterans, which would provide useful repertoires for clarification of the modulation complex in insects and other organisms. Among the 291 conserved miRNAs identified in the sRNA transcriptome of G. molesta, 40 miRNAs showed highly expressed levels, which probably play critical roles in various regulatory processes in the oriental fruit moth. The miRNAs with the highest account of reads in this study including miR-10, miR-8, miR-281, and miR-263 were also identified as the most abundantly expressed ones in A. gambiae [64]. miR-10 and miR-8 were found with high abundance in the digestive tract of green bottle fly maggot Lucilia sericata and both were considered involved in secretions [65]. A study on S. exigua demonstrated that miR-10 also participated in  (1), using the single one best reference hsa16 for normalization; NF(1-3), using the recommended combination of the three best reference sRNAs, hsa16, U6 and bmo2b for normalization; NF (8), using the single least stable reference bmo281 for normalization; NF(12), using single mse92a with the lowest r s value for normalization # Means within the same line followed by the different letters are significantly different (P 0.05). doi:10.1371/journal.pone.0171120.t007 developmental regulation since oral feeding using synthetic miRNA mimics of miR-10-1a resulted in suppressed growth and increased mortality [66]. miR-8 showed pleiotropic regulatory functions in insects, including neurogenesis through targeting atrophin mRNA or regulating synaptic activity [67,68], homeostasis of innate immunity by altering the expression of antimicrobial peptides [69], and development tuning via responding to ecdysone [70]. miR-281 is another multi-functional miRNA in insects. It is involved in regulation of long chain fatty acid synthesis and cuticle formation in M. sexta [71], highly expressed in Malpighian tubules and participating in developmental regulation through targeting and suppressing the transcription of ecdysone receptor-B (EcRB) in B. mori [72], and facilitating the replication of dengue virus (DENV-2) in vector mosquito Aedes albopictus by specifically induced in female midgut upon DENV-2 infection [73]. The conserved miR-263 was abundantly expressed in pupae and may participate in temporal regulation during silkworm development [74]. The other two miRNAs, miR-276 and miR-14 that showed reads account at top levels in the sRNA library of G. molesta, were also found with regulatory effects in the development, reproduction and immunity of insects. miR-276 was noted as one of the miRNAs for identifying reproductive stages of A. mellifera independent of caste [75] and was found promoting egg-hatching synchrony of locusts through up-regulation of the target, a transcription coactivator gene brahma (brm) [10]. In studies of Drosophila, miR-14 has been considered as an important regulator in the development process, mainly through negative modulation of the Hedgehog signaling [76], positive regulation of the autoregulatory loop of ecdysone signaling [77] and autophagy regulation during salivary gland cell death by targeting 1,4,5-triphospate (IP3) signaling [78]. miR-14 was also found involved in the interactions of insect and virus, e.g. H. armigera single nucleopolyhedrovirus (HaSNPV) manipulated EcR transcription in H. armigera through down regulation of miR-14 in host insect [79] and miR-14 in the white-backed planthopper Sogatella furcifera may participate in the immune response against the southern rice black-streaked dwarf virus vectored by the planthopper [80]. Analysis about organisms of different kingdoms demonstrated that large fractions of miR-NAs share conservation among closely-related species and even species with distant phylogenetic relationships, suggesting evolutionarily conserved regulatory functions across species [81]. In the present study, the 79 miRNA families found in the sRNA library of G. molesta were detected to be conserved among the Lepidoptera insects, yet most of the miRNA families numbered from miR-1175 to miR-8506 were absent in other species. More miRNAs among the 79 families were detected to be present in Hexapoda species, whereas notably less number of miRNA families were found in species in Brachiopoda, Chiopoda and Arachinida, although they all belong to Arthropods. Let-7 and miR-124 were conserved among Arthropoda, Nematoda and Chordata species in our analysis. Previous analyses suggested that many miRNA families, such as let-7, miR-10, miR-99 and miR-125 were highly conserved among animal species [46], whereas miR-99 was not found in the sRNA library of G. molesta, while miR-10 and miR-125 were conserved in most species but absent in C. elegans. Numerous studies have demonstrated that miRNAs showed widely conservative evolution, and thus have been proposed to be used as additional sequence markers in combination of mitochondrial or gnomic DNA for evolutionary modeling and phylogenetic analysis among various taxa [82,83]. Insects have been considered as an excellent model for study about regulatory functions of miRNAs in conserved regulatory pathways between vertebrates and invertebrates [3]. Besides of sequence conservation, miRNAs were also demonstrated with spatial and temporal expression conservation. A comparative study among bilaterian animals suggested that the tissue specificity of ancient microRNAs was highly conserved [84]. A recent study on Drosophila discovered that the temporal expression of orthologous microRNAs could be more conserved than their sequences and the hourglass pattern of miRNA expression are highly similar, not only among the miRNAs with highly conserved sequences, but also for rapidly evolving orthologs [85]. Therefore, as an important regulation component at posttranscriptional level between mRNA and gene-encoding protein, miRNA and its expression pattern were considered critical in understanding the evolution of developmental gene expression.
Similar to previous studies in planthopper L. striatellus [46] and diamondback moth P. xylostella [63], the abundances of novel miRNAs were also found very low in G. molesta and the 5 novel miRNAs with the highest abundance may play important roles in the development of G. molesta. The secondary structure prediction demonstrated that only a small fraction of mature ones were generated from both arms and most of the mature novel miRNAs were produced from either the 3' or 5' arms of the hairpins with a slight bias toward 5' arm usage (5p/ 3p proportion was around 1.59). The arm usage preference differed in various Diptera, such as 5' arm-bias in Drosophila [86], but 3' arm-bias in A. gambiae [64]. Studies in mammalians illustrated that selection and usage of preferred arm could be dynamically regulated in a development and tissue specific pattern [87]. The study in A. gambiae revealed that arm-switching event could happen after blood feeding [64], all of which suggest that miRNAs are expressed under a condition-specific manner and arm switching can significantly enrich the regulatory capacity of miRNA.
As the next-generation sequencing (NGS) method, Illumina sequencing enables the highthroughput and comprehensive understanding about the genome, transcriptome, epigenome or microbiome of a variety of organisms [88][89][90][91]. However, Illumina sequencing has been found more prone to produce deviation than Sanger technique, as it is vulnerable to be contaminated in the process of library preparation and accompanied with deviations resulted from base bias and subjective supposition in the bio-information analysis. In the present study, qPCR and Sanger sequencing were adopted for validation of the sRNA library of G. molesta sequenced by Illumina platform and concordant results between different methods were detected for most of the miRNAs. The divergences between Illumina sequencing and qPCR were mainly due to the limited replication of Illumina analysis, thus, qPCR validation is indispensible for sophisticated measurement about the expressional pattern of specific miRNA of interest. Alignment detected sequence differences happened in some miRNAs between Sanger and Illumina sequencing results. However, most parts of these miRNA sequences were in agreement between the two methods and the seed regions for target prediction were intact after eliminating the sequence deviations at the 5' end caused by primer modification for adjustment of GC content. Therefore, Illumina sequencing is a reliable platform for the overall understanding of miRNA profile in G. molesta.
Cluster analysis is an important part for understanding the biological network and evolutionary conservation of miRNA. In this study, the clusters of the conserved and novel miRNAs determined in G. molesta were analyzed in reference to the genome of D. plexippus. Fifteen clusters were identified separately at 50 kb, 10 kb, 5kb and 3 kb genomic distances. Most of the miRNAs in the same cluster showed concordant expressional abundance within the same genomic distance. A study showed that expressional levels of clustered miRNAs in mosquito libraries were highly correlated and the mature miRNAs in the same cluster exhibited a strong bias towards the same arm selection [64]. However, the arm selection of the miRNAs in the same cluster showed in a more dynamic manner in this study, with only a small bias towards the 5' arm selection in the same cluster. Conservation of miRNA clusters among related species would provide evolutionary evidence for functional shift of miRNA in insects. A study found that the cluster organized in mir-2/mir-13/mir-71 was highly conserved in insects [86], which was not detected in G. molesta. Cluster analysis is highly dependent upon the corresponding genome sequences, hence in future, with the information of genome sequence of G. molesta and the availability of sequence information in its phylogenetic related species, more useful information of miRNA data would probably be revealed through comparative studies among agricultural pests.
Target prediction and KEGG pathway analysis are important to understand the regulatory networks of miRNAs. In this study, except for the majority pathways in the processing, repairing and surveillance of nucleic acid, amino acid and protein plus one pathway function in the plant-pathogen interaction, three pathways for controlling the growth and development modulation (including two pathways involved in development regulation and one pathway participating in circadian rhythm) were identified in the sRNA library constructed from different developmental stages of G. molesta. The network prediction of the identified circadian rhythm pathway further revealed the close relationship among a range of miRNAs, clock genes and developmental regulation genes in G. molesta. A majority of insect developmental processes exhibit circadian rhythm (including growth, ecdysis, metamorphosis, diapause, and eclosion) and developmental circadian rhythm can be profoundly impacted by environmental changes, especially the daily or seasonally thermal and photoperiodic alterations [92]. Correspondingly, most organisms have evolved circadian clock to adjust their development and metamorphism rhythm for coping with environmental changes [93]. Recent studies have detected that the miRNA might be an important regulator of circadian rhythmicity at posttranscriptional level in various taxa of organisms [94,95]. The overall understanding about the network of miRNA and their target genes in the developmental regulation of insects would provide useful information for the monitoring and management of agricultural pest.
Since the discovery about the vital roles of miRNA at post-transcriptional level, precisely expressional profiles of miRNA under specific conditions have been considered as the prerequisite for understanding its regulatory functions. Identical to the quantification of coding genes, using suitable normalizers synchronically processed with tested genes is also considered as an effective way for eliminating the errors from materials and experimental manipulation and realizing accurate evaluation of miRNA profiles [96]. In recent years, numerous studies on selection and validation of reference genes for miRNA quantification have been published being relevant to plants, human, vertebrates and nematode [45,47,97,98], whereas few studies have been reported about insects. In the present study, on the basis of the sRNA library of G. molesta, one small nuclear RNA U6, one small ribosomal RNA 5SrRNA and ten miRNAs were preliminarily selected as candidate references according to the previous studies [45][46][47][51][52][53][54] and their stabilities under two biotic and six abiotic conditions were further assessed with global mean method and geNorm algorithm. Normally, four statistical methods, geNorm, Normfinder, delta Ct and Bestkeeper are adopted for expressional stability analysis. Based on different algorithms, however, inconsistent outcomes are usually obtained from the four software. Therefore, the comprehensive ranking of expressional stability is hard to determine from parallel analysis with the four algorithms. In the present study, "global mean", a universal method for analyzing the transcriptional stability of microRNA [34], was adopted for preliminary screening of the candidate reference genes. A final recommendation of suitable normalizers was then clearly acquired with further evaluation by geNorm. In the assessment of reliable internal controls for miRNA expressional quantification in C. elegans, a genome-wide RNA-seq was conducted for preliminary selection of control miRNAs with minimal variation, and similarly, the final stabilities of candidate reference miRNAs were then ranked by applying the common geNorm logarithm [97]. The comprehensive analysis in the present study detected that intrinsic biotic factors including developmental stages and tissues impacted more on the stability of miRNA expression in G. molesta, as a greater number of reference genes should be used for normalization in samples of development stages or tissues. Similarly, investigation in the Chinese perch revealed that embryonic developmental stage was an important factor to the variability of miRNA expression [47]. In previous studies of miRNA evaluation, U6 has always been empirically chosen as an internal control and U6 has been recommended as a stable reference gene under half of the tested conditions in our study. However, the expressional stabilities of U6 and other small nuclear RNAs were actually not be acceptable as endogenous controls in many experimental conditions [45,47]. The present and previous studies all illustrated that misusage of an internal control gene would lead to notably distorted results and misunderstanding of gene expressional profiles [99].
In summary, a total of 536 mature miRNAs including 291 conserved and 245 novel miRNAs was identified in G. molesta. The abundance, conservation and cluster of identified miRNA were analyzed. The KEGG pathway analysis and network prediction of target genes demonstrated that the network composed of miRNAs, clock genes and developmental regulation genes probably play critical roles in the regulation of developmental circadian in G. molesta. Furthermore, suitable reference genes were selected and validated for study on miRNA expressional profile in G. molesta under two biotic and six abiotic experimental conditions. The present study provides an overview of miRNA profile in G. molesta and may serve as a basic reference for evaluation of miRNA abundance in this pest and other insects. Further studies, such as analyzing of the interaction relationship between miRNAs and their targets and deciphering of the regulatory functions and mechanisms of specific miRNAs will shed light on the deeper interpretation of the miRNA-involved post-transcriptional regulation in G. molesta and might also provide a useful foundation for development of new targets or genetic-based techniques for agricultural pest control in the future.