Differential microRNA Analysis of Glandular Trichomes and Young Leaves in Xanthium strumarium L. Reveals Their Putative Roles in Regulating Terpenoid Biosynthesis

The medicinal plant Xanthium strumarium L. (X. strumarium) is covered with glandular trichomes, which are the sites for synthesizing pharmacologically active terpenoids such as xanthatin. MicroRNAs (miRNAs) are a class of 21–24 nucleotide (nt) non-coding RNAs, most of which are identified as regulators of plant growth development. Identification of miRNAs involved in the biosynthesis of plant secondary metabolites remains limited. In this study, high-throughput Illumina sequencing, combined with target gene prediction, was performed to discover novel and conserved miRNAs with potential roles in regulating terpenoid biosynthesis in X. strumarium glandular trichomes. Two small RNA libraries from leaves and glandular trichomes of X. strumarium were established. In total, 1,185 conserved miRNAs and 37 novel miRNAs were identified, with 494 conserved miRNAs and 18 novel miRNAs being differentially expressed between the two tissue sources. Based on the X. strumarium transcriptome data that we recently constructed, 3,307 annotated mRNA transcripts were identified as putative targets of the differentially expressed miRNAs. KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis suggested that some of the differentially expressed miRNAs, including miR6435, miR5021 and miR1134, might be involved in terpenoid biosynthesis in the X. strumarium glandular trichomes. This study provides the first comprehensive analysis of miRNAs in X. strumarium, which forms the basis for further understanding of miRNA-based regulation on terpenoid biosynthesis.


Introduction
MicroRNAs (miRNAs) are small non-coding, endogenous RNAs consisting of~22 nt in average, and are generated from large stem-loop precursors transcribed from non-protein-coding genes, introns or coding regions of the host genome [1,2]. They interact with mRNAs through perfect or non-perfect complementarity to degrade mRNAs or repress translation, thus were abraded in a cell disrupter (Bead-Beater, BIOSPEC, USA) using glass beads in an isolation buffer (25 mM MOPSO, pH 6.6, 200 mM sorbitol, 10 mM sucrose, 5 mM thiourea, 2 mM dithiothreitol, 5 mM MgCl 2 , 0.5 mM sodium phosphate, 0.6% (w/v) methylcellulose and 1% (w/v) polyvinylpyrrolidone). The disrupted extracts were filtered through a 425 μm nylon mesh, and the filtrate was then consecutively passed through 125, 80 and 42 μm nylon meshes with a resin buffer (25 mM MOPSO, pH 6.6, 200 mM sorbitol, 10 mM sucrose, 5 mM thiourea, 2 mM dithiothreitol, 5 mM MgCl 2 and 0.5 mM sodium phosphate). The isolated glandular trichomes were retained on the 42 μm mesh. Each sample was flash frozen in liquid nitrogen and then stored at −80°C for RNA isolation.

Small RNA library construction and high-throughput sequencing
Total RNA was extracted from fresh young leaves and the isolated glandular trichomes with Trizol reagent (Ambion). The quantity and quality of RNA samples were measured by Eppendorf BioPhotometer plus to ensure that the OD260/OD280 values were between 1.8 and 2.2. The RNA integrity was examined by agarose gel electrophoresis. Small RNA sequencing was performed using an Illumina Genome Analyzer at the Beijing Genomics Institute (BGI, Shenzhen, China). Small RNA fractions with the length range from18 to 30 nt were purified and then ligated to a 5' and 3' adaptor. After the reverse transcription followed by 11 cycles of polymerase chain reactions, approximately 20 μg of the amplified products were used for sequencing.

Analysis of the sequenced data of the small RNAs
Small RNA reads with a length of 49 nt were produced by Illumina. Then data processing analysis was conducted as follows: (1) Removal of low-quality reads (more than four bases with sQ values below 10, and more than six bases with sQ values less than 13); (2) Removal of reads with 5 0 adaptor contaminants; (3) Removal of reads without 3 0 primer; (4) Removal of reads without an insert tag; (5) Removal of reads with poly A; (6) Removal of reads shorter than 18 nt; and (7) A summary of the length distribution of the clean reads. The remaining clean reads were mapped to X. strumarium transcriptome with less than two mismatches to analyze the expression and distribution of the small RNAs using SOAP software [41].To annotate the small RNAs, the sequences were aligned to the NCBI GenBank (http://www.ncbi.nlm.nih.gov/ genbank/) and Rfam (http://rfam.sanger.ac.uk/) 10.1 databases by a BLAST search [42,43]. The matched tags, including rRNA, scRNA, snoRNA, snRNA, and tRNA were eliminated. The remaining tags were used to detect conserved and novel miRNAs. The transcriptome databases of the X. strumarium small RNAs and mRNAs were deposited at the sequence read archive (SRA) of National Center for Biotechnology Information (NCBI) under the accession numbers of SRP056720 and SRP056511, respectively.

Identification of the conserved miRNAs
There is no miRNA information for X. strumarium in miRBase. To identify the conserved miRNAs, the following strategy was used: first, considering the differences between species, clean data was aligned to mature miRNAs or miRNA precursors of all plants in miRBase 20.0 (http://www.mirbase.org) [44] allowing two mismatches using tag2miRNA software (developed by BGI); second, we chose the most abundant miRNA from each mature miRNA family to construct a temporary miRNA database; third, we aligned the clean data to the above temporary miRNA database and the expression of miRNA was generated by summing the count of all tags which were aligned to the temporary miRNA database within two mismatches. The small RNAs that were unaligned to any databases were defined as unannotated sequences.

Prediction of the novel miRNAs
The unannotated sequences ranging from 18 to 25 nt were used to identify novel miRNAs by Mireap software based on the following main criteria described by chen et al. [45]: (1) The tags which could be used to predict novel miRNAs came from the set of unannotated tags which were matched to the transcriptome of X. strumarium; (2) Those tags whose sequences and structures satisfied the two criteria: hairpin precursors could fold into secondary structures and the sequences were present in one arm of the hairpin precursors, will be considered as candidate novel miRNAs; (3) Hairpin precursors lack large internal loops or bulges; (4) The secondary structures of the hairpins are steady, with the free energy of hybridization lower than or equal to -18 kcal/mol; (5) The number of mature miRNAs with predicted hairpin precursors must be at least five in the alignment results.

Target gene prediction of the conserved and novel miRNAs
To obtain putative target genes, we matched the identified miRNAs to the X. strumarium transcriptome according to the rules published by Allen et al. [3] and Schwab et al. [7]. The criteria were (1) the number of mismatches between small RNAs and their targets should be less than four (G-U pairs count as half mismatch); (2) no more than two adjacent mismatches in the miRNA/target duplex; (3) no adjacent mismatches in positions 2 to 12 of the miRNA/target duplex from the 5 0 miRNA end; (4) no mismatches in positions 10 to 11 of the miRNA/target duplex; (5) no more than 2.5 mismatches in positions 1 to 12 of the miRNA/target duplex from the 5 0 miRNA end; and (6) the minimum free energy (MFE) of the miRNA/target duplex should be 75% of the MFE of the miRNA with its perfect complement.

Differential expression analysis of miRNAs between the leaves and glandular trichomes
To ensure the significance of the difference in miRNA expression, we normalized the expression of miRNAs in the two tissue sources (leaves and glandular trichomes) as transcript per million (TPM). Then those miRNAs with a P-value<0.05 (adjusted to a corrected P-value (qvalue) lower than 0.05) and an absolute value of log 2 Ratio>1 were selected as the differentially expressed miRNAs. Target gene prediction of the differentially expressed miRNAs was also conducted to better understand the regulatory roles of the miRNAs. Alignments of the miR-NAs to the corresponding target sites are shown in S1 Table. GO (Gene Ontology) functional classification and KEGG pathway analysis for the potential targets of the differentially expressed miRNAs GO is a classification system for gene function, which supplies a set of dynamically updated and controlled vocabulary to comprehensively describe the property of genes and gene products. There are 3 ontologies in GO: molecular function, cellular component and biological process. The basic unit of GO is GO-term, each of which belongs to one type of ontology. Therefore, to classify the function distribution of the potential targets of the differentially expressed miRNAs genes, the Blast2GO program was used to obtain their GO annotations [46] and the WEGO software to obtain their GO functional classifications [47]. The GO enrichment analysis of the targets was conducted and GO terms with a corrected P-value 0.05 were defined as significantly enriched terms. KEGG is a public database regarding metabolic pathways [48]. The target genes were mapped to the KEGG database to identify what pathways in which those targets of the differentially expressed miRNAs are involved.

Real-time quantitative PCR (RT-qPCR)
Stem-loop RT-qPCR was employed to validate the gene expression data from the Illumina sequencing according to the method previously described by Chen et al. [49]. The primers used for this part of the experiment were listed in S2 Table. First-strand cDNA synthesis was performed using RevertAid Reverse Transcriptase (Thermo Scientific). The reaction was carried out at 42°C for 60 min followed by incubation at 70°C for 10 min, and then held at 4°C thereafter. RT-qPCR was conducted using the FastStart Universal SYBR Green Master (Roche) and ABI 7500 Real-Time PCR System according to the manufacturer's instructions. The reactions were undertaken at 95°C for 10 min for one cycle; at 95°C for 15s, then at 62°C for 1 min for 40 cycles. All reactions were performed in three independent biological samples with three technical repeats. The melting curve was generated to test the specificity of PCR products and avoid the amplicons only from primers themselves. The actin gene of X. strumarium (GenBank accession no.JF434698) was used as an internal standard to normalize the variation in each sample manipulation and the results were analyzed using the comparative 2 -ΔΔCt method to quantify the relative expression [50].

High-throughput sequencing analysis of small RNAs
In total, 12,325,132 raw reads for the leaves and 9,076,601 raw reads for the glandular trichomes were initially generated. After data preprocessing, 12,152,212 and 8,988,274 clean reads for the leaves and glandular trichomes remained for the analysis, generating 7,261,121 and 4,842,894 total unique sequences for the leaves and glandular trichomes, respectively. 6,193,697 and 3,775,470 unique sequences (85.3% and 77.96% of the total unique sequences) were specific to the leaves and glandular trichomes (Table 1). This was indicative of the diversity of small RNA sequences in each tissue source. Little difference was found in the length distribution of the sequences from both tissue sources: the most abundant was the 24 nt small RNAs, accounting for more than 60% of the total reads, followed by 21 nt small RNAs, and small RNAs with a length of 23 nt (Fig 1). In addition, 220,115 (3.03%) and 247,453 (5.11%) unique sequences for the leaves and glandular trichomes matched to the X. strumarium transcriptome data. After annotating and removing the non-coding RNAs, including rRNAs, tRNAs, snRNAs, and snoRNAs, 37,490 (0.52%) and 33,664 (0.7%) reads for the leaves and

Identifying conserved miRNAs in both tissue sources
In X. strumarium, no miRNA has been reported at the time of drafting this manuscript. We identified 978 conserved miRNA families with 745,146 total reads in the leaves and 894 miRNA families with a total of 550,246 reads in the glandular trichomes (S3 Table). There were 687 conserved miRNA families expressed in both tissue sources (Fig 2A), of which  Identifying potential novel miRNAs in X. strumarium Based on the criteria described in the section of Materials and Methods, 22 potential novel miRNAs for the leaves and 27 for the glandular trichomes were identified in both tissue sources with at least five reads (S4 Table). Of these, only 12 novel miRNAs appeared in both samples (Fig 2B), suggesting that the expression profiling of novel miRNAs was different between the leaves and glandular trichomes. The identified novel miRNAs ranged from 20 to 23 nt, with 21nt being the most abundant (59.46%) (Fig 3A). The length of the predicted precursors for the novel miRNAs were 66 to 323 nt, with that the majority was between 50 and 150 nt (54.06%) (Fig 3C). The folding energy of these hairpin structures for the precursors of novel miRNAs was -19.7 to -101.8 kcal/mol, which most values within the range of -40 to -80 kcal/mol (54.05%) (Fig 3B). These results were similar to those observed in Chinese cabbage, Arabidopsis thaliana, Oryza sativa and Arachis hypogaea [51][52][53]. The nucleotide bias analysis showed that novel miRNAs from both tissue sources had the similar tendency on the nucleotide bias at certain key positions, for example, a strong preference for adenosine (A) at the tenth position and for uridine (U) at the first position (Fig 4), which are the typical features of miRNAs [54,55].

Target prediction of conserved and novel miRNAs in X. strumarium
Target genes for the conserved and novel miRNAs were predicted to better understand the biological functions of the identified miRNAs in X. strumarium. In total, we found 4,071 target genes for 544 conserved miRNAs and 116 target genes for 26 novel miRNAs in X. strumarium, with an average of 7.48 and 4.46 targets per conserved and novel miRNA (S5 Table). To annotate these potential targets, a BlastX search against the NCBI protein database with an E value lower than 10 −5 was performed. Some targets were annotated as transcription factors, including WRKY, Basic helix-loop-helix (bHLH), SQUAMOSA Promoter Binding Protein-Like (SPL) and basic leucine zipper motif (bZIP) proteins. Other target genes included those involved in signal transduction, metabolism, stress response and those with unknown functions. The expression levels of these targets between the leaves and glandular trichomes were also compared (S5 Table).

The identification of the miRNAs differentially expressed between the two tissue sources
The total of 512 miRNAs, including 494 conserved and 18 novel miRNAs, were found to be differentially expressed between the two tissue sources. Among them, 262 conserved and 13 novel miRNAs were up-regulated, and 232 conserved and five novel miRNAs were down-regulated in the glandular trichomes (S6 Table). To validate the miRNA expression data from the sequencing, the expression levels of 13 differentially expressed miRNAs, including eight novel miRNAs and five conserved miRNAs, were measured using RT-qPCRs. As was shown in Fig 5, the expression trend of most of the miRNAs, except for miR1134, was consistent with the Illumina sequencing results, meaning that the gene expression data of miRNAs by the sequencing technique was credible. Target prediction of the differentially expressed miRNAs Based on the X. strumarium transcriptome, a total of 3,307gene targets were identified for those differentially expressed miRNAs (S6 Table). Among these targets, some encode transcription factors such as v-myb avian myeloblastosis viral oncogene homolog (MYB), WRKY, bHLH, APETALA2/ethylene-responsive factor (AP2/ERF), bZIP and SPL proteins. For instance, the unique genes, including CL6103.Contig1 targeted by miR5072, CL1989.Conti-g2_All targeted by miR7539 and Unigene12046_All targeted by miR1850, displayed high similarities to WRKY proteins. WRKY transcription factors have been reported to play roles in regulating the biosynthesis of terpenoids [56][57][58]. By mapping those targets to the KEGG pathway database, we were able to find that some targets seemed to encode putative enzymes in terpenoid biosynthesisin X. strumarium (the ones highlighted by yellow color in S7 Table), especially in sesquiterpene biosynthesis (Table 3). For example, the upstream enzymes in the pathways of terpenoid biosynthesis, including 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR), isopentenyl diphosphate (IPP)/dimethylallyl diphosphate (DMAPP) synthase (IDS), and isopenteyl diphosphate isomerase (IDI), were predicted to be targeted by miR7539, miR5021 and miR1134. DXS, HMGR, IDS, and IDI are the enzymes involved in the biosynthesis of IPP and DMAPP, the common precursors of all the terpenoids [59]. In particular, HMGR is a rate-limiting enzyme of the pathway to synthesize IPP and DMAPP [60]. The target by miR6435 is homologous to   germacrene A oxidase (GAO), the first key enzyme in the pathway to the biosynthesis of xanthanolides [61]. Interestingly, xanthanolides have been considered to be the major active compounds in X. strumarium [62]. In addition, some targets are homologs to the enzymes in the biosynthesis of di-, or tri-terpenoids. For example, the beta-amyrin synthase targed by miR5491 and ent-kaurene synthase targeted by miR6449 are key enzymes that catalyze the formation of the most common triterpene β-Amyrin and diterpene ent-kaurene [63,64]. GO analysis showed that these targets could be summarized into three main categories and classified into 47 functional groups (Fig 6). Based on the biological process category, the majority of the targets were involved in "cellular process", "metabolic process" and "single-organism process". In the case of molecular functions, a large number of genes were grouped into "binding" and "catalytic activity". While in the cellular component, the genes were mostly related to "cell", "cell part" and "organelle". The GO enrichment analysis showed that the terms "mitochondrial respiratory chain complex IV" and "respiratory chain complex IV" are overrepresented in the cellular component. For the molecular function, the majority of genes were found to be involved in the "oxidoreductase activity" and "cytochrome-c oxidase activity". The GO terms "heme a metabolic process" and "heme a biosynthetic process" account for a large proportion in the biological process, which indicated that the miRNAs might play roles in modulating plant metabolic processes (S8 Table).

Discussion
Plant miRNAs have been reported to be involved in a variety of important processes, including development, signal transduction, and responses to environmental stresses [12]. There was also evidence to show that miRNAs function in regulating secondary metabolic activities. For example, the molecule miRNA156 is involved in the regulation of flavonoid biosynthesis in Arabidopsis thaliana [21]. Glandular trichomes are one type of specialized structure in synthesizing a wide range of plant secondary metabolites [69], in X. strumarium, they are also the primary sites for accumulating xanthanolides, the compounds with multiple bioactivities [40]. We hypothesized that miRNA expression in glandular cells might play roles in regulating the biosynthesis of secondary metabolites in X. strumarium such as xanthanolides. However, to the best of our knowledge, no any information is available for the miRNAs from glandular trichomes of any plant species. As the beginning to address this hypothesis, glandular trichomes were physically isolated from the young leaves of X. strumarium in this study and large sets of miRNAs in this particular structure were identified using a high-throughput sequencing technology. A database for miRNAs from its intact young leaves was also constructed and used as a comparison. A total of 894 conserved miRNAs and 27 novel miRNAs were successfully identified from the glandular trichomes. The expression levels of more than 50% of these miRNAs seem to be up-or down-regulated in the glandular trichomes compared to intact leaves. The reliability of the gene expression data was confirmed by the Q-RT-PCR analysis of the five conserved and eight novel miRNAs that were randomly selected. The expression of, novel-mir-14, novel-mir-17, and novel-mir-34 is very high in glandular trichomes and more than 1000 folds to those in intact leaves, indicating that they may have physiological functions in this specialized structure. With respects to the miRNAs with the highest abundance in the glandular cells, they may be glandular trichome-specifically expressed or transported into the trichomes from the other parts of the leaves.
To understand the regulatory roles of miRNAs, it is essential to predict and annotate its target mRNAs. Based on the feature that plant miRNAs are perfectly complementary to their targets, miRNA target genes can be predicted by a bioinformatics approach [3,7]. Using the bioinformatics tool, we were able to identify that the targets by the glandular trichomes conserved miRNAs included transcription factors and non-transcriptional factor proteins (S5 Table). Several transcription factors were predicted to be targeted by the same conserved miRNA molecule, for example, miR7539 targets MYB, bHLH, WAKY, zinc finger, DNA Binding With One Finger(DOF), SPL, and bZIP transcription factors and miR5658 targets MYB, bHLH, zinc finger, and bZIP transcription factors, suggesting that these miRNAs may play multiple roles in diverse physiological processes. Non-transcriptional factor proteins, such as DXS, HMGR, IDS and IDI, were predicted to be targets of miR7539, miR5021 and miR1134, respectively. These targets are essential enzymes in upstream isoprenoid pathway to produce IPP and DMAPP, the common precursors for all the downstream end terpenoids. In particular, HMGR is a key regulatory enzyme that controls the amounts of isoprenoids [60]. These data suggested that miR7539, miR5021 and miR1134 might be involved in regulating terpenoid biosynthesis by targeting upstream terpenoid pathway genes. Some miRNAs target putative downstream enzymes in the biosynthesis of mono-, sesqui-, di-, and tri-terpenoids. They were Rlinalool synthase, gibberellin 3-oxidase, ent-kaurene synthase, squalene epoxidase, beta-amyrin synthase, and germacreneA oxidase, which were targeted by miR7540, miR5183, miR6449, miR5255, miR5491, and miR6435, respectively (Table 3). Most interestingly, germacrene A oxidase (GAO) targeted by miR6435 is a key P450 involved in the biosynthesis of xanthanolides [61], which was previously reported to be active molecules to contribute to the pharmacological property of X. strumarium [30,34,70]. Gene expression data from the miRNAsequencing showed that the molecule miR6435 is glandular trichome-specifically expressed (S6 Table), which is also consistent with the feature that glandular trichomes are the primary sites to synthesize xanthanolides. The data allowed us to hypothesize that miR6435 might play a role in the regulation of xanthanolide biosynthesis in X. strumarium glandular trichomes. Identification of miRNAs which can perfectly bind to their mRNA targets may also provide alternate approach to isolate pathway genes, especially for those pathways that are not elucidated well. The discovery of X. strumarium glandular trichome miRNAs of this study may help to identify xanthanolide biosynthesis genes. The biosynthetic pathway of xanthanolides is not elucidated yet, especially for its downstream pathway in which cytochrome P450 enzymes are presumably involved. Here, we have tried to identify miRNAs whose targets were P450 mRNAs. In addition to GAO targeted by miR6435, we also have found that the targets of other miRNAs, including miR1512, miR3447, miR5678, miR6283, and miR7539, encode P450s, in particular, of these P450 mRNA sequences, the target mRNA by miR6283 seemed to be trichome-specifically expressed (S5 Table). Thus, it will be of interest to further experimentally perform functional analysis of miR6283 and its target. In contrast to the conserved miRNA targets, none of the targets by novel miRNAs presented in this research were transcription factors and many of them encoded cytochrome c oxidase, ABC transporter, and protein kinases, suggesting their roles in oxidation-reduction processes, transport, and signal transduction.

Conclusions
In conclusion, this is the first comprehensive identification of miRNAs from the plant glandular trichomes, the specialized structure to synthesize a wide range of medicinal molecules. We have been able to identify miRNAs and their mRNA targets that are trichome-specifically expressed. The data of this study provide the starting point to further investigation to elucidate the miRNAs regulatory mechanism underlying the biosynthesis of secondary metabolites, especially terpenoids, in X. strumarium glandular cells.
Supporting Information S1  Table. Target prediction of the X. strumarium miRNAs and expression analysis of the targets between the leaves and glandular trichomes of the plant. (XLSX) S6 Table. Target prediction of conserved and novel miRNAs differentially expressed in leaves and glandular trichomes. Table. KEGG pathway analysis of genes for differentially expressed miRNAs. (XLSX) S8 Table. GO enrichment analysis of the targets by the differentially expressedmiRNAs. (XLSX)