Identification and Characterization of MicroRNAs in Asiatic Cotton (Gossypium arboreum L.)

To date, no miRNAs have been identified in the important diploid cotton species although there are several reports on miRNAs in upland cotton. In this study, we identified 73 miRNAs, belonging to 49 families, from Asiatic cotton using a well-developed comparative genome-based homologue search. Several of the predicted miRNAs were validated using quantitative real time PCR (qRT-PCR). The length of miRNAs varied from 18 to 22 nt with an average of 20 nt. The length of miRNA precursors also varied from 46 to 684 nt with an average of 138 ±120 nt. For a majority of Asiatic cotton miRNAs, there is only one member per family; however, multiple members were identified for miRNA 156, 414, 837, 838, 1044, 1533, 2902, 2868, 5021 and 5142 families. Nucleotides A and U were dominant, accounted for 62.95%, in the Asiatic cotton pre-miRNAs. The Asiatic cotton pre-miRNAs had high negative minimal folding free energy (MFE) and adjusted MFE (AMFE) and high MFE index (MFEI). Many miRNAs identified in Asiatic cotton suggest that miRNAs also play a similar regulatory mechanism in diploid cotton.


Introduction
Cotton is an important economic crop due to its role in providing a source of fiber, textile, and oils. Besides the fact that cotton is the most important fiber crop; cotton is also the second most important resource for plant protein and the fifth most important oil crop. Cotton seeds contain about 23% proteins, and it is estimated that cotton seeds could potentially provide the protein required by about half billion people annually. Because cotton seeds are rich in oil content (approximately 21%), it has become the fifth most important plant oil resource in the world.. Although cotton is an important crop, scientific research and genetic knowledge of cotton is much behind other crops, such as rice, maize, wheat and soybean.
The genus Gossypium is a large family, which consists of approximately 50 species originating from different geographic areas, majorly from Australia and Latin America. However, only four of them are domesticated and produce valuable spinnable fiber. The four domesticated species are Upland cotton (Gossypium hirsutum L.), Island cotton (G. barbadense L.), African cotton (G. herbaceum L.) and Asiatic cotton (G. arboreum L.). The Upland cotton and Island cotton are allotetraploids (AADD, 2n = 4x = 52); African cotton (A1A1) and Asiatic cotton (A2A2) are diploids (2n = 2x = 26). Because tetraploid cotton (Upland Island cotton) have much higher fiber yield as well longer fibers than the diploid cotton ( African and Asiatic cotton), tetraploid cotton, particularly the Upland cotton, is currently widely cultivated around the world. Despite their poor yield and fiber quality, the diploid species have many other attractive agronomic traits [1]. For example, several investigations demonstrated that the Asiatic cotton was tolerant and even resistant to several environmental biotic stresses, such as resistance to several pests and diseases, including bollworms [2], aphids [3], leafhopper [3], rust, fungal [4], and viral [5] diseases. Asiatic cotton also contains traits tolerant to environmental abiotic stress, including drought and salinity stress [6,7]. Thus, Asiatic cotton easily grows in dry land with low input cultivation practices. Many Asiatic cotton cultivars produce fibers with high strength and some produc seeds with high oil content. All of these traits make Asiatic cotton a valuable resource of genetic germplasm or gene pool for modern cotton cultivar improvement and the cotton industry [1]. Additionally, diploid cotton also has an advantage over tetraploid cotton in elucidating gene structure and function because their genome is much more simple as compared to the tetraploid species. Unfortunately, very few studies have focused on diploid cotton, particularly on the molecular level. Understandinggene function and their regulatory mechanisms could provide new insight into the mechanisms of cotton growth and development, particularly on cotton fiber initiation and development and the response to environmental abiotic and biotic stresses. The expected results of this study will facilitate efficient usage of the diploid cotton species for developing improved cotton cultivars with favorable fiber yield and other key agronomic traits. microRNAs (miRNAs) are a newly identified class of extensively endogenous small RNA molecules and are approximately 20-22 nucleotides in length. Increasing amounts of evidence shows that miRNAs play an essential role in almost all biological and metabolic processes, such as plant organ differentiation and development, signal transduction, phase change, and response to environmental biotic and abiotic stress [8]. For example, miR 172 controls flower development and phase change from vegetative growth to reproductive growth though targeting the apetala 2 gene  [9,10,11,12]. miRNAs are different from other RNAs, including small interfering RNAs (siRNAs). The important features of miRNAs include the following: 1) all miRNAs are coded by miRNA genes with unknown length; a miRNA gene is first transcribed into a long primary RNAs, called primary miRNA (pri-miRNAs). A pri-miRNA is sequentially cleaved into a short stem-loop structured miRNA precursor (pre-miRNA) and then to a mature miRNA by an enzyme called dicer-like 1 (dcl 1) with the help of other several other enzymes. 2) All pre-miRNAs can form a stem-looped hairpin secondary structure with the mature miRNA siting on one of the arms. 3) A pre-miRNA forms its secondary structure with high negative minimum folding free energy (MFE) and minimum folding free energy index (MFEI). 4) All miRNAs have their star sequence, the opposite site sequence, termed miRNA*, in the secondary hairpin structure; miRNA and miRNA* sequences are almost complementary to each other with few mismatches; therefore miRNA and miRNA* sequences always form miRNA:miRNA* complexes during miRNA biogenesis. In most cases, only miRNAs function in regulating gene expression and the miRNA* sequence is degraded by an unknown mechanism. However, in some cases, the miRNA* sequence also can function to target a specific gene. More interestingly, many miRNAs are highly evolutionarily conserved from species to species in both animal and plant kingdoms although no clue for that miRNAs were a common ancestor for plant and animal miRNAs. This conservation ranges a long distance, such as from worms to humans in animals [13,14,15] and from moss and ferns to high flowering plants in plants [16]. This provides a powerful strategy to identify miRNAs in a new plant species using already known miRNAs in a known plant species through the use of a comparative genome-based homologue search. However, conservation is not the only criteria used for identifying miRNAs using a homologue-based Blastn search because miRNAs are short and it is difficult to distinguish a miRNA sequence from their targeted gene sequences. When using homologue-based Blastn searches for miRNAs, other criteria, such as their secondary hairpin stem-loop structure must also be considered. Since comparative genome-based homologue searcesh were employed to identify plant miRNAs in the middle of 2000s [17], this strategy has been widely adopted by many research laboratories [18]. Based on a rough approximation, using this method, thousands of miRNAs have been systematically identified in more than 30 plant species, including many important plant and agricultural crop species, such as soybean [19,20], maize [21], tomato [22], tobacco [23], potato [24,25], wheat [26], Brassica napus [27], citrus [28], switchgrass [29] and apple [30].
According to miRBase, a public miRNA database (Release 18, November 2011), there are a total of 4,014 plant miRNAs that have been currently identified and deposited into this database [31]. These miRNAs are obtained from 52 plant species, ranging from ferns and mosses to high flowering plants. However, no study has been reported in Asiatic cotton, one of the most important cotton species. In this study, we employed a comparative genomebased homologue search to identify miRNAs in Asiatic cotton using current Asiatic cotton expressed sequence tags (ESTs) available in the NCBI Genbank database. Based on our findings, 73 miRNAs, belonging to 49 families have been identified for the first time in Asiatic cotton. A majority of these miRNAs also play important roles in other plant species. We will also report the major features of these newly identified Asiatic cotton miRNAs.

Reference Set of miRNAs
Currently, a total of 4,014 miRNAs from 52 plant species have been deposited in miRBase, a public available miRNA database. These miRNAs are defined as a reference set of miRNA sequences for identifying potential miRNAs in Asiatic cotton. All currently available mature miRNAs and their precursor sequences were downloaded from miRBase (http://www.mirbase.org/; Release 18, November 2011). After the miRNA sequences were downloaded, mature miRNA sequences were Blasted against each other to remove identical sequences. Only unique sequences were kept for later Blastn seaches for identifying Asiatic cotton miRNAs.

Asiatic Cotton ESTs, cDNAs, and mRNAs
Asiatic cotton expressed sequence tags (ESTs), cDNAs, and mRNAs were downloaded from the GenBank nucleotide databases at the National Center for Biotechnology Information (NCBI). Currently, a total of 41,781 Asiatic cotton ESTs are available in the NCBI EST database (dbEST release 120111, December 1, 2011; http://www.ncbi.nlm.nih.gov/dbEST/ dbEST_summary.html).

Identifying Potential Asiatic Cotton miRNAs Using a Comparative Genome-based Homologue Search and Secondary Structure Analysis
A comparative genome-based homologue search is a powerful approach for identifying conserved miRNAs in both plants and animals. Since it was developed in 2005, this method has been widely employed to identify miRNAs in various plant species and also in animals. In this study, we employed the previously published method to identify Asiatic cotton miRNAs from currently available EST sequences. A comparative genome-based homologue search is based on two key parameters: conservation of miRNA sequences and the second stem-loop hairpin structure of the potential pre-miRNAs. Both factors should be strictly considered during identifying conserved miRNAs. Because only mature miRNAs are highly evolutionary conserved in plants for the majority of miRNAs, only mature miRNA sequences were used for the Blast search in this study. The mature sequences of all currently available plant miRNAs were subjected to a Blastn search against all of the currently available Asiatic cotton EST sequences. To identify all potential Asiatic cotton miRNAs, the following Blast parameters were used according to the previous studies: 1) the default word-match size was set at seven, the smallest setting that can be used for the online Blastn program between the miRNA query and the EST sequences; 2) the expected values were set at 1,000 to increase the hit chance for more potential sequences; and 3) the sequence number of the Blastn search and the sequence alignments were set to 1,000. More likely, for the majority of the Blastn search, only partial miRNA sequences were matched to the EST sequences. If this occured, the non-aligned regions of the miRNA query were manually inspected and aligned to the hit EST sequence; the number of mismatched nucleotides was recorded for determining whether the EST sequences contained a potential miRNA sequence according to the criteria described in the later section.
All ESTs with no more than 3 mismatches were selected for further investigation. First, the selected ESTs were Blasted against each other to remove the repeated sequences. Then, the secondary structures of the remaining EST sequences were predicted using a web-based computational program, mfold (http://mfold.rna. albany.edu/?q = mfold/RNA-Folding-Form). All mfold outputs were recorded into an excel file, which included the number of each nucleotide (A, G, C and U), the number of arms per structure, location of the matching regions, and minimal folding free energy (MFE, DG kcal/mol). Then, the adjust minimal folding free energy (AMFE) and the minimal folding free energy index (MFEI) were calculated according to the previous report [32]. AMFE means the MFE of a RNA sequence with 100 nt in length, which is equal to MFE/(length of a potential pre-miRNA) * 100. MFEI is equal to MFE/(length of a potential pre-miRNA)/ (The percentage of nucleotides G and C) [32].
An EST was considered a miRNA candidate when it fit all of the following criteria: 1) the predicted mature miRNAs had no more than three nucleotide substitutions compared with a known mature miRNAs; 2) the EST sequence could fold into an appropriate stem-loop hairpin secondary structure; 3) the mature miRNA was localized in one arm of the stem-loop structure; 4) there was no loop or break in the miRNA or miRNA* sequences; 5) there were no more than 6 mismatches between the predicted mature miRNA sequence and its opposite miRNA* sequence in the secondary structure; and 6) the predicted secondary structure had high negative MFE and high MFEI value.

Validation of Asiatic Cotton miRNAs
The predicted Asiatic cotton miRNAs were validated using stem-loop RT-PCR according to the previous report [33]. Total RNA was isolated from 10-day-old seedlings of Asiatic cotton following a previous report. After measuring the concentraion and quality of the RNAs, 200 ng of total RNA was used to reverse transcribe a miRNA into cDNA using a miRNA-specific stem-loop primer and the TaqManH microRNA Reverse Transcription kit (Applied Biosystems) according to the manufacturer's protocol.
Quantitative real time PCR (qRT-PCR) was performed on an Applied Biosystems 7300 Sequence Detection System. For each reaction, 20 mL PCR reaction mixtures were prepared and each contained 2 mL of RT product from the reverse-transcription reaction (after tenfold dilution), 2 mL of miRNA-specific primers, 10 mL Universal Syber Green PCR Master Mix, and 6 mL nuclease-free water. The reactions were incubated in a 96-well plate at 95uC for 10 min, followed by 45 cycles of 95uC for 15 s and 60uC for 60 s. After the completion of the real-time reactions, the threshold was manually set and the threshold cycle (C T ) was recorded. All reactions were conducted in triplicate.

Identification of miRNAs in Asiatic Cotton
After carefully considering the Blastn results and the secondary structure of the sequences, we identified 73 potential miRNAs from the 41,781 currently available Asiatic cotton ESTs. These 73 miRNAs belong to 49 families (Table 1, Figure 1 and Table S1). A large number of miRNAs identified in Asiatic cotton suggests that miRNAs also widely exist in diploid cotton species and play important roles in Asiatic cotton growth and development based on the evidences in other plant species.
Although miRNA mature sequences are highly evolutionarily conserved from species to species, in this study, we found that a majority of identified mature Asiatic cotton miRNAs had 2 or 3 nucleotide changes as compared with miRNAs in other plant species. This may be caused by two reasons: 1) due to the limited EST sequences available in the database, we only identified partial miRNAs identified in other plant species. For many high evolutionally conserved miRNAs, such as miR 162 and miR 164, we did not find those in this study. As more EST sequences become available in the database, it is possible that we could identify more miRNAs that are identical with currently known miRNAs. 2) Although miRNAs are highly conserved, it does not mean the miRNAs are identical among different species and there may be mutantions in the evolution of miRNA biogenesis.

Characterization of miRNAs in Asiatic Cotton
The identified miRNAs were not evenly distributed in each miRNA family (Table 1). For a majority of the miRNA families identified, there was only one member identified in this study. However, for some miRNA families (miRNA 156, 414, 837, 838, 1044, 1533, 2902, 2868, 5021 and 5142), multiple members were identified. For example, there are four members identified in the miR156 family. Although this may be caused by the uneven EST sequences, it also suggests the potential miRNA patterns in Asiatic cotton. miR156 is a large family in other plant species, such as the model plant species Arabidopsis, which plays an important role in leaf development.
A majority of identified miRNAs were obtained from the plus strand. However, there were several miRNAs identified from the minus strand of certain ESTs. The mature miRNA sequences could be located within either the 3' or 5' arm of the secondary stem-loop hairpin structures. Among the 73 identified miRNAs, 38 were located within the 3' arm and the other 35 miRNAs were located with the 5' arm.
The length of mature miRNAs varied from 18 to 22 nt with an average of 19.961.0 nucleotides (Figure 2). A majority of mature miRNAs have 19-21 nt, which counts for 86.30% of the total identified miRNAs. The length range of the Asiatic cotton mature miRNAs is similar to those of other plant species.
The length of identified miRNA precursor sequences was also varied from miRNA to miRNA. The range of pre-miRNA lengths is 46 to 684 nt with an average of 1386120 nt ( Figure 3). However, a large percentage of pre-miRNAs (32.9%) had 50-70 nt. There were only 10 miRNAs (13.7%) with more than 210 nt in length.
The composition of the four nucleotides (A, G, C, and U) is an important parameter, which is an indicator for species evolution as well as for the stabilization of one specific RNA sequence cased by their secondary structure. The percentage composition of each nucleotide was not evenly distributed in the identified Asiatic cotton pre-miRNAs (Table 2). With an unknown reason, the nucleotide uracil (U) is dominant in both mature miRNAs and pre-miRNAs in both plant and animal miRNAs. In this study, we observed that the U content varied from 17.7% to 57.7% with an average of 33.167.9% in the identified Asiatic cotton pre-miRNAs   Figure 4); which is significantly higher than the content of other nucleotides, particularly much higher than nucleotides C (15.466.8%) and G (21.768.0%). A majority (83.7%) of pre-miRNAs contained more than 25% of the nucleotide U (Figure 4).
Nucleotides G and C contributes to the formation and stabilization of the secondary structure of stem-loop hairpins because G and C will form three hydrogen bonds with each other but A and U only form two hydrogen bonds with each other. Generally speaking, the more GC content a sequence contains, the more stable the secondary structure of a specific RNA will be. In the identified Asiatic cotton pre-miRNAs, the GC content (37.1611.8%) was much lower than the AU content (63.0611.8%) ( Table 2, Figures 5 and 6). More than half of the total number of nucleotides were A or U. In this study, we also calculated the A/U and C/G ratio; the ratios of A/U and C/G were 0.94 and 0.76, respectively. This suggests that more of the nucleotide G existed in the pre-miRNA sequences than C however the number of A and U nucleotiudes was almost equal.
One criterion for measuring the stability of a RNA or DNA secondary structure is the minimal folding free energy (MFE). Usually, the lower the MFE, the more stable the RNA or DNA molecule. The MFE of the 73 identified Asiatic cotton pre-miRNAs varied from -3.2 to -232.2 kcal/mol with an average of - 35.19635.12 kcal/mol (Figure 7). For a majority of Asiatic cotton pre-miRNAs, their MFE are ranged from -10 to -30 kcal/mol. One reason causing the large variance in MFE is the significance in their length. It is also an issue for determining the stability of a RNA or DNA using MFE because different RNA or DNA strands contains a different number of nucleotides. To better measure the stability of RNA or DNA strands, the adjusted minimal folding free energy (AMFE) strategy was developed, which is the MFE of a RNA/DNA sequence that is 100 nt in length. The AMFE of the 73 identified Asiatic cotton pre-miRNAs ranged from -6.15 to -51.86 kcal/mol with an average of -24.92 69.21 kcal/mol, which is a smaller range as compared with the MFE range. About half of the pre-miRNAs have AMFE values with -20 to -30 kcal/mol ( Figure 8).
The minimal folding free energy index (MFEI) is a new criterion for assaying miRNAs and distinguishing miRNAs from other    coding and non-coding RNAs. The MFEI of the identified Asiatic cotton pre-miRNAs ranged from 0.29 to 1.85 with an average of 0.7060.25 ( Figure 9).

Validation of miRNAs in Asiatic Cotton
Quantitative real-time PCR (qRT-PCR) is a reliable method to determine the expression of specific miRNAs [33]. In this study, we employed qRT-PCR to detect the expression of three predicted miRNAs (miR 156, miR 172 and miR 399). The results show that all of the three predicted miRNAs exist and are highly expressed in Asiatic cotton seedlings.

Discussion
Although miRNA-related research is one of the hottest research topics in biological or biomedicinal fields [34,35], miRNA research on cotton is much behind other plant species. Up to now, only several research projects have reported on miRNA identification and expression analysis in cotton [36,37,38,39,40]. These research projects employed computational approaches as well deep sequencing technology to identify miRNAs from cotton and to investigate their expression profile using qRT-PCR [37] and deep sequencing [40]. However, all of these studies are focused on upland cotton. Based on our best knowledge, no research has been performed on diploid cotton. In this study, our focuswas on an important diploid cotton species, Asiatic cotton, and we were able to successfully identify 73 miRNAs, belonging to 49 families, using an in silico comparative genome-based approach. Many of these identified miRNAs play an important role in model plant species. For example, miR 172 controls floral development and phase switch to reproductive growth. miR 399 is a responsive miRNA to environmental stress. We believe these miRNAs also play important roles in Asiatic cotton growth, development, and response to environmental abiotic and biotic stresses.
Newly identified Asiatic cotton miRNAs have similar characteristics to the miRNAs in other plant species [32]. These major parameters include the length of mature miRNAs and miRNA precursors, the nucleotide composition, and the minimal folding free energy (Table 2). This suggests that Asiatic cotton miRNAs have a common ancestor with other plant species and that they share the same regulatory mechanisms.