Figure 1.
Full-length enrichment for library cloning and next-generation sequencing (NGS).
Full-length (blue line with 5′ cap) or truncated (short blue line without 5′ cap) mRNAs were reverse transcribed into first-strand cDNA using oligo-dT primers (red arrow). The mRNA:cDNA hybrid was treated with RNase I (scissors) to digest single-stranded RNA regions left unpaired where first-strand cDNA synthesis was incomplete, followed by selection with Cap-antibody magnetic beads to enrich the full-length mRNA:cDNA hybrids. The full-length single-stranded DNA (FLssDNA) was eluted from the beads and used for both cDNA library cloning (lower left) and NGS (lower right). For full-length library cloning, a double-stranded adaptor (green) was linked to the 5′ end of the ssDNA. Second-strand cDNA synthesis was then carried out, followed by cloning into a vector. For NGS, the full-length enriched ssDNA was fragmented by sonication to a target range of 200–400 bp, followed by ligation of the double-stranded DNA sequencing adaptor mixture (purple) to the 3′ and 5′ ends of the ssDNA. To maintain the complexity of the library while enriching the full-length cDNA for NGS, the original polyA mRNA was also fragmented using RNase III, followed by ligation of the double-stranded RNA sequencing adaptor mixture (brown) to the 3′ and 5′ ends of the mRNA. After first- and second-strand synthesis, the polyA, capped mRNA sample and the polyA, non-capped mRNA sample were mixed in a 3∶1 ratio and applied to the downstream NGS procedure.
Table 1.
Description of cloned full-length cDNA libraries.
Table 2.
Data from Ion PGM and 454 sequencing (after trimming) and Trinity assembly output.
Figure 2.
Number of grass transcripts mapping to sugarcane contigs.
A, Percentage of total grass transcripts mapping to sugarcane contigs. B, Total grass transcripts mapping to sugarcane contigs (white bars) and total sugarcane contigs mapping to each grass database (black bars). C, Total sugarcane contigs mapping to grasses, UniProt, and NR databases and total unmatched sugarcane contigs (putative sugarcane-specific transcripts).
Figure 3.
Venn diagram comparing sugarcane transcripts obtained by RNA-seq (blue, this work), SUCEST (green) [51], and those studied using oligoarrays on a customized Agilent sugarcane chip (red) [26], [64].
Green and red boxes show the number of transcripts present in the SUCEST data (green) and the Agilent Chip (red) but not in the sugarcane ORFeome (this work).
Figure 4.
Number of contigs expressed by genotype.
Table 3.
Functional annotation of sugarcane full-length cDNA contigs.
Table 4.
Protein prediction of sugarcane contigs.
Figure 5.
Top four categories of genotype-specific contigs based on Phytozome annotations.
Only contigs from leaf samples in each genotype were considered.
Figure 6.
Number of reads identified as natural antisense transcripts (NATs), based on their orientation when aligned to grass genes.
Tissues are indicated as follows: In1, immature internodes; In5, intermediate internodes; L, leaves.
Figure 7.
Number of contigs and percentage of coverage by antisense reads.
Non-overlapping contig coverage by antisense reads was calculated for all contigs showing antisense expression (28,844 contigs).
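The idea of non-overlapping coverage can be illustrated with a minimal sketch: overlapping antisense read alignments on a contig are merged so that each base is counted once, and the merged length is divided by the contig length. The function name `antisense_coverage`, the half-open 0-based coordinates, and the example intervals are assumptions made for illustration; the original analysis pipeline is not described here and may differ.

```python
def antisense_coverage(contig_length, intervals):
    """Percentage of a contig covered by at least one antisense read.

    intervals: iterable of (start, end) pairs, 0-based and half-open
    (an assumed convention, not taken from the original analysis).
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")

    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip each alignment to the contig boundaries.
        start = max(start, 0)
        end = min(end, contig_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Gap before this read: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent read: extend the merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return 100.0 * covered / contig_length


# Hypothetical example: two overlapping reads and one separate read
# on a 1,000-nt contig cover 300 distinct bases.
reads = [(0, 100), (50, 200), (500, 600)]
print(antisense_coverage(1000, reads))  # 30.0
```

Merging intervals before summing avoids double-counting bases covered by several reads, which is what distinguishes this measure from read depth.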
Figure 8.
Functional annotation of full-length contigs (40,407).
The graph shows the 20 most frequent categories from Phytozome annotation. In total, 5,038 categories were observed (Table S4).
Table 5.
Number of full-length transcripts identified by each analysis.
Figure 9.
Length distribution of sugarcane ORFs, full-length transcripts (FL), and UTRs.
Graphs compare the length distributions of sugarcane (gray bars) ORFs (A) and full-length transcripts (B) with those of other grasses (colored lines). The length distributions of the 5′ and 3′ UTRs (C; black and white bars, respectively) of sugarcane full-length transcripts are also shown.
Figure 10.
Summary of ORFeome construction and main results.
The dashed line denotes future directions for sugarcane ORFeome studies. This ORFeome can be used for gene discovery related to a range of traits, since over 5,000 different categories (Table S4) have a full-length representative. Differentially expressed alleles in each sample and in the hybrid, together with their origin from each ancestral genotype, can also be analyzed. Several types of polymorphisms and genetic variability can be further investigated. Both genome assembly and annotation can make use of this sugarcane ORFeome dataset to validate and improve results. TF, transcription factor; NAT, natural antisense transcript.