Figure 1.
Analysis process flow diagram.
This analysis process consisted of five steps: (1) evaluation of raw and HQ read assemblies, with HQ reads chosen as input for the next step; (2) evaluation of the performance of four assembler programs using the single k-mer (SK) and multiple k-mer (MK) methods, with Trinity selected for downstream analysis; (3) evaluation of assembly strategies using different data volumes; (4) evaluation of two hybrid assembly strategies; and (5) evaluation of two combined assembly strategies.
Table 1.
Statistics of raw and HQ reads.
Table 2.
Statistics of the Mix raw and HQ read assemblies.
Figure 2.
Comparison of raw and HQ read assembly by mapping reads back to assembled contigs.
Raw and HQ reads were mapped back to the raw read and HQ read assemblies (contigs >300 bp), respectively. The assessment metrics included the AC=1 and AC>1 percentages and the OAR. Statistical analysis was performed using paired t-tests; P-values refer to differences between the two assemblies.
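The mapping metrics and the paired test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes AC=1, AC>1 and OAR are the percentages of reads aligned exactly once, more than once, and at all (the convention of Bowtie2's alignment summary), and that each read's alignment count is already available as a list of integers. `mapping_metrics` and `paired_t` are hypothetical names. `paired_t` returns only the t statistic and degrees of freedom; a P-value additionally requires the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def mapping_metrics(alignment_counts):
    """Return (AC=1 %, AC>1 %, OAR %) from per-read alignment counts.

    alignment_counts: hypothetical input, one integer per read giving the
    number of places that read aligned in the assembly (0 = unmapped).
    """
    total = len(alignment_counts)
    if total == 0:
        raise ValueError("no reads supplied")
    unique = sum(1 for c in alignment_counts if c == 1)  # aligned exactly once
    multi = sum(1 for c in alignment_counts if c > 1)    # aligned more than once
    ac_eq1 = 100.0 * unique / total
    ac_gt1 = 100.0 * multi / total
    # Overall alignment rate: any read aligned at least once.
    return ac_eq1, ac_gt1, ac_eq1 + ac_gt1


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b.

    Requires at least two pairs whose differences are not all identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

For example, with made-up counts `[1, 1, 2, 0, 1, 3, 0, 1]`, `mapping_metrics` returns `(50.0, 25.0, 75.0)`.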
Table 3.
Quality of assemblies mapped to wheat reference genes.
Figure 3.
Performance of three programs using the SK method.
Statistics of HQ read assemblies generated by SOAPdenovo (red), Velvet (blue) and ABySS (green) with different single k-mer lengths (x axis). Assembly metrics included average contig length (squares), N50 length (triangles), total number of sequences (circles) and percentage of Ns (stars).
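The four metrics plotted in this figure can be computed from a set of contig sequences roughly as below. This sketch assumes the contigs are supplied as plain strings (for example, already parsed from a FASTA file), uses the common N50 definition (the length of the shortest contig among the longest contigs that together cover at least half of all assembled bases), and computes the percentage of Ns over all assembled bases. The function name `assembly_stats` is hypothetical, and the exact definitions used by the authors may differ.

```python
def assembly_stats(contigs):
    """Summary statistics for a list of contig sequences (strings)."""
    if not contigs:
        raise ValueError("no contigs supplied")
    # Contig lengths, longest first, for the N50 calculation.
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total_bases = sum(lengths)
    # Count ambiguous bases (N or n) across the whole assembly.
    n_bases = sum(seq.upper().count("N") for seq in contigs)

    # N50: walk down the sorted lengths until half of all bases are covered.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total_bases:
            n50 = length
            break

    return {
        "total_sequences": len(lengths),
        "average_length": total_bases / len(lengths),
        "n50": n50,
        "percent_n": 100.0 * n_bases / total_bases,
    }
```

Sorting once and stopping at the half-way point keeps the N50 step linear after the sort; applying a minimum-length filter (such as the >300 bp cutoff used elsewhere) before calling the function would change every metric, so the cutoff should match the one used for the reported figures.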
Figure 4.
Performance of the four programs and of the SK and MK strategies.
Performance measures evaluated included assembly descriptive statistics (A), RMBT percentage (B) and match with wheat reference genes (C). Lowercase letters indicate significant differences (at the 5% level) among the means of the different programs within bars of the same color.
Table 4.
Evaluation of quality of assemblies from the four programs by BLAST alignment with wheat reference genes.
Figure 5.
Similarity among the seven assemblies.
Pairwise comparison among the seven assemblies. Each row and column intersection indicates the degree of similarity between the two corresponding assemblies.
Figure 6.
Performance of three assembly strategies.
The performance was evaluated by (A) assembly descriptive statistics, (B) RMBT proportion, and (C) longest runtime and largest RAM usage.
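Panel C reports runtime and peak memory. One way to collect such numbers for an assembler run on Linux or macOS is sketched below; this is an assumption, not the measurement method described by the authors. It wraps an arbitrary command in `subprocess.run`, times it with `time.perf_counter`, and reads `ru_maxrss` from `resource.getrusage(resource.RUSAGE_CHILDREN)`. That field reports the peak resident memory of the largest child process waited for so far, so profiling several commands from one Python process yields the maximum across them, and its unit is kilobytes on Linux but bytes on macOS. The function name `profile_command` is hypothetical.

```python
import resource
import subprocess
import sys
import time


def profile_command(cmd):
    """Run cmd (a list of arguments) and return (seconds, peak child RSS).

    Peak RSS comes from RUSAGE_CHILDREN: kilobytes on Linux, bytes on macOS,
    and it is the maximum over all children this process has waited for.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss


if __name__ == "__main__":
    # Placeholder command; replace with the real assembler invocation.
    seconds, rss = profile_command([sys.executable, "-c", "x = 1"])
    print(f"runtime: {seconds:.3f} s, peak RSS: {rss}")
```

Measuring each strategy in a fresh Python process avoids carrying over the peak from an earlier, larger run.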
Figure 7.
Distribution of HQ reads mapped to TriallCDNA.
HQ reads from the four libraries were aligned to the TriallCDNA reference set. Shown is the percentage of mapped reads (y axis) vs. copies (x axis).
Table 5.
Statistics of de novo, MUTT, ADTT, CMUTT and CADTT assemblies.