Testing of library preparation methods for transcriptome sequencing of real life glioblastoma and brain tissue specimens: A comparative study with special focus on long non-coding RNAs | PLOS One

Table 1.

Concentration, purity and RNA integrity number (RIN) of isolated and diluted total RNA samples.
RNA purity was evaluated from the absorbance ratios, which should be approximately 2.0. A lower A₂₆₀/A₂₃₀ ratio hints at the presence of organic contaminants, for example acidic phenol. RNA with sufficient integrity for all downstream applications is characterized by a RIN of 8.0 or higher.

Fig 1.

Four examples of electropherograms (EPGs) of total RNA samples analyzed on 2200 TapeStation.
In this figure, the x-axis represents the number of base pairs. The position of the other peaks is derived from the position of the first peak (lower marker). Peaks 18S and 28S represent ribosomal RNA (rRNA) subunits. The peak at ~80 bp represents short RNA species and the last peak at 15–16 kbp represents genomic DNA (gDNA). A) The EPG of GBM-1 contains proportional peaks for the 18S and 28S subunits, indicating high integrity of the sample, and no visible peak at 15–16 kbp, signifying no gDNA contamination. B) The EPG of GBM-2 indicates low integrity of the RNA sample, judging by the severe degradation of 28S and the resulting disproportionate heights of the two rRNA peaks. Again, gDNA was not observed. C) The EPG of NonT-1 depicts 18S and 28S peaks of approximately the same height which, again, is a mark of advanced degradation. Similar abundance of gDNA was also observed. D) The EPG of NonT-3 shows fairly intact RNA with minor but visible gDNA contamination.

Table 2.

Concentration, modal length and molarity of molecular libraries.
Nine CFG libraries were generated with three different library preparation kits NEXTflex (NF), SENSE (LX) and NEBNext (NB) and eight VB libraries were independently prepared with NEBNext kit. Concentration was measured on Qubit 2.0 and modal lengths were determined using 2200 TapeStation or Fragment Analyzer. Molarity was calculated from these two parameters using an online weight-to-moles conversion calculator [24].
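The weight-to-moles conversion mentioned above can be sketched as follows. This is a minimal illustration, not the calculator cited as [24]: it assumes double-stranded DNA with an average mass of 660 g/mol per base pair (a common approximation), and the function name and the example values are hypothetical rather than taken from Table 2.

```python
def library_molarity_nm(conc_ng_per_ul: float,
                        modal_length_bp: float,
                        avg_bp_mass: float = 660.0) -> float:
    """Approximate molarity (nmol/L) of a double-stranded DNA library.

    conc_ng_per_ul  -- mass concentration, e.g. from a fluorometric assay
    modal_length_bp -- modal fragment length in base pairs
    avg_bp_mass     -- assumed average mass of one base pair in g/mol
    """
    if conc_ng_per_ul < 0 or modal_length_bp <= 0 or avg_bp_mass <= 0:
        raise ValueError("concentration must be >= 0; length and mass must be > 0")
    # ng/uL equals 1e-3 g/L; dividing by g/mol gives mol/L; 1e9 converts to nmol/L.
    return conc_ng_per_ul * 1e6 / (avg_bp_mass * modal_length_bp)


# Hypothetical example: a 10 ng/uL library with a 300 bp modal length.
example_nm = library_molarity_nm(10.0, 300.0)
```

Because the result scales inversely with fragment length, an error in the modal length estimate carries over directly into the calculated molarity.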

Table 3.

Percentage of duplicates and average GC content.
Calculated for 9 CFG libraries prepared with 3 different library preparation kits NEXTflex (NF), SENSE (LX) and NEBNext (NB) and 8 VB libraries independently prepared with NEBNext kit.

Fig 2.

Normalized average GC content of reads.
A roughly normal distribution of GC content is typical for normal random libraries. A) NEXTflex (NF) libraries do not have a normal distribution of GC content; sharp peaks at 55% indicate possible specific contamination, for example adapter dimers, or another bias. B) SENSE (LX) libraries exhibit a normal distribution with the exception of library N1_LX, which has a sharp peak around 50%. C) NEBNext (NB) libraries also do not have a normal distribution of GC content, judging by the sharp peaks around 60%, which indicate the same problem as in the NEXTflex libraries. D) NEBNext (VB) libraries have a roughly normal distribution of GC content, with the most distinct exception being N1_VB.
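As a rough illustration of how a per-read GC distribution like the one plotted here can be obtained, the sketch below computes the GC percentage of each read and bins the reads by whole percent. It is an assumption-laden example, not the tool used to produce the figure: the function names are hypothetical, ambiguous bases such as N are simply ignored, and no normalization is applied.

```python
from collections import Counter


def gc_percent(read: str):
    """Return the GC content of one read in percent, or None if it has no A/C/G/T."""
    read = read.upper()
    called = sum(read.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (read.count("G") + read.count("C")) / called


def gc_histogram(reads):
    """Count reads per GC percentage, rounded to the nearest whole percent."""
    hist = Counter()
    for read in reads:
        gc = gc_percent(read)
        if gc is not None:
            hist[round(gc)] += 1
    return hist


# Hypothetical reads, for illustration only.
hist = gc_histogram(["ATGC", "GCGC", "ATAT"])
```

A sharp spike in such a histogram means that many reads share nearly the same base composition, which is why it can point to a single over-represented sequence.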

Fig 3.

The total amount of over-represented sequences.
The most frequent sequence is shown in blue. A) The numbers in CFG libraries vary, with no over-represented sequences found in two NEXTflex (NF) libraries and all three SENSE (LX) libraries. The highest percentage of these sequences is present in N1_NB. B) VB libraries all contain over-represented sequences to some extent, with the highest number belonging to N1_VB.

Fig 4.

Statistics generated using Gene Counts mode of alignment tool STAR.
The category Overlapping Genes includes sequences which overlap only one gene and, together with the categories No Feature and Ambiguous Features, comprises all uniquely mapped reads. Reads in the category Multimapping do not map uniquely but to multiple loci, and the rest of the sequences could not be mapped at all, hence the category Unmapped. A) The highest numbers of reads in the category Overlapping Genes belong to all three SENSE (LX) libraries, which also contain the fewest reads that map to multiple loci. On the other hand, libraries generated with NEBNext (NB) and especially NEXTflex (NF) exhibit higher multimapping rates and lower numbers of reads overlapping only one gene. B) VB libraries generally show relatively consistent multimapping rates with the exception of N1_VB. The numbers of reads overlapping single genes range from 42.4% to 57.8%.
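The five categories described above can be tallied from a per-gene count table. The sketch below assumes the tab-separated layout that STAR writes in its gene counts mode, in which four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) precede the per-gene rows; readers should check this against the STAR manual for their version. The function name, the choice of count column and the example rows are hypothetical.

```python
def summarize_gene_counts(lines, column=1):
    """Return the percentage of reads in each category of a gene count table.

    lines  -- iterable of tab-separated rows: identifier, then count columns
    column -- index of the count column to use (1 is assumed to be unstranded)
    """
    special = {
        "N_unmapped": "Unmapped",
        "N_multimapping": "Multimapping",
        "N_noFeature": "No Feature",
        "N_ambiguous": "Ambiguous Features",
    }
    totals = {label: 0 for label in special.values()}
    totals["Overlapping Genes"] = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # skip blank or malformed rows
        # Any row that is not a summary row is treated as a gene row.
        label = special.get(fields[0], "Overlapping Genes")
        totals[label] += int(fields[column])
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no reads counted")
    return {label: 100.0 * n / grand_total for label, n in totals.items()}


# Hypothetical table with two genes, for illustration only.
example_rows = [
    "N_unmapped\t10\t10\t10",
    "N_multimapping\t20\t20\t20",
    "N_noFeature\t5\t30\t6",
    "N_ambiguous\t5\t1\t2",
    "GENE_A\t40\t20\t38",
    "GENE_B\t20\t19\t24",
]
percentages = summarize_gene_counts(example_rows)
```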

Fig 5.

The percentage of reads aligned to particular regions in the reference genome.
A) Among CFG libraries, SENSE (LX) libraries contained sequences which on average mapped more to coding regions and less to UTRs compared to the other two kits. The number of bases aligning to intronic and intergenic regions ranged from 9.5 to 21.8% and from 3.0 to 6.1%, respectively, with no major differences between the three groups of libraries. The highest numbers of bases aligning to ribosomal regions belonged to libraries G2_NB (11.1%), G2_NF (14.0%) and N1_NB (14.7%). B) VB libraries mapped to particular regions in a similar manner to NB libraries, as was expected, with a higher number of bases mapping to UTRs than to coding regions but with a lower number of bases aligning to ribosomal regions.

Fig 6.

Results from comparison of three library preparation kits using the Ensembl automatic annotation system.
The y-axis shows the number of hits for each transcript biotype from the Ensembl database, calculated for each group of libraries generated using the same library preparation kit. Only the most populous biotypes relevant to our current study were included. While the first biotype comprises the majority of the group of protein coding RNA, the rest (lincRNA, antisense and sense intronic RNA) are part of another large group, long non-coding RNA. After comparing all three kits, the highest number of hits for each biotype of interest was observed for NEBNext (NB) libraries; NEXTflex (NF) libraries came second and SENSE (LX) libraries third. These results hint at a lesser variability of library fragments in LX libraries.

Fig 7.

Normalized transcript coverage.
A) NEXTflex (NF) libraries seemed to show relatively even transcript coverage with a notable peak in the 3' terminal parts of genes (~90–100%). B) Transcript coverage of SENSE (LX) libraries was more irregular, hinting at a lesser variability of fragments generated by this particular kit. C) NEBNext (NB) libraries showed the most even coverage with a notable peak in the coverage of 3' terminal parts of genes. D) NEBNext (VB) libraries showed even coverage similar to that of NB libraries, with a more pronounced spike in the coverage of 3' terminal parts of genes.

Table 4.

Percentage of ribosomal RNA (rRNA) and duplicate rates.
Calculated via Picard tool for 9 CFG libraries generated with 3 different library preparation kits NEXTflex (NF), SENSE (LX) and NEBNext (NB) and 8 VB libraries independently prepared with NEBNext kit.

Fig 8.

2D density scatter plots and the distribution of reads per kilobase (RPK) values per gene.
Density scatter plots relate duplicate rates (%) to expression (reads/kbp). Calculated using dupRadar for NEXTflex (NF) libraries A) G1_NF, B) G2_NF and C) N1_NF, SENSE (LX) libraries D) G1_LX, E) G2_LX and F) N1_LX, and NEBNext (NB) libraries G) G1_NB, H) G2_NB and I) N1_NB. While duplication rates in NF and NB libraries were relatively low, LX libraries showed higher duplication rates. No skewed distributions of RPK values per gene with an unusual number of lowly expressed genes were observed.
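The two quantities plotted per gene, expression in reads per kilobase (RPK) and duplication rate in percent, follow from simple ratios. The sketch below is a plain-Python illustration of those ratios only; it is not dupRadar's implementation (dupRadar is an R package), and the function names and example numbers are hypothetical.

```python
def reads_per_kilobase(read_count: int, gene_length_bp: int) -> float:
    """Expression of one gene as reads per kilobase of gene length."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return read_count / (gene_length_bp / 1000.0)


def duplication_rate_percent(duplicate_reads: int, total_reads: int) -> float:
    """Share of a gene's reads that are flagged as duplicates, in percent."""
    if total_reads <= 0:
        raise ValueError("total reads must be positive")
    if not 0 <= duplicate_reads <= total_reads:
        raise ValueError("duplicate reads must lie between 0 and total reads")
    return 100.0 * duplicate_reads / total_reads


# Hypothetical gene: 500 reads on a 2000 bp gene, 50 of them duplicates.
example_rpk = reads_per_kilobase(500, 2000)
example_dup = duplication_rate_percent(50, 500)
```

Plotting the second value against the first for every gene gives one point per gene, which is the basis of a density scatter plot of this kind.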

Fig 9.

2D density scatter plots and the distribution of reads per kilobase (RPK) values per gene.
Density scatter plots relate duplicate rates (%) to expression (reads/kbp). Calculated using dupRadar for VB libraries A) G1_VB, B) G2_VB, C) G3_VB, D) G4_VB, E) G5_VB, F) N1_VB, G) N2_VB, H) N3_VB. Duplication rates in these libraries were relatively low and comparable to those of NB libraries prepared with the same library preparation kit, NEBNext, as was expected. No apparent skewed distributions of RPK values per gene with an unusual number of lowly expressed genes were observed.

Table 5.

Summary scores of chosen parameters of all three tested library preparation kits.
Scores were assigned in four categories: the percentage of rRNA reads, the percentage of duplicate reads, and the number of hits in the Ensembl database for mRNAs and for lncRNAs. The lowest score in each category is denoted by “-” and the highest by “++”. Based on these parameters, NEBNext was chosen as the most suitable kit for preparing libraries for lncRNA sequencing and was later independently tested in another NGS facility.

Table 6.

Minimum requirements and brief characterization of three tested library preparation kits.
The minimum requirements according to the manufacturers' protocols are the quality of pre-depletion RNA and the amount of input rRNA-depleted RNA. The quality of RNA is based either on the RNA integrity number (RIN) or on the source of the RNA, for example formalin-fixed paraffin-embedded (FFPE) tissue blocks. Prior depletion of rRNA is required for all library preparation procedures.
