Biases in the SMART-DNA library preparation method associated with genomic poly dA/dT sequences

doi:10.1371/journal.pone.0172769

Fig 1.

Template switching protocol and type of biases in NGS.

(A) Schematic representation of the SMART library preparation method, which comprises poly dT tailing, priming and second strand synthesis, template switching by MMLV-RT, and addition of adapters and PCR amplification. (B) Schematic representation of the potential biases resulting from the different steps of the library preparation method. The bias may affect the sequenced fragment (due to PCR and size selection), the genomic content of the fragment (due to sonication), or the interface between the fragment and the genomic content (due to the different methods of adapter addition).


Fig 2.

Base constitution surrounding the 3’ end of the sequenced fragments in SMART-based libraries.

(A) Sequence logo representation of the information content of the region surrounding the end of the second read for the forward (left) and reverse (right) strands in an HCT116 sample prepared with the SMART-based library protocol (S1 Table). Similar results were obtained with all other SMART-based samples. Each sequence logo is based on 1,000 randomly chosen reads. (B) IGV representation of two typical genomic regions (taken from the same SMART-based library as in A), in which multiple reads ended at the same position. The red and blue rectangles represent the locations of the second reads in each pair for the forward (left) and the reverse (right) strands, respectively. The small bars below the reads represent individual bases, which are color coded. Note that immediately after the reads there are tracts of poly dT and poly dA for the forward and reverse strands, respectively. The sequence logos below the IGV tracks represent the information content as in A, for 1,000 randomly chosen genomic regions out of ~300,000 in which at least 5 reads were mapped.
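The information content plotted in a sequence logo can be illustrated with a minimal sketch. It assumes the standard definition (log2 of the alphabet size minus the Shannon entropy of each column) without a small-sample correction; the caption does not state which tool or correction was used, and the function name `information_content` and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def information_content(seqs, alphabet="ACGT"):
    """Per-position information content (bits) for equal-length sequences.

    Assumption: IC = log2(len(alphabet)) - Shannon entropy of the column,
    with no small-sample correction.
    """
    max_bits = math.log2(len(alphabet))
    result = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            # No countable bases in this column (e.g. only N): report 0 bits.
            result.append(0.0)
            continue
        entropy = 0.0
        for b in alphabet:
            p = counts[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        result.append(max_bits - entropy)
    return result


# Toy windows (not data from the paper): a conserved T run, then a variable base.
toy_windows = ["TTTA", "TTTC", "TTTG", "TTTT"]
toy_ic = information_content(toy_windows)
```

In this toy input the three conserved T columns carry the full 2 bits, and the last column, where all four bases are equally frequent, carries 0 bits.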


Fig 3.

Poly dN tract length analysis in a SMART-based library.

Graph representing the ratio between the number of reads in the forward strand and the number of reads in the reverse strand that were adjacent to poly dN tracts of various lengths. The different colors represent the four nucleotides. For T, C and G we present the forward/reverse ratio, whereas for A the opposite ratio (reverse/forward) is presented.
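The strand ratio described in this caption can be sketched as follows. This is an illustration only: the function name `strand_ratio`, the input layout (a mapping from tract length to read count per strand) and the toy counts are assumptions, not the authors' pipeline.

```python
def strand_ratio(forward, reverse, base):
    """Ratio of read counts per tract length for one nucleotide.

    forward, reverse: dict mapping tract length -> number of reads adjacent
    to a tract of that length on the given strand (hypothetical layout).
    For T, C and G the ratio is forward/reverse; for A it is reverse/forward,
    following the caption.
    """
    ratios = {}
    for length in sorted(set(forward) & set(reverse)):
        f, r = forward[length], reverse[length]
        num, den = (r, f) if base == "A" else (f, r)
        if den == 0:
            # Ratio undefined when the denominator strand has no reads.
            continue
        ratios[length] = num / den
    return ratios


# Toy counts (not data from the paper).
toy_t = strand_ratio({5: 10, 12: 80}, {5: 10, 12: 4}, "T")
toy_a = strand_ratio({12: 4}, {12: 80}, "A")
```

Inverting the ratio for A puts a poly dA enrichment on the reverse strand on the same scale as a poly dT enrichment on the forward strand.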


Fig 4.

Bias toward Poly dT tracts in SMART-DNA libraries.

The number of reads (per million reads) mapped adjacent to poly dN tracts (≥12) for the forward and reverse strands is reported. These numbers are further normalized for the genomic frequencies of such tracts (see Methods). Data are shown for the second read in each pair (A, B), for the first read (D, E, G) and for random locations (shift of 1,000 bp, see Methods) (C, F). The analysis was done for the SMART-DNA-based library preparation (A, D), for the ligation-based method (B, E) and for the SMART-RNA library preparation (G). The distance between the read and the poly dN tract was set to either 1 nucleotide (A-C) or 250 nucleotides (D-G). The data presented are for HCT116 genomic DNA sequenced by us (A, D) or obtained from the Aladgem lab (B, E), and for RNA-seq libraries (G) downloaded from [17]. Note that the SMART-RNA library was prepared by annealing a primer to poly dA tracts rather than to poly dT tracts as with other SMART-based libraries. We tested the statistical significance of the deviation from a distribution based on the frequency of occurrences in the genome, using the chi-squared goodness of fit test. The effect sizes (ϕ) (see Methods) are shown in each graph. The data presented are for a single library of each type. Similar results were obtained for additional libraries (S3 Fig).
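The goodness of fit test mentioned in this caption can be sketched as below. The sketch assumes that expected counts are proportional to the genomic frequency of each tract type and that the effect size is ϕ = sqrt(χ² / N), a common definition; the exact formula used by the authors is in their Methods, which are not reproduced here. The function name and the toy counts are hypothetical.

```python
import math


def chi2_goodness_of_fit(observed, genomic_counts):
    """Chi-squared statistic and effect size phi for read counts per tract type.

    observed: dict base -> number of reads adjacent to tracts of that base.
    genomic_counts: dict base -> number of such tracts in the genome.
    Assumptions: expected counts follow the genomic tract frequencies,
    and phi = sqrt(chi2 / N) with N the total number of observed reads.
    """
    n = sum(observed.values())
    total_genomic = sum(genomic_counts.values())
    chi2 = 0.0
    for base, obs in observed.items():
        expected = n * genomic_counts[base] / total_genomic
        chi2 += (obs - expected) ** 2 / expected
    phi = math.sqrt(chi2 / n)
    return chi2, phi


# Toy counts (not data from the paper): tracts equally frequent in the genome,
# reads strongly skewed toward T.
toy_chi2, toy_phi = chi2_goodness_of_fit({"A": 10, "T": 90}, {"A": 1, "T": 1})
```

A p-value could then be obtained from the chi-squared distribution with (number of categories − 1) degrees of freedom, for example with `scipy.stats.chi2.sf`, if SciPy is available. Reporting ϕ alongside the test is useful because, with millions of reads, even tiny deviations become statistically significant.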
