Table 1.
Sample identity, location, year, library protocol and number of sequencing reads in millions (M) for historic Atlantic cod samples.
Figure 1.
Frequency of nucleotide substitutions along historic reads of Atlantic cod.
Reads were generated using the TruSeq V2 library creation protocol (a) or the Microplex single tube protocol (see methods) (b). Misalignments to the reference at the 5′- and 3′-ends of sequencing reads result from elevated proportions of C to T substitutions (red), G to A substitutions (blue) and other possible substitutions (grey). The figure was generated with the program mapDamage V2.0.0 [13] from 1 million randomly chosen reads for merged Illumina and Microplex libraries.
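As an illustration of the quantity plotted here, the following minimal sketch computes the per-position frequency of C to T substitutions from the 5′-end of reads. It is not mapDamage's implementation; the function name `c_to_t_frequency` and the input format (gapless pairs of aligned reference and read strings) are assumptions made for this example only.

```python
def c_to_t_frequency(pairs, n_positions):
    """Per-position C->T frequency, counted from the 5'-end of each read.

    pairs: iterable of (reference, read) strings of equal length, assumed to be
           gapless alignments in read orientation (hypothetical input format).
    Returns a list with one entry per position: the fraction of reference C
    bases read as T, or None where no reference C was observed.
    """
    ref_c = [0] * n_positions   # reference C bases seen at each position
    c_to_t = [0] * n_positions  # of those, how many were read as T
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= n_positions:
                break
            if r == "C":
                ref_c[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    return [c_to_t[p] / ref_c[p] if ref_c[p] else None
            for p in range(n_positions)]


if __name__ == "__main__":
    toy = [("CCAG", "TCAG"), ("CGAC", "CGAT"), ("ACGT", "ACGT")]
    print(c_to_t_frequency(toy, 4))
```

G to A frequencies at the 3′-end could be obtained the same way by reversing the aligned strings and testing for reference G read as A.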
Figure 2.
Alignments of reads containing interrupted palindromes to the Atlantic cod reference genome.
The entire reverse complement sequence (Rev.Comp.Read) is displayed underneath the original read. Interrupted palindromes occur at read ends (underlined), and extensive misalignments (red) to the reference occur closest to the 3′-end of the read. For display purposes, alignments that did not fit on a single line are clipped in the middle of the sequence (indicated with dots, not to scale). The relative start and end positions of the alignments are shown above the reference sequence (grey numbers).
Figure 3.
Length distribution of interrupted palindromes at 5′ and 3′-ends in Illumina HiSeq 2000 reads of Atlantic cod (Gadus morhua).
Reads were generated from 11 historic samples using TruSeq library creation protocols (red lines), four historic samples using Microplex protocols (black lines) and one modern sample using TruSeq protocols (grey line). Terminal palindromic sequences longer than three base pairs are rare in the Microplex and modern samples.
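The length of an interrupted palindrome can be illustrated with a short sketch. It assumes that the palindrome length is the number of terminal bases at the 3′-end that exactly match the reverse complement of the 5′-end of the same read; the function names and the exact-match criterion are assumptions for illustration, not the procedure used to produce the figure.

```python
from collections import Counter

# Complement lookup; ambiguous bases such as N are deliberately absent,
# so they never count as a match.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminal_palindrome_length(read):
    """Number of 3'-terminal bases that mirror the 5'-end as its reverse complement."""
    read = read.upper()
    n = len(read)
    k = 0
    # Walk inwards from both ends while the 3' base complements the 5' base.
    while k < n // 2 and read[n - 1 - k] == _COMPLEMENT.get(read[k]):
        k += 1
    return k


def palindrome_length_distribution(reads):
    """Count how many reads show each terminal palindrome length."""
    return Counter(terminal_palindrome_length(r) for r in reads)


if __name__ == "__main__":
    reads = ["ACGGTGAGAGAACCGT", "AAAAAAAA", "AT"]
    print(palindrome_length_distribution(reads))
```

In the first toy read, the five 3′-terminal bases (ACCGT) are the reverse complement of the five 5′-terminal bases (ACGGT), separated by an unrelated interior sequence.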
Figure 4.
Proportion of reads aligning to the Atlantic cod genome for TruSeq and Microplex libraries.
The proportions of reads aligning (relative to the number of untrimmed read pairs) were calculated for libraries including interrupted palindromes (light grey) and those for which these palindromes were removed at the 3′-end (dark grey). Only reads with a minimum mapping quality (MapQ) value of 25 were considered.
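The calculation described above can be sketched for alignments in SAM text format. The sketch assumes that only mapped, primary alignments are counted and that the denominator (the number of untrimmed read pairs) is supplied by the caller; both choices and all function names are assumptions for illustration rather than the exact procedure used for the figure.

```python
def count_confident_alignments(sam_lines, min_mapq=25):
    """Count mapped, primary SAM records with MAPQ >= min_mapq."""
    count = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # SAM column 2: bitwise FLAG
        mapq = int(fields[4])   # SAM column 5: mapping quality
        if flag & 0x4:
            continue  # read is unmapped
        if flag & (0x100 | 0x800):
            continue  # secondary or supplementary alignment (assumed excluded)
        if mapq >= min_mapq:
            count += 1
    return count


def aligned_proportion(sam_lines, n_untrimmed, min_mapq=25):
    """Proportion of confident alignments relative to the untrimmed total."""
    if n_untrimmed <= 0:
        raise ValueError("n_untrimmed must be positive")
    return count_confident_alignments(sam_lines, min_mapq) / n_untrimmed


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r3\t16\tchr1\t300\t25\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    print(aligned_proportion(sam, n_untrimmed=4))
```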
Figure 5.
Hypothetical process creating interrupted palindromes during library creation for next-generation sequencing.
Single-stranded DNA forms a hairpin loop through the presence of short, naturally occurring reverse complement sequences (a). Exonuclease activity removes unannealed 3′-ends if present, creating a 5′ overhang (b). Polymerases extend the 3′ strand based on the 5′-end of the same strand and create an A-overhang (red, c). Forked Illumina adapters, P5 (blue) and P7 (grey), are ligated to the double-stranded stem using AT-overhang ligation (d). The denatured, ligated construct is suitable for amplification by PCR, and the artificially extended sequence results in a reverse complement artifact that covers both ends of the strand (e).
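Steps (a) to (c) of this hypothetical process can be mimicked with a toy string model. The sketch below is a simplification under stated assumptions: the stem is an exact reverse complement match of fixed length, the first such stem found from the 5′-end is used, and the A-overhang and adapter ligation are omitted. The function names and the parameters `stem_len` and `min_loop` are hypothetical and chosen only for this example.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def simulate_hairpin_extension(strand, stem_len=3, min_loop=3):
    """Toy model of steps (a)-(c): hairpin, 3' trimming, 3' extension.

    Returns the extended strand, or None if no stem can form.
    """
    strand = strand.upper()
    n = len(strand)
    # (a) Look for a 5' stem segment with a downstream reverse complement
    #     partner; i >= 1 guarantees a 5' overhang to copy from.
    for i in range(1, n - 2 * stem_len - min_loop + 1):
        partner = reverse_complement(strand[i:i + stem_len])
        j = strand.rfind(partner, i + stem_len + min_loop)
        if j != -1:
            # (b) Remove any unannealed 3' tail beyond the stem.
            trimmed = strand[:j + stem_len]
            # (c) Extend the 3'-end using the 5' overhang as template.
            return trimmed + reverse_complement(strand[:i])
    return None


if __name__ == "__main__":
    extended = simulate_hairpin_extension("GATTCACGTTTTTCGTAA")
    print(extended)
```

In this toy case the extended strand ends in the reverse complement of its own 5′-end, which is the signature of the reverse complement artifact described in (e).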
Table 2.
The abundance of interrupted palindromes in sequencing reads from ancient and historic specimens.