Reader Comments

Use of coded primers requires minimum difference to be effective

Posted by shuse on 04 May 2007 at 13:57 GMT

My colleagues and I at the Marine Biological Laboratory in Woods Hole have found the use of coded PCR primers in massively-parallel sequencing to be very effective, and we routinely use this method in all of our runs. The process has many advantages, as outlined in “The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing” by Binladen et al. (2007), including the ability to combine many data sources in one run and to separate them bioinformatically rather than mechanically.

We would like to offer a caution, however, to those using this method. We strongly recommend selecting coding primer keys, which we have been calling “run keys”, that differ by AT LEAST 2 nucleotides. If two run keys differ by only one nucleotide, a single substitution in the key is enough for a read to “jump” from one experiment to another. Additionally, when selecting run keys, it is helpful to calculate the number of nucleotide flows required to accurately synthesize each key and to preferentially select the keys that need the fewest flows. We have also chosen to avoid homopolymers within keys to increase their sequencing accuracy.
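The selection criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure we actually run: the cyclic flow order "TACG", the assumption that the key begins at the first flow, and the greedy selection strategy are all choices made for the example, and the function names are hypothetical.

```python
from itertools import product

# Assumed cyclic nucleotide flow order; check it against your own instrument.
FLOW_ORDER = "TACG"


def hamming(a, b):
    """Number of positions at which two equal-length keys differ."""
    return sum(x != y for x, y in zip(a, b))


def has_homopolymer(key):
    """True if any two adjacent bases in the key are identical."""
    return any(x == y for x, y in zip(key, key[1:]))


def flows_needed(key, flow_order=FLOW_ORDER):
    """Count the flows needed to synthesize the key.

    Simplification: the key is assumed to start at the first flow.
    """
    flows = 0
    previous = None
    for base in key:
        if base == previous:
            continue  # a homopolymer run is incorporated in a single flow
        while True:
            flowed = flow_order[flows % len(flow_order)]
            flows += 1
            if flowed == base:
                break
        previous = base
    return flows


def select_keys(length=5, min_dist=2, flow_order=FLOW_ORDER):
    """Greedily pick homopolymer-free keys, fewest flows first,
    keeping a key only if it differs from every kept key by >= min_dist."""
    candidates = ["".join(p) for p in product("ACGT", repeat=length)]
    candidates = [k for k in candidates if not has_homopolymer(k)]
    candidates.sort(key=lambda k: (flows_needed(k, flow_order), k))
    chosen = []
    for key in candidates:
        if all(hamming(key, kept) >= min_dist for kept in chosen):
            chosen.append(key)
    return chosen
```

A greedy pass does not guarantee the largest possible key set; it simply favors keys that need few flows while enforcing the minimum difference.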

Although we have found the per-nucleotide accuracy of the GS20 to be quite high (>99%), over the several hundred thousand reads of a full run the number of synthesis errors becomes quite noticeable. For instance, with the mis-assignment rate of 0.0036 reported by Binladen et al., assuming two run keys differing by only one nucleotide, the expected number of sequences incorrectly assigned in a set of 300,000 would be 1063. With two run keys differing by two nucleotides, the number of sequences mis-assigned drops to 4. We have chosen to use pentanucleotide keys as a means of maximizing the number of keys we can use that differ by at least 2 nucleotides, and usually more, without losing too much of the limited sequence length to primer. We analyzed 389,422 reads, comparing pentanucleotide keys and known sections of sequence (amplification primers), and found no mis-assigned reads.
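The arithmetic behind these estimates can be written out as a short sketch. It assumes that a read jumps between two keys only when every differing position is independently hit by an error at the quoted rate, which is a simplification; the function name is hypothetical.

```python
def expected_misassigned(n_reads, error_rate, key_distance):
    """Expected reads jumping between two keys that differ at
    key_distance positions, assuming independent errors (a simplification)."""
    return n_reads * error_rate ** key_distance


one_difference = expected_misassigned(300_000, 0.0036, 1)   # about 1080
two_differences = expected_misassigned(300_000, 0.0036, 2)  # about 3.9

print(round(one_difference), round(two_differences))
```

With the rounded rate of 0.0036 the simple product for a one-nucleotide difference comes to about 1080 rather than the 1063 quoted above; the difference may come from an unrounded rate, but that is a guess. The two-nucleotide case rounds to 4, matching the figure above.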

An important advantage of the method that was not mentioned in the paper is that the use of these run keys allows us to reduce the number of well plate regions. The GS20 uses a gasket to mask sections of the plate to create individual regions. The more regions used, the more gasket is needed, the more wells are physically covered by the gasket, and the fewer wells remain available for sequencing. By using a 2-region gasket rather than a 4- or 8-region gasket we can increase the number of reads in a single run and consequently decrease the per-read cost. The method has also allowed us to piggy-back smaller projects with very limited budgets onto larger projects, at a fraction of the cost of a full run.

Careful rotation of these keys across sequencing runs can also help to filter out contamination within and between runs. We have seen contamination between runs documented not only in this paper, where previous waterbuck sequences appeared in the current experiment, but also in other papers and in data sequenced for us commercially. By selecting only sequences with the proper run key for a given experiment, this contamination can be filtered out. This same filtering process will remove contamination across well plate regions of the same run.
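A minimal sketch of this filtering step follows. It assumes each read begins with its run key and that only exact key matches are accepted; the function name, the keys, and the sample labels in the usage note are hypothetical.

```python
from collections import defaultdict


def split_by_run_key(reads, run_keys):
    """Assign reads to experiments by exact match on the leading run key.

    run_keys maps key sequence -> experiment label (all keys the same length).
    Returns (assigned, rejected): assigned maps each label to its reads with
    the key trimmed off; rejected holds reads whose key is not expected in
    this run, such as carry-over from an earlier run.
    """
    key_length = len(next(iter(run_keys)))
    assigned = defaultdict(list)
    rejected = []
    for read in reads:
        key, remainder = read[:key_length], read[key_length:]
        if key in run_keys:
            assigned[run_keys[key]].append(remainder)
        else:
            rejected.append(read)
    return dict(assigned), rejected
```

For example, with the hypothetical keys `{"ACTGA": "sample_A", "TGACT": "sample_B"}`, a read starting with `ACTGT` would land in the rejected list rather than being assigned to either sample. Checking the amplification primer that follows the key, as we did in the analysis described earlier, would be a natural additional test.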

Overall, we highly recommend this methodology as a means of greatly improving the utility of the massively parallel pyrosequencing system through improvements in both the number of parallel experiments and the number of sequence reads within a single pyrosequencing run. However, we strongly urge that only keys differing by two or more nucleotides be used.