Figure 1.
Schematic Diagram of the Principal Steps in the Analysis of Sequencing Variants Found by SNPdetector
Parallelograms are analytical modules (usually C programs), and rectangles are input and output data. Programs obtained from the public domain are displayed in italics, while those developed in this work are shown in bold. SNPdetector requires the following three sets of input data: (1) a template sequence file, (2) the forward and the reverse sequencing primers, and (3) the trace files. The output includes a list of high-quality SNPs and their genotype calls in each subject.
Table 1.
Comparison of the Results Obtained by SNPdetector and PolyPhred (Version 5.0.2) in Mouse Resequencing
Table 2.
Heterozygosity at the Bach1 Locus in Animals of Wild-Derived Inbred Strain CAST/Ei
Table 3.
Genotyping of Candidate SNPs Identified by SNPdetector in Human ENCODE Regions
Table 4.
Comparison of SNPdetector with PolyPhred 5.0.2 and NovoSNP on a Subset of ENCODE Data
Table 5.
Sequence Coverage Analysis of the Three Datasets
Figure 2.
Rejected and Accepted Bases in a Sequence Trace
The Phred quality scores are indicated at the top. The quality scores for rejected bases are labeled in red. Accepted bases are marked by rectangular boxes.
(A) A subregion of a polyA bubble showing that low-quality bases with no secondary peaks are accepted by SNPdetector.
(B) A subregion showing that a Q20 base is rejected because of its high secondary peak even though the majority of neighboring bases have high-quality scores.
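The two panels can be read as a rule that weighs the secondary peak ahead of the Phred score. The following is a minimal sketch of such a rule, not SNPdetector's actual implementation: the function name `is_accepted`, the ratio of secondary to primary peak height, and all three thresholds are assumptions made for illustration and do not come from this legend.

```python
# Hypothetical thresholds chosen only for illustration.
HIGH_SECONDARY_RATIO = 0.30  # at or above this, the secondary peak is "high"
NO_SECONDARY_RATIO = 0.05    # at or below this, there is effectively no secondary peak
MIN_PHRED = 20               # quality floor used only for intermediate cases


def is_accepted(phred: int, secondary_ratio: float) -> bool:
    """Decide whether a base call is accepted.

    secondary_ratio is assumed to be the secondary peak height divided by
    the primary peak height at the same trace position.
    """
    if secondary_ratio >= HIGH_SECONDARY_RATIO:
        # Panel (B): a high secondary peak rejects the base even at Q20 or better.
        return False
    if secondary_ratio <= NO_SECONDARY_RATIO:
        # Panel (A): a clean peak is accepted even when the Phred score is low.
        return True
    # Intermediate secondary peak: fall back on the Phred score (an assumption).
    return phred >= MIN_PHRED


if __name__ == "__main__":
    print(is_accepted(8, 0.0))   # low-quality base, no secondary peak
    print(is_accepted(20, 0.5))  # Q20 base with a high secondary peak
```

Under these assumed thresholds, the first call mirrors panel (A) and the second mirrors panel (B).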
Figure 3.
A PolyA Bubble That Occurs in Multiple Samples
The bubble was recognized as a sequencing artifact by SNPdetector, and no SNP was called even though the alternative adenine residue (in the highlighted column) appeared in two samples with an average Phred quality score of 20. In addition, all three traces in this region have a polyG spill at the right, with a secondary guanine peak spanning four residues, and a polyT spill at the left, with a secondary thymine peak spanning three residues.
Figure 4.
Sequence Traces of a SNP Cluster with Three Consecutive SNPs
The top trace is from a homozygous sample and the bottom trace from a heterozygous one. The Phred quality score is labeled on top of each base. In the heterozygous sample, the three HQDPs around the three heterozygotes are labeled with red lines at the bottom. The flanking bases used for calculating the genotype quality class of the highlighted heterozygote in the middle are marked by rectangular boxes, which do not include any HQDPs. The flanking bases used to assess background noise in the flanking region are labeled with brackets at the bottom.
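The selection of flanking bases that skip HQDPs can be sketched as follows. This is an illustrative reading of the legend, not the program's actual code: the function name `flanking_positions`, the representation of HQDPs as a set of base indices, and the number of bases taken on each side are all assumptions.

```python
def flanking_positions(center, hqdp_positions, trace_length, n_each_side=3):
    """Return indices of flanking bases around `center`, skipping HQDPs.

    center         -- index of the heterozygous base being assessed
    hqdp_positions -- set of indices covered by HQDPs (assumed representation)
    trace_length   -- number of bases in the read
    n_each_side    -- hypothetical number of flanking bases per side
    """
    left = []
    pos = center - 1
    # Walk leftward, collecting bases that are not part of any HQDP.
    while pos >= 0 and len(left) < n_each_side:
        if pos not in hqdp_positions:
            left.append(pos)
        pos -= 1

    right = []
    pos = center + 1
    # Walk rightward in the same way.
    while pos < trace_length and len(right) < n_each_side:
        if pos not in hqdp_positions:
            right.append(pos)
        pos += 1

    return sorted(left) + right


if __name__ == "__main__":
    # Three consecutive heterozygous positions, as in a SNP cluster.
    print(flanking_positions(10, {9, 10, 11}, 20, n_each_side=2))
```

In a cluster of adjacent heterozygotes, the walk steps over the neighboring HQDPs, so the flanking set is drawn from the nearest bases outside the cluster.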
Table 6.
Parameters Used in Horizontal Scan