Figure 1.
Schematic drawing of imputation Scenario A.
In this drawing, haplotypes are represented as horizontal boxes containing 0's and 1's (for alternate SNP alleles), and unphased genotypes are represented as rows of 0's, 1's, 2's, and ?'s (where ‘1’ is the heterozygous state and ‘?’ denotes a missing genotype). The SNPs (columns) in the dataset can be partitioned into two disjoint sets: a set T (blue) that is genotyped in all individuals and a set U (green) that is genotyped only in the haploid reference panel. The goal of imputation in this scenario is to estimate the genotypes of SNPs in set U in the study sample.
Figure 2.
Schematic drawing of imputation Scenario B.
In this drawing, haplotypes are represented as horizontal boxes containing 0's and 1's (for alternate SNP alleles), and unphased genotypes are represented as rows of 0's, 1's, 2's, and ?'s (where ‘1’ is the heterozygous state and ‘?’ denotes a missing genotype). The SNPs (columns) in the dataset can be partitioned into three disjoint sets: a set T (blue) that is genotyped in all individuals, a set U2 (yellow) that is genotyped in both the haploid and diploid reference panels but not the study sample, and a set U1 (green) that is genotyped only in the haploid reference panel. The goal of imputation in this scenario is to estimate the genotypes of SNPs in set U2 in the study sample and SNPs in the set U1 in both the study sample and, if desired, the diploid reference panel.
Figure 3.
Percentage discordance versus percentage missing genotypes for Scenario A dataset.
(A) Full range of results, corresponding to calling thresholds from 0.33 to 0.99. (B) Magnified results for calling thresholds near 0.99. (C) Magnified results for calling thresholds near 0.33.
Table 1.
Running times and memory requirements for various algorithms in Scenario A.
Figure 4.
Percentage discordance versus percentage missing genotypes for restricted Scenario B dataset.
(A) Results for masked Illumina genotypes imputed from Affymetrix genotypes in the study sample. (B) Results for masked Affymetrix genotypes imputed from Illumina genotypes in the study sample. (C) Results for masked Illumina genotypes (SNPs with MAF<5% only) imputed from Affymetrix genotypes in the study sample. (D) Results for masked Affymetrix genotypes (SNPs with MAF<5% only) imputed from Illumina genotypes in the study sample.
Figure 5.
Percentage discordance versus percentage missing genotypes for full Scenario B dataset.
(A) Results for masked Illumina genotypes imputed from Affymetrix genotypes in the study sample. (B) Results for masked Affymetrix genotypes imputed from Illumina genotypes in the study sample. Solid lines were obtained from the restricted Scenario B dataset (Figure 4) and are shown for reference; dashed lines were obtained from the full Scenario B dataset.
Table 2.
False negative (FN) and false positive (FP) minor allele call rates at rare SNPs (MAF<5%) in Scenario B.
Table 3.
Running times and memory requirements for various algorithms in Scenario B.