Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation
Figure 1
A statistical framework for joint genotype calls.
We developed a statistical framework to jointly estimate sample genotypes from array intensities, sequence reads, and haplotype phasing. The framework first estimates genotype likelihoods independently for each SNP from array and sequence data, given initial parameters for genotype cluster locations and sequence read error rates. It then multiplies the likelihoods for each SNP to obtain joint likelihoods, inputs these to haplotype phasing and imputation, and then uses the output likelihoods to re-cluster intensities for all SNPs. The process is iterated, and upon termination genotype likelihoods are converted to posterior genotype probabilities. The framework can estimate genotypes given only sequence data or array data as well as with or without imputation — many of these special cases are similar in principle to previously described genotyping algorithms (Text S1).