Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation

doi:10.1371/journal.pcbi.1002604

Figure 2

Sensitivity and specificity of data collection strategies.

For different combinations of array and sequence data, we produced joint genotype calls on chromosome 20 for 382 European samples from the 1000G project. For a single test sample, we obtained “gold-standard” genotypes from high coverage multi-technology sequencing published by the 1000G project. We then measured non-reference site sensitivity and specificity with imputation (, ) and without (, ). (a) (left) and (right) of calls from five array densities and four sequence coverages. The first row of each table contains results for strategies with only sequence data, and the first column contains results for strategies with only array data. A common color scheme is used across all tables, with white corresponding to 100%, red corresponding to , and yellow corresponding to 80%. (b) of calls; is given in Figure S9. (c) for three variant frequency ranges, with frequency estimated from the non-test samples. Private variants have frequency 0% in the non-test samples. (d) for four sequence coverages, with separate lines that correspond to joint calls made with each SNP array. (e) for four array densities, with separate lines that correspond to joint calls made with each sequence coverage. No Array: from sequence data alone; 0×: from array data alone; .5×-4×: mean number of sequence reads per genomic position; array abbreviations are defined in Materials and Methods.
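As an illustration of the measurement described in this caption, the following minimal sketch compares one sample's calls against gold-standard genotypes and reports non-reference site sensitivity and specificity. It assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), with a site counted as positive when it carries at least one non-reference allele); the definitions used in the paper may differ. The function name, the dictionary-based input format, and the treatment of uncalled sites as homozygous reference are all hypothetical choices made for this example.

```python
from typing import Dict, Tuple


def nonref_sensitivity_specificity(
    gold: Dict[int, int], calls: Dict[int, int]
) -> Tuple[float, float]:
    """Compare called genotypes to gold-standard genotypes for one sample.

    Both inputs map a site position to its non-reference allele count (0, 1, or 2).
    Only sites present in `gold` are evaluated.
    """
    tp = fn = tn = fp = 0
    for pos, gold_count in gold.items():
        # Assumption: a site with no call is treated as homozygous reference.
        call_count = calls.get(pos, 0)
        if gold_count > 0:
            if call_count > 0:
                tp += 1  # non-reference site correctly called non-reference
            else:
                fn += 1  # non-reference site missed
        else:
            if call_count > 0:
                fp += 1  # reference site wrongly called non-reference
            else:
                tn += 1  # reference site correctly called reference
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data, invented purely for illustration (not taken from the study).
gold = {100: 1, 200: 2, 300: 0, 400: 0, 500: 1}
calls = {100: 1, 200: 0, 300: 0, 400: 1}
sens, spec = nonref_sensitivity_specificity(gold, calls)
```

Running the same comparison once on imputed calls and once on unimputed calls would give the with-imputation and without-imputation pairs that the caption refers to, under the assumptions stated above.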

doi: https://doi.org/10.1371/journal.pcbi.1002604.g002