Linkage Disequilibrium-Based Quality Control for Large-Scale Genetic Studies
Each plot contains one box for each number of observed MIs or discrepancies (horizontal axis). The bottom and top of each box mark the first and third quartiles of the estimated number of errors (vertical axis), and the median is displayed as a horizontal line inside the box. The red dotted line indicates equality between the number of estimated errors and the number of observed MIs or discrepancies. First row (A-B): The total number of expected errors at each SNP, based on LD, was calculated for the HapMap data and plotted against the number of MIs. Second row (C-E): The total number of expected errors at each SNP, based on LD, was calculated for the Affymetrix data and plotted against the number of discrepancies between the Affymetrix and HapMap genotype calls. In general, the median and the upper quartile of the number of estimated errors increase with the number of discrepancies or MIs. The fact that the lower quartile is at 0 in (C-E), even for SNPs with many discrepancies, could partially reflect the existence of SNPs that have many discrepancies but few errors in the Affymetrix data (the discrepancies being due to errors in the HapMap data).
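To make the box summaries described in this caption concrete, the following minimal Python sketch groups SNPs by their observed count (MIs or discrepancies) and reports the first quartile, median, and third quartile of the estimated error counts in each group. The function name `box_summaries`, the input layout, and the choice of the "inclusive" interpolation rule for quartiles are assumptions made for illustration; they are not taken from the study, and plotting software may use a slightly different quartile definition.

```python
from collections import defaultdict
from statistics import quantiles


def box_summaries(snps):
    """Summarize estimated errors for each observed count.

    snps: iterable of (observed_count, estimated_errors) pairs, one per SNP
    (a hypothetical layout chosen for this sketch).
    Returns {observed_count: (q1, median, q3)}, ordered by observed_count.
    """
    groups = defaultdict(list)
    for observed, estimated in snps:
        groups[observed].append(estimated)

    summaries = {}
    for observed, values in sorted(groups.items()):
        if len(values) == 1:
            # quantiles() needs at least two points; a lone SNP is its own summary.
            q1 = median = q3 = values[0]
        else:
            # Assumption: quartiles by linear interpolation ("inclusive" method).
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summaries[observed] = (q1, median, q3)
    return summaries
```

In this sketch, a group whose first returned value is 0 corresponds to a box whose lower edge sits on the horizontal axis, as described for panels (C-E).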