Reader Comments

Sporadic missing data vs ungenotyped markers

Posted by brian_browning1 on 14 Nov 2008 at 02:43 GMT

There is a big difference between 1) imputing sporadic missing genotypes in a sample and 2) imputing ungenotyped markers in a sample using a reference panel. The missing data patterns are completely different for these two cases. Beagle 2.1.3 infers haplotype phase and imputes sporadic missing data, but Beagle 2.1.3 is not designed for imputing data for ungenotyped markers. For example, Beagle 2.1.3 assumes data is unphased when imputing missing data, and version 2.1.3 cannot use known phase in a reference panel. However, a new version of Beagle (version 3.0) was released after this article was published, and the new version is designed to impute ungenotyped markers using a reference panel.

RE: Sporadic missing data vs ungenotyped markers

peiy replied to brian_browning1 on 20 Nov 2008 at 03:47 GMT

Under the assumption that test and reference samples are from the same population, the most notable difference between the two contexts of missing data seems to be the fraction of missing genotypes, which is high for ungenotyped markers and relatively low for sporadic missing data due to genotyping failure. Our study was designed to give a comprehensive comparison among methods under a variety of scenarios. At the time we performed the analyses, the latest version of Beagle was v 2.1.3, which cannot use known phase in a reference panel. To balance the availability and the effectiveness of the methods, we combined the reference and test samples so that Beagle v 2.1.3 could use reference sample information to impute ungenotyped markers in the test samples. We recognize that this combination is unfair to Beagle v 2.1.3 in terms of accuracy. Beagle v 3.0, with its more specific modelling, will need further study to assess its performance relative to other methods.
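
For readers following this thread, the two missing-data patterns under discussion can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the toy panel sizes, the 0/1/2 genotype coding, the use of None for a missing genotype, and all function names are hypothetical. It does not reproduce Beagle's input format or any imputation algorithm. It only shows that sporadic missingness is scattered across individual genotypes at a low rate, while ungenotyped markers are missing as whole columns in the test samples, and what the per-marker missing fraction looks like once reference and test samples are pooled.

```python
import random

MISSING = None  # hypothetical encoding for a missing genotype


def make_genotypes(n_samples, n_markers, rng):
    """Toy genotypes coded as 0/1/2 allele counts (random, for illustration)."""
    return [[rng.choice([0, 1, 2]) for _ in range(n_markers)]
            for _ in range(n_samples)]


def mask_sporadic(genotypes, rate, rng):
    """Case 1: each genotype is dropped independently with probability `rate`."""
    return [[MISSING if rng.random() < rate else g for g in row]
            for row in genotypes]


def mask_ungenotyped(genotypes, untyped_markers):
    """Case 2: the listed markers are missing for every sample in the panel."""
    return [[MISSING if j in untyped_markers else g for j, g in enumerate(row)]
            for row in genotypes]


def missing_fraction_per_marker(genotypes):
    """Fraction of samples with a missing genotype, computed marker by marker."""
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(row[j] is MISSING for row in genotypes) / n_samples
            for j in range(n_markers)]


rng = random.Random(0)
reference = make_genotypes(4, 6, rng)   # fully genotyped reference samples
test = make_genotypes(4, 6, rng)        # test samples before masking
untyped = {1, 3, 4}                     # markers not on the test samples' chip

sporadic = mask_sporadic(test, 0.05, rng)        # scattered, low-rate missingness
test_untyped = mask_ungenotyped(test, untyped)   # whole markers missing

# Pooled design: reference and test samples stacked into one data set.
combined = reference + test_untyped
per_marker = missing_fraction_per_marker(combined)
print(per_marker)
```

In the pooled data set, each ungenotyped marker is missing in exactly the test-sample rows and complete in the reference rows, so its missing fraction equals the share of test samples in the pool; markers typed in both panels have a missing fraction of zero.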