Reader Comments
Sporadic missing data vs ungenotyped markers
Posted by brian_browning1 on 14 Nov 2008 at 02:43 GMT
There is a big difference between 1) imputing sporadic missing genotypes in a sample and 2) imputing ungenotyped markers in a sample using a reference panel. The missing data patterns are completely different for these two cases. Beagle 2.1.3 infers haplotype phase and imputes sporadic missing data, but Beagle 2.1.3 is not designed for imputing data for ungenotyped markers. For example, Beagle 2.1.3 assumes data is unphased when imputing missing data, and version 2.1.3 cannot use known phase in a reference panel. However, a new version of Beagle (version 3.0) was released after this article was published, and the new version is designed to impute ungenotyped markers using a reference panel.
RE: Sporadic missing data vs ungenotyped markers
peiy replied to brian_browning1 on 20 Nov 2008 at 03:47 GMT
Under the assumption that the test and reference samples come from the same population, the most notable difference between the two missing-data contexts seems to be the fraction of missing genotypes, which is high for ungenotyped markers and relatively low for sporadic missing data due to genotyping failure. Our study was designed to give a comprehensive comparison among methods under a variety of scenarios. At the time we performed the analyses, the latest version of Beagle was v 2.1.3, which cannot use known phase in a reference panel. To balance the availability and the effectiveness of the methods, we combined the reference and test samples so that Beagle v 2.1.3 could use the reference sample information to impute ungenotyped markers in the test samples. We recognize that this combination is unfair to Beagle v 2.1.3 in terms of accuracy. For Beagle v 3.0, with its more specific modelling, further work is needed to study its performance relative to the other methods.
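For readers following the thread, the two missing-data patterns discussed above can be pictured with a toy sketch. It does not reflect Beagle's input format or algorithm; the function names, the boolean-mask representation (True means the genotype is missing), and the sample and marker counts are all assumptions made for illustration.

```python
import random


def sporadic_mask(n_samples, n_markers, rate, seed=0):
    """Hypothetical helper: missing cells scattered at random
    over every sample and every marker (e.g. genotyping failures)."""
    rng = random.Random(seed)
    return [[rng.random() < rate for _ in range(n_markers)]
            for _ in range(n_samples)]


def ungenotyped_mask(n_ref, n_test, ungenotyped_markers, n_markers):
    """Hypothetical helper: reference rows are fully observed, while
    test rows lack entire marker columns. Stacking the two groups
    mirrors the 'combine reference and test samples' design."""
    cols = set(ungenotyped_markers)
    ref_rows = [[False] * n_markers for _ in range(n_ref)]
    test_rows = [[j in cols for j in range(n_markers)]
                 for _ in range(n_test)]
    return ref_rows + test_rows


def missing_fraction_per_marker(mask):
    """Fraction of samples with a missing genotype at each marker."""
    n = len(mask)
    return [sum(row[j] for row in mask) / n for j in range(len(mask[0]))]


# 2 reference samples, 3 test samples, markers 1 and 3 ungenotyped in the test set
combined = ungenotyped_mask(n_ref=2, n_test=3,
                            ungenotyped_markers=[1, 3], n_markers=4)
print(missing_fraction_per_marker(combined))  # [0.0, 0.6, 0.0, 0.6]
```

In the combined matrix the missingness is concentrated in whole columns and is absent from the reference rows, whereas a sporadic mask spreads a small fraction of missing cells over all rows and columns.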