Analyses and Comparison of Accuracy of Different Genotype Imputation Methods

Yu-Fang Pei; Jian Li; Lei Zhang; Christopher J. Papasian; Hong-Wen Deng

doi:10.1371/journal.pone.0003551


Open Access

Peer-reviewed

Research Article


Yu-Fang Pei,

Affiliations Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, People's Republic of China, School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Jian Li,

Affiliation School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Lei Zhang,

Affiliations Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, People's Republic of China, School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Christopher J. Papasian,

Affiliation School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Hong-Wen Deng

* E-mail: hwdeng@mail.xjtu.edu.cn

Affiliations Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, People's Republic of China, School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America, Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, Hunan, People's Republic of China


Published: October 29, 2008
https://doi.org/10.1371/journal.pone.0003551

Reader Comments


Sporadic missing data vs ungenotyped markers

Posted by brian_browning1 on 14 Nov 2008 at 02:43 GMT

There is a big difference between 1) imputing sporadic missing genotypes in a sample and 2) imputing ungenotyped markers in a sample using a reference panel. The missing data patterns are completely different for these two cases. Beagle 2.1.3 infers haplotype phase and imputes sporadic missing data, but Beagle 2.1.3 is not designed for imputing data for ungenotyped markers. For example, Beagle 2.1.3 assumes data is unphased when imputing missing data, and version 2.1.3 cannot use known phase in a reference panel. However, a new version of Beagle (version 3.0) was released after this article was published, and the new version is designed to impute ungenotyped markers using a reference panel.
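The two missing-data patterns described above can be made concrete with a small sketch. This is illustrative only and is not part of Beagle or of the article's analysis: the function names, the `MISSING` placeholder, and the toy genotype matrix (samples in rows, markers in columns, coded as minor-allele counts) are all assumptions made for the example.

```python
import random

MISSING = None  # hypothetical placeholder for a missing genotype call


def mask_sporadic(genotypes, rate, seed=0):
    """Hide each genotype independently with probability `rate`."""
    rng = random.Random(seed)
    return [[MISSING if rng.random() < rate else g for g in row]
            for row in genotypes]


def mask_ungenotyped(genotypes, untyped_markers):
    """Hide every genotype at the given marker (column) indices."""
    untyped = set(untyped_markers)
    return [[MISSING if j in untyped else g for j, g in enumerate(row)]
            for row in genotypes]


def missing_per_marker(genotypes):
    """Fraction of samples with a missing call at each marker."""
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(row[j] is MISSING for row in genotypes) / n_samples
            for j in range(n_markers)]


# Toy data: 4 samples x 6 markers, genotypes coded 0/1/2.
toy = [
    [0, 1, 2, 0, 1, 1],
    [1, 1, 0, 0, 2, 1],
    [2, 0, 1, 1, 0, 0],
    [0, 2, 1, 0, 1, 2],
]

sporadic = mask_sporadic(toy, rate=0.1)                       # scattered gaps
ungenotyped = mask_ungenotyped(toy, untyped_markers=[1, 4])   # whole columns gone
```

With sporadic masking, every marker keeps most of its observed calls. With ungenotyped markers, the affected columns carry no observed data at all in the sample, so any information about them has to come from a reference panel.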

RE: Sporadic missing data vs ungenotyped markers

peiy replied to brian_browning1 on 20 Nov 2008 at 03:47 GMT

Under the assumption that the test and reference samples are from the same population, the most notable difference between the two contexts of missing data seems to be the fraction of missing genotypes, which is high for ungenotyped markers and relatively low for sporadic missing data due to genotyping failure. Our study was designed to give a comprehensive comparison among methods under a variety of scenarios. At the time we performed the analyses, the latest version of Beagle was v 2.1.3, which cannot use known phase in a reference panel. To strike a balance between the availability and the effectiveness of the methods, we combined the reference and test samples so that Beagle v 2.1.3 could use reference sample information to impute ungenotyped markers in the test samples. We keep in mind that such a combination is unfair to Beagle v 2.1.3 in terms of accuracy rate. For Beagle v 3.0, with its more specific modelling, further work is needed to study its performance relative to the other methods.
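As a rough illustration of the pooled design described in this reply, the sketch below stacks a fully typed reference sample on top of a test sample whose ungenotyped markers have been masked. It does not reproduce Beagle's input format or the authors' actual pipeline: `pool_panels`, `missing_fraction`, the `MISSING` placeholder, and the toy matrices are hypothetical names and data chosen for the example.

```python
MISSING = None  # hypothetical placeholder for a missing genotype call


def pool_panels(reference, test, untyped_markers):
    """Stack reference rows on top of test rows masked at untyped markers."""
    untyped = set(untyped_markers)
    masked_test = [[MISSING if j in untyped else g for j, g in enumerate(row)]
                   for row in test]
    return [list(row) for row in reference] + masked_test


def missing_fraction(genotypes):
    """Overall fraction of missing calls in a genotype matrix."""
    cells = [g for row in genotypes for g in row]
    return sum(g is MISSING for g in cells) / len(cells)


# Toy data: 2 reference and 2 test samples x 4 markers, genotypes coded 0/1/2.
reference = [[0, 1, 2, 0], [1, 1, 0, 2]]
test = [[2, 0, 1, 1], [0, 2, 1, 0]]

pooled = pool_panels(reference, test, untyped_markers=[1, 3])
test_only_missing = missing_fraction(pooled[len(reference):])  # test rows only
pooled_missing = missing_fraction(pooled)                      # all rows
```

In this toy case half of the test-sample calls are missing, while the pooled matrix is one quarter missing. The reference rows enter as ordinary unphased genotypes, which is consistent with the point in the thread that this arrangement cannot exploit known phase in the reference panel.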

Subject Areas: Haplotypes; Single nucleotide polymorphisms; Hidden Markov models; Markov models; Variant genotypes; Simulation and modeling; Algorithms; Genotyping
