Breakdown of Methods for Phasing and Imputation in the Presence of Double Genotype Sharing

Figure 1

Bad MCMC mixing for cases of double genotype sharing.

MaCH and similar approaches implement a Markov-chain Monte Carlo scheme in which, in each iteration, the individual genotype resolutions are updated one by one by mapping the genotypes. If two individuals have identical marker genotypes over a long stretch of markers, the Hidden Markov Model will assign the other individual a probability approaching 1. When no reference haplotypes are provided, all haplotype data are initialized randomly. In this series of panels, individuals A and B are initialized differently (a). In panel (b), A is updated. With high probability, the existing (random) haplotype resolution from B is copied. When B is updated (c), A is sampled with high probability, replicating the original random data for B. In iteration 2, A is updated again (d), but again B is sampled with high probability. Since any haplotype resolution for A will match the genotypes for B, there is no pressure to identify a better resolution. The two individuals form a local feedback loop with no true mixing in the Markov chain. Our modified algorithm lowers the probability of sampling from a mirror individual (as in the pair A and B), thus allowing haplotypes from other individuals in the dataset to influence the final resolution. Similar cases can also arise in groups of more than two individuals. Our remedy handles those successfully as well.
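
The feedback loop described in this caption can be illustrated with a toy simulation. The sketch below is not MaCH and does not implement a Hidden Markov Model. It assumes a simplified sampler in which an individual, when updated, copies its entire phase vector either from its mirror individual (with probability `p_mirror`) or from a pool of other individuals. The names `run_sweeps`, `p_mirror`, and `others`, and all phase values, are hypothetical and chosen only for illustration.

```python
import random

def run_sweeps(phases, others, p_mirror, n_sweeps, seed=0):
    """Toy stand-in for the per-individual update (assumption: not MaCH's HMM).

    phases   : dict with keys "A" and "B", each a tuple of 0/1 phase labels
    others   : list of phase tuples carried by the rest of the dataset
    p_mirror : probability of copying from the mirror individual
    """
    rng = random.Random(seed)
    phases = dict(phases)  # work on a copy, leave the caller's dict untouched
    for _ in range(n_sweeps):
        for name in ("A", "B"):  # individuals are updated one by one
            mirror = "B" if name == "A" else "A"
            if rng.random() < p_mirror:
                # Copy the mirror's current (possibly random) resolution.
                phases[name] = phases[mirror]
            else:
                # Let another individual in the dataset inform the resolution.
                phases[name] = rng.choice(others)
    return phases

# Hypothetical data: A and B start with different random resolutions,
# while the rest of the dataset carries a different, shared phase.
initial = {"A": (0, 1, 1, 0), "B": (1, 0, 0, 1)}
others = [(1, 1, 0, 0)]

# Mirror probability of 1: A copies B, then B copies A, in every sweep.
stuck = run_sweeps(initial, others, p_mirror=1.0, n_sweeps=50)

# Mirror probability of 0: only the other individuals are sampled.
mixed = run_sweeps(initial, others, p_mirror=0.0, n_sweeps=50)
```

With `p_mirror=1.0`, both individuals end up holding B's initial random resolution no matter how many sweeps are run, which mirrors panels (b) to (d). Lowering `p_mirror` is the toy analogue of the remedy described in the caption: the pair becomes able to pick up information from the other individuals.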

doi: https://doi.org/10.1371/journal.pone.0060354.g001