PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

doi:10.1371/journal.pcbi.1004139

Table 1.

Variant summary.

Fig 1.

The imputation pipeline.

Given a pedigree tree of 3,671 Hutterites (1), 1,415 individuals in the three most recent generations (within the red box) were genotyped with framework markers (2). The first part of the pipeline (steps 2–6) depends only on the framework marker data; the second part (steps 7–9) imputes the whole genome sequence variants. First, identity coefficients and the transition rate parameter λ [24] are estimated for each pair of the 1,415 individuals (3). The framework genotypes are then phased (4), and IBD segments between haplotypes are identified using an HMM (5) and indexed into an efficient data structure consisting of IBD cliques (6). Haplotypes are assigned parental origins that are consistent across the pedigree using the cliques (7). Then, the whole genome sequences of 98 Hutterites (8) are cleaned using several filters, including a novel generalized Mendelian error check (9), and imputed to the remaining 1,317 Hutterites using the IBD cliques (10). Call rates are boosted by imputing as many of the remaining genotypes as possible using an LD-based imputation method, IMPUTE2 (11). To ensure that accuracy is not compromised, we calculate the concordance between the two methods over the genotypes that both of them call, and keep only variants that are highly concordant (12).
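The concordance filter in the final step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the genotype encoding (alternate-allele counts 0/1/2, with `None` for a missing call), the function names, the 0.99 cutoff, and the choice to drop variants with no shared calls are all assumptions made for illustration, since the caption says only that "highly concordant" variants are kept.

```python
def concordance(pedigree_calls, ld_calls):
    """Fraction of matching genotypes among samples called by both methods.

    Each argument is a list of genotypes for one variant, one entry per
    sample: 0, 1 or 2 alternate alleles, or None if the method made no call.
    Returns None when the two methods share no called genotypes.
    """
    shared = [(a, b) for a, b in zip(pedigree_calls, ld_calls)
              if a is not None and b is not None]
    if not shared:
        return None
    return sum(a == b for a, b in shared) / len(shared)


def keep_variant(pedigree_calls, ld_calls, min_concordance=0.99):
    """Keep a variant only if the two methods agree often enough.

    min_concordance is a hypothetical cutoff, not a value from the paper.
    Variants with no shared calls are dropped (an assumed policy).
    """
    c = concordance(pedigree_calls, ld_calls)
    return c is not None and c >= min_concordance


def merge_calls(pedigree_calls, ld_calls):
    """Prefer the pedigree-based call; fall back to the LD-based one."""
    return [a if a is not None else b
            for a, b in zip(pedigree_calls, ld_calls)]
```

For a variant that passes the filter, `merge_calls` shows how the LD-based calls would raise the call rate: they only fill positions that the pedigree-based step left uncalled.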

Fig 2.

Partitioning an IBD-sharing graph into cliques.

(1) IBD segments are indexed into a graph at each SNV. Nodes represent haplotypes (denoted A–H). Each pair of haplotypes that shares an IBD segment at the SNV is connected by a link whose weight equals the HMM posterior probability. (2) Link weights are replaced by affinities. (3) Links with a small original weight or affinity are removed. (4) All nodes within each of the resulting connected components are connected to one another, so that each component becomes a clique.
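The pruning and completion steps of this partitioning can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the published algorithm: the affinity transformation of step 2 is not defined in the caption and is omitted, so links are pruned on their original weight only, and the names `ibd_cliques`, `clique_links`, and the `min_weight` cutoff of 0.5 are hypothetical.

```python
from itertools import combinations


def ibd_cliques(haplotypes, links, min_weight=0.5):
    """Partition haplotypes at one SNV into IBD cliques.

    haplotypes: iterable of node labels, e.g. "ABCDEFGH".
    links: dict mapping a pair (a, b) to its weight (HMM posterior).
    min_weight: hypothetical pruning cutoff, not a value from the paper.
    Returns a list of sorted components; each is treated as a clique.
    """
    haplotypes = list(haplotypes)
    # Keep only sufficiently strong links (step 3 in the caption).
    adjacency = {h: set() for h in haplotypes}
    for (a, b), weight in links.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Connected components by depth-first search.
    seen = set()
    components = []
    for start in haplotypes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def clique_links(component):
    """All pairwise links of a completed clique (step 4 in the caption)."""
    return list(combinations(component, 2))
```

Storing each clique as a member list is enough to recover every pairwise IBD relationship at the SNV, which is why the pairs from `clique_links` never need to be materialised in practice.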

Fig 3.

Parental origin assignment process.

For a given quasi-founder, we denote his/her haplotypes by A and B; by convention, the first is paternal and the second is maternal. At each SNV, we calculate a 2×2 matrix of kinships (Step 1) between each of the proband’s parents and each subject in the A and B IBD cliques. Using these, we generate a parental haplotype separation measure m (Step 2). If m≈1, A and B are already correctly ordered; if m≈-1, they should be swapped. If the majority of the SNVs agree on the same swapping (indicated by a sample separation M sufficiently close to 1 in Step 3), we assign paternal origin and reorder A and B accordingly (Step 4).
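The voting logic of Steps 2 to 4 can be made concrete with a short Python sketch. The caption does not give the formulas for m or M, so the definitions below are assumptions chosen only to reproduce the stated behaviour (m near 1 keeps the order, m near -1 swaps it): m is taken as a normalised contrast between the "straight" and "crossed" entries of the kinship matrix, M as the mean sign of m over informative SNVs, and `min_agreement=0.8` is a hypothetical cutoff.

```python
def separation(K):
    """Hypothetical separation measure m for one SNV.

    K[p][h] is the kinship between parent p (0 = father, 1 = mother) and
    the IBD clique of haplotype h (0 = A, 1 = B). The formula is an
    assumed illustration, not the definition used in the paper.
    """
    straight = K[0][0] + K[1][1]   # A with father, B with mother
    crossed = K[0][1] + K[1][0]    # A with mother, B with father
    total = straight + crossed
    return 0.0 if total == 0 else (straight - crossed) / total


def assign_origin(matrices, min_agreement=0.8):
    """Return (paternal, maternal) haplotype labels, or None if unresolved.

    matrices: one 2x2 kinship matrix per SNV.
    min_agreement: hypothetical cutoff on |M|, not a value from the paper.
    """
    votes = [separation(K) for K in matrices]
    informative = [m for m in votes if m != 0]
    if not informative:
        return None
    # Assumed sample separation M: mean sign of m over informative SNVs.
    M = sum(1 if m > 0 else -1 for m in informative) / len(informative)
    if M >= min_agreement:
        return ("A", "B")          # current order is already correct
    if M <= -min_agreement:
        return ("B", "A")          # swap the two haplotypes
    return None                    # SNVs disagree; leave unassigned
```

Returning `None` when the SNVs disagree reflects the caption's requirement that an assignment is made only when a clear majority supports the same ordering.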

Table 2.

Imputation performance.
