Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence

doi:10.1371/journal.pgen.1002280

Figure 1.

Pedigree and genetic risk prediction workflow.

A, Family pedigree with known medical history. The displayed ages represent the age of death for deceased subjects or the age at the time of medical history collection (9/2010) for living family members. Arrows denote sequenced family members. Abbreviations: AD, Alzheimer's disease; CABG, coronary artery bypass graft surgery; CHF, congestive heart failure; CVA, cerebrovascular accident; DM, diabetes mellitus; DVT, deep venous thrombosis; GERD, gastroesophageal reflux disease; HTN, hypertension; IDDM, insulin-dependent diabetes mellitus; MI, myocardial infarction; SAB, spontaneous abortion; SCD, sudden cardiac death. B, Workflow for phased genetic risk evaluation using whole genome sequencing.

Figure 2.

Development of major allele reference sequences.

Allele frequencies from the low-coverage whole-genome sequencing pilot of the 1000 Genomes Project were used to estimate the major allele for each of the three main HapMap populations. At every position where the NCBI reference sequence 37.1 base differed from the major allele, the major allele was substituted, resulting in approximately 1.6 million single nucleotide substitutions in the reference sequence. A, Approximately half of these positions were shared among all three HapMap population groups, with the YRI population containing the greatest number of major alleles differing from the NCBI reference sequence. B, Number of disease-associated variants for which the NCBI reference genome carries the minor allele in each of the three HapMap populations. C, Number of positions per Mbp at which the major allele differed from the reference base, by chromosome and HapMap population.
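
The substitution step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical input that maps each 0-based position to a dictionary of base frequencies for one population, handles single nucleotide substitutions only, and breaks frequency ties by dictionary order. The helper `shared_positions` mirrors the cross-population comparison in panel A. Both function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: position (0-based) -> {base: allele frequency}
AlleleFreqs = Dict[int, Dict[str, float]]


def major_allele_reference(reference: str, freqs: AlleleFreqs) -> Tuple[str, List[int]]:
    """Substitute the major allele wherever it differs from the reference base.

    Returns the modified sequence and the sorted list of substituted positions.
    Only single-base substitutions are modeled; ties go to the first base listed.
    """
    seq = list(reference)
    substituted = []
    for pos in sorted(freqs):
        allele_freqs = freqs[pos]
        major = max(allele_freqs, key=allele_freqs.get)  # most frequent base
        if seq[pos] != major:
            seq[pos] = major
            substituted.append(pos)
    return "".join(seq), substituted


def shared_positions(substitutions_by_population: Dict[str, List[int]]) -> Set[int]:
    """Positions substituted in every population (cf. the overlap in panel A)."""
    position_sets = [set(p) for p in substitutions_by_population.values()]
    return set.intersection(*position_sets) if position_sets else set()


new_ref, subs = major_allele_reference(
    "ACGTACGT", {1: {"C": 0.3, "T": 0.7}, 4: {"A": 0.9, "G": 0.1}}
)
print(new_ref, subs)  # ATGTACGT [1]
```

Running one pass per population (for example CEU, YRI, and CHB+JPT) and intersecting the returned position lists gives the set of substitutions common to all three major allele references.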

Figure 3.

Inheritance state analysis, error estimation, and phasing.

A, A Hidden Markov Model (HMM) was used to infer one of four Mendelian and two non-Mendelian inheritance states for each allele assortment at variant positions across the quartet. "MIE-rich" refers to Mendelian-inheritance error (MIE) rich regions. "Compression" refers to genotype errors arising from heterozygous structural variation in the reference or study subjects, which manifest as a high proportion of uniformly heterozygous positions across the quartet. B, Combining quality score calibration against an orthogonal genotyping technology with filtering of SNVs in error-prone regions (MIE-rich and compression regions) identified by the HMM reduced the genotype error rate, as estimated by the MIE rate, by >90%. C, Consistent with PRDM9 allelic status, approximately half of all recombinations in each parent occurred in hotspots. The mother carries two RNF212 haplotypes associated with low recombination rates, while the father carries one haplotype associated with a high and one associated with a low recombination rate. Notation denotes the base at [rs3796619, rs1670533]. D, Variant phasing using pedigree, inheritance state, and population linkage disequilibrium data. Pedigree data were first used to phase informative allele assortments in trios (top). The inheritance state of neighboring regions was used to phase positions at which all members of a mother-father-child trio were heterozygous and the sibling was homozygous for the reference or non-reference allele (middle). For uniformly heterozygous positions, we phased the non-reference allele with a maximum likelihood model that assigned it to the paternal or maternal chromosome based on population linkage disequilibrium with phased SNVs within 250 kbp (bottom). In all panels, a denotes the reference allele and b the non-reference allele.
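
The pedigree-based step in panel D (top) and the MIE-based error estimate in panel B can be illustrated with a simplified single-trio sketch. It does not reproduce the HMM, the inheritance-state phasing, or the linkage disequilibrium model. It assumes genotypes coded as the count of non-reference alleles (0 = aa, 1 = ab, 2 = bb), and the function and class names are hypothetical. An assortment is phased when exactly one combination of parental transmissions explains the child's genotype. When no combination does, the call is a Mendelian inheritance error. Many genotype errors remain Mendelian-consistent, so the MIE rate captures only part of the true error rate.

```python
from itertools import product
from typing import Iterable, Optional, Tuple

# Alleles each parent can transmit, keyed by genotype coded as the count of
# non-reference alleles: 0 = aa, 1 = ab, 2 = bb (a = reference, b = non-reference).
TRANSMISSIBLE = {0: ("a",), 1: ("a", "b"), 2: ("b",)}


class MendelianInheritanceError(ValueError):
    """Raised when no parental transmission can explain the child's genotype."""


def phase_trio(father: int, mother: int, child: int) -> Optional[Tuple[str, str]]:
    """Return the child's (paternal, maternal) alleles if the trio is informative,
    None if phase is ambiguous, or raise on a Mendelian inheritance error."""
    compatible = {
        (pat, mat)
        for pat, mat in product(TRANSMISSIBLE[father], TRANSMISSIBLE[mother])
        if (pat == "b") + (mat == "b") == child  # transmitted b count must match
    }
    if not compatible:
        raise MendelianInheritanceError(f"MIE: father={father}, mother={mother}, child={child}")
    if len(compatible) == 1:
        return compatible.pop()
    return None  # e.g. all three heterozygous: the pedigree alone cannot phase


def mie_rate(trios: Iterable[Tuple[int, int, int]]) -> float:
    """Fraction of (father, mother, child) genotype calls that violate Mendelian inheritance."""
    total = errors = 0
    for father, mother, child in trios:
        total += 1
        try:
            phase_trio(father, mother, child)
        except MendelianInheritanceError:
            errors += 1
    return errors / total if total else 0.0


print(phase_trio(0, 1, 1))  # ('a', 'b'): the b allele must come from the mother
print(phase_trio(1, 1, 1))  # None: uniformly heterozygous, phase unresolved
```

The positions that return `None` are the ones that, according to the caption, were resolved with inheritance state from neighboring regions or with population linkage disequilibrium.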

Table 1.

Putative loss of function variants across the family quartet.

Table 2.

Rare variants with known clinical associations.

Figure 4.

Ancestry and immunogenotyping using phased variant data.

A, Ancestry analysis of maternal and paternal origins based on principal components analysis of SNP genotypes intersected with the Population Reference Sample dataset. B, The HMM identified a recombination spanning the HLA-B locus, which facilitated resolution of haplotype phase at HLA loci. Contig colors in the lower panel correspond to the inheritance states depicted in Figure 3A. C, Common HLA types for the family quartet based on phased sequence data.
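
A principal components analysis of the kind used in panel A can be sketched with NumPy. This is an illustrative sketch only. It assumes the family members and the reference panel samples have already been intersected to a common SNP set and stacked into one samples-by-SNPs matrix of non-reference allele counts. The per-SNP standardization shown here, centering by twice the allele frequency and dividing by the binomial standard deviation, is a common choice and not necessarily the one used in the study. `genotype_pcs` is a hypothetical name.

```python
import numpy as np


def genotype_pcs(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project samples onto the top principal components of a genotype matrix.

    genotypes: samples x SNPs array of non-reference allele counts (0, 1, 2).
    Returns a samples x n_components array of PC scores.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # per-SNP allele frequency estimate
    keep = (p > 0) & (p < 1)                    # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    x = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # center and scale each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # sample scores = U * S


# Two toy groups separated by the first four SNPs; the fifth SNP varies within groups.
toy = np.array([
    [0, 0, 2, 0, 0],
    [0, 0, 2, 0, 1],
    [0, 0, 2, 0, 2],
    [2, 2, 0, 2, 0],
    [2, 2, 0, 2, 1],
    [2, 2, 0, 2, 2],
])
print(genotype_pcs(toy)[:, 0])  # first three samples and last three fall on opposite sides of zero
```

Principal component signs are arbitrary, so only the relative placement of samples, not the sign of a coordinate, carries ancestry information.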

Figure 5.

Common variant risk prediction.

A, Common variant risk prediction for 28 disease states for each family member (f, father; m, mother; s, son; d, daughter) and 174 ethnicity-matched HapMap subjects. The x-axis in each plot represents the log10(likelihood ratio) for each disease according to the allelic distribution of SNPs reported in the literature as significantly associated with disease in 2 or more studies including 2000 or more total subjects. B, Upper left: pre-test (bar base) and post-test (bar end) estimates of disease risk for the father according to common variant risk prediction, derived from the pre-test probability of disease multiplied by the composite likelihood ratio from all SNPs meeting the criteria described above. Upper right: composite likelihood ratio estimates for disease risk according to common genetic variation. Blue bars represent the paternal estimate, pink bars the maternal estimate, red points the estimate for the daughter, and blue points the estimate for the son. Lower panels: parental haplotype contributions to disease risk, shown as points, for the daughter (lower left) and son (lower right). Blue shading represents the paternal haplotype risk allele contribution and pink shading represents the maternal haplotype risk allele contribution.
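
The caption's combination of a pre-test probability with a composite likelihood ratio can be illustrated as follows. This minimal sketch assumes the standard Bayesian convention in which the likelihood ratio is applied on the odds scale: probability is converted to odds, multiplied by the ratio, and converted back. It also assumes the per-SNP likelihood ratios are independent, so the composite ratio is their product. Function names are hypothetical, and the per-SNP ratios are taken as given rather than derived from genotype data.

```python
import math
from typing import Iterable


def composite_likelihood_ratio(likelihood_ratios: Iterable[float]) -> float:
    """Product of per-SNP likelihood ratios (assumes the SNPs act independently)."""
    return math.prod(likelihood_ratios)


def posttest_probability(pretest_probability: float, likelihood_ratios: Iterable[float]) -> float:
    """Update a pre-test probability with a composite likelihood ratio on the odds scale."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * composite_likelihood_ratio(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)


# Pre-test probability 0.5 (odds 1) with per-SNP ratios 2.0 and 1.5 gives post-test odds 3.
print(posttest_probability(0.5, [2.0, 1.5]))                   # 0.75
print(math.log10(composite_likelihood_ratio([10, 0.1, 100])))  # x-axis value in panel A: about 2
```

Working in log10 units, as on the panel A x-axis, turns the product of ratios into a sum, which makes each SNP's contribution to the composite estimate easy to read.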

Table 3.

Drug metabolizing enzyme variants.

Table 4.

Genetic pharmacological response predictions.