TY  - JOUR
T1  - High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs
A1  - Dilthey, Alexander T.
A1  - Gourraud, Pierre-Antoine
A1  - Mentzer, Alexander J.
A1  - Cereb, Nezih
A1  - Iqbal, Zamin
A1  - McVean, Gil
Y1  - 2016/10/28
N2  - Author Summary Determining an individual’s HLA type (the sequence of the exons of the HLA genes) is important in many areas of biomedical research. For example, HLA types shape immune epitope repertoires, which are relevant in cancer immunotherapy, and influence autoimmune and infectious disease risk. Whole-genome sequencing data, currently being generated for hundreds of thousands of individuals, contains the information necessary for HLA typing–but inferring accurate HLA types from these is a challenging problem. First, the HLA genes are the most polymorphic genes in the human genome; second, these genes and their variant alleles exhibit high degrees of sequence similarity (due to a shared evolutionary origin). This makes it difficult to establish which specific HLA gene a given observed sequencing read derives from. We show that this problem can be addressed using a Population Reference Graph (PRG): for each gene, the PRG contains not only the reference sequence but also variant alleles, thus enabling, using a novel sequence-to-graph mapping algorithm, the accurate mapping of reads to HLA genes. We also show that HLA*PRG, the algorithm implementing our approach, achieves–based on standard whole-genome sequencing data–accuracies comparable to those of specialized gold-standard methods. HLA*PRG is open source and freely available.
JF  - PLOS Computational Biology
JA  - PLOS Computational Biology
VL  - 12
IS  - 10
UR  - https://doi.org/10.1371/journal.pcbi.1005151
SP  - e1005151
EP  - 
PB  - Public Library of Science
M3  - doi:10.1371/journal.pcbi.1005151
ER  - 