Assessing HLA imputation accuracy in a West African population

  • Ruth Nanjala,

    Roles Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Current address: Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom

    Affiliations Department of Biochemistry and Biotechnology, Pwani University, Kilifi, Kenya, Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa

  • Mamana Mbiyavanga,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    Affiliation Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa

  • Suhaila Hashim,

    Roles Supervision, Writing – review & editing

    Affiliations Department of Biochemistry and Biotechnology, Pwani University, Kilifi, Kenya, Pwani University Biosciences Research Centre, Pwani University, Kilifi, Kenya

  • Santie de Villiers,

    Roles Supervision, Writing – review & editing

    Affiliations Department of Biochemistry and Biotechnology, Pwani University, Kilifi, Kenya, Pwani University Biosciences Research Centre, Pwani University, Kilifi, Kenya

  • Nicola Mulder

    Roles Conceptualization, Supervision, Writing – review & editing

    nicola.mulder@uct.ac.za

    Affiliation Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa

Abstract

The Human Leukocyte Antigen (HLA) region plays an important role in autoimmune and infectious diseases. HLA is a highly polymorphic region and thus difficult to impute. We therefore sought to evaluate HLA imputation accuracy in a West African population, since West African populations are understudied and are known to harbor high genetic diversity. The study sets were selected from 315 Gambian individuals within the Gambian Genome Variation Project (GGVP) Whole Genome Sequence datasets. Two different arrays, Illumina Omni 2.5 and Human Hereditary and Health in Africa (H3Africa), were assessed for the appropriateness of their markers, and these were used to test several imputation panels and tools. The reference panels were chosen from the 1000 Genomes (1kg-All), 1000 Genomes African (1kg-Afr), 1000 Genomes Gambian (1kg-Gwd), H3Africa, and the HLA Multi-ethnic datasets. HLA-A, HLA-B, and HLA-C alleles were imputed using HIBAG, SNP2HLA, CookHLA, and Minimac4, and concordance rate was used as an assessment metric. The best performing tool was HIBAG, with a concordance rate of 0.84, while the best performing reference panel was the H3Africa panel, with a concordance rate of 0.62. Minimac4 (0.75) imputed HLA-B alleles more accurately than HIBAG (0.71), SNP2HLA (0.51), and CookHLA (0.17). The H3Africa and Illumina Omni 2.5 array performances were comparable, suggesting that the choice of genotyping array has little influence on HLA imputation in West African populations. The findings show that using a larger population-specific reference panel and the HIBAG tool improves the accuracy of HLA imputation in a West African population.

Introduction

The Major Histocompatibility Complex (MHC) region is a large locus in the human genome composed of polymorphic Human Leukocyte Antigen (HLA) genes. The MHC region, found on the short arm of chromosome 6, spans around 5 Mbp and contains over 200 genes, with 128 predicted to be expressed [1]. It is one of the most complex regions in the human genome due to the high density of polymorphism and linkage disequilibrium [2].

The HLA region is classified into three main classes: I, II, and III (Fig 1) [3]. Class I comprises HLA-A, HLA-B, and HLA-C genes that encode the heavy chains of class I molecules. Class II consists of HLA-DR, HLA-DQ, and HLA-DP subregions, each containing A and B genes encoding α and β chains, respectively [4]. Class III encodes several molecules important in inflammation, such as complement components C2, C4, and factor B, Tumor Necrosis Factor-alpha, lymphotoxin, and three heat shock proteins [5].

The HLA region plays an important role in the innate and adaptive immune system [6], the complement cascade system [5], cord blood, and bone marrow transplants [7]. Specific HLA proteins have been associated with cancer development [8], a wide range of autoimmune and infectious diseases [9], and adverse drug reactions [10]. Identifying the exact HLA alleles associated with diseases is paramount to discovering the underlying genetic pathophysiology [11] and potential therapeutic targets.

HLA imputation infers an individual’s HLA genotype using SNP genotype information at sites flanking the classical HLA loci [12]. Prior to imputation, microarrays are used to collect SNP data from many samples at a moderately low cost. HLA alleles are then statistically imputed by exploiting the long-range linkage disequilibrium between the HLA loci and SNP markers across the HLA region, as described by Leslie et al. (2008) [13–15]. Imputation is a cheaper alternative to lab-based HLA typing, made possible by the availability of large SNP datasets [12]. Imputation, combined with a larger database of reference haplotypes, can enable large-scale investigations, such as disease-association studies [16], where precise knowledge of the HLA type is essential.

Available HLA imputation tools use different algorithms. For instance, HIBAG uses attribute bagging, which combines bootstrap aggregation with random variable selection to improve accuracy [17]. SNP2HLA and CookHLA use BEAGLE [18] to impute HLA alleles and amino acid sequences, while Minimac4 uses the MaCH algorithm [19].

African genomes are more diverse and have a reduced linkage disequilibrium, making it even more challenging to impute HLA alleles [20]. Africa is regarded as the cradle of modern humans, Homo sapiens. Populations on other continents descended from groups that migrated from Africa thousands of years ago [21]. Genome-wide SNP genotyping revealed that African populations have maintained a large and subdivided structure throughout evolutionary history [22], and that the deepest splits between human populations lie in Sub-Saharan Africa [23, 24].

Assessing imputation accuracy is necessary because imputation relies on statistical inference and is therefore probabilistic. Additionally, the HLA region is highly variable, as the alleles are inherited in a Mendelian fashion from each parent and thus vary from individual to individual [25]. Imputation performance can be affected by the genotyping array, the number of individuals in the reference panel, the genetic and ethnic diversity represented, data quality, the statistical method of the imputation tool, and how well the reference and study panels match.

Most studies that have assessed HLA imputation accuracy have used European, Asian, or multi-ethnic population data [10, 26, 27]. Previous studies have focused on evaluating general rather than HLA imputation accuracy in African populations [28]. The few studies that have examined HLA imputation accuracy in African populations have used target datasets from African Americans [29]. This study aimed to evaluate HLA allele imputation accuracy in a West African population, which has not been extensively studied, despite the heaviest disease burden occurring in Africa [30].

The study used GGVP data typed using the OptiType tool [31] as the gold standard to assess the performance of four imputation tools, three HLA-specific and one general. We also tested the effect of a population-specific versus a non-population-specific reference panel on imputation in a West African population. Finally, we assessed the impact of using data genotyped on different platforms and of reference sample size on HLA imputation.

These results inform future GWAS on the most appropriate software, recommend reference panels for HLA imputation, and highlight the influence of genotyping arrays and reference panel size on HLA imputation accuracy.

Materials and methods

Study populations

The study used reference panels from the 1000 Genomes (1kg-All), 1000 Genomes African (1kg-Afr), 1000 Genomes Gambian (1kg-Gwd), Human Hereditary and Health in Africa (H3Africa), and the HLA Multi-ethnic datasets.

The study Whole Genome Sequence (WGS) dataset was derived from the Gambian Genome Variation Project (GGVP), a collaborative project between the MRC Unit in The Gambia, the Wellcome Sanger Institute, and the MRC Centre for Genomics and Global Health at Oxford University. The GGVP dataset supports the discovery and understanding of genetic variants influencing human diseases [32]. The GGVP datasets are open-access and can be found on the International Genome Sample Resource site [33]. Table 1 provides the sample size, number of SNPs, and number of HLA alleles for each dataset, while Table 2 describes the number of SNPs for each HLA locus across all datasets.

Table 1. List of the study target datasets and reference panel populations.

https://doi.org/10.1371/journal.pone.0291437.t001

Table 2. Number of SNPs for each HLA locus across datasets.

https://doi.org/10.1371/journal.pone.0291437.t002

We used the Illumina Omni 2.5 and the H3Africa array marker sets [40] to assess how the density of markers on the target dataset could affect the imputation performance of HLA alleles. The H3Africa array is based on the Illumina Omni 2.5 array, with approximately 75% markers overlapping with the Illumina Omni array, and the remaining 25% markers being custom-made. The Illumina Omni 2.5 array and the H3Africa array target datasets were created by selecting matching markers from the GGVP WGS datasets and masking the remaining SNPs.
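The marker-selection step described above can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in the study: the function name `mask_to_array` and the representation of variants as `(chromosome, position)` tuples are assumptions made for the example, and the positions are toy values.

```python
def mask_to_array(wgs_variants, array_markers):
    """Split WGS variants into those kept (on the array) and those masked.

    Both arguments are iterables of (chromosome, position) tuples; this
    representation is an assumption made for illustration only.
    """
    array_set = set(array_markers)
    kept = [v for v in wgs_variants if v in array_set]
    masked = [v for v in wgs_variants if v not in array_set]
    return kept, masked


# Toy positions on chromosome 6, not real data.
wgs = [("6", 29_900_000), ("6", 29_900_150), ("6", 31_350_000), ("6", 31_350_420)]
array = [("6", 29_900_000), ("6", 31_350_420), ("6", 32_000_000)]

kept, masked = mask_to_array(wgs, array)
print(len(kept), len(masked))  # 2 2
```

Array markers that are absent from the WGS data (the third toy marker here) contribute to neither list, which mirrors the distinction between matched markers and markers unique to an array.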

HLA imputation strategy

The study focused on HLA class I alleles, the only class typed by OptiType [31]. Four tools were used to impute HLA alleles: the HLA-specific imputation tools HIBAG version 1.14.0 [41], CookHLA [42], and SNP2HLA [14], and a general imputation tool, Minimac4 [43]. For SNP2HLA, PLINK version 1.07 was used for quality control, while BEAGLE version 3.0.4 was used for phasing and imputation. The OptiType [31] tool in the nf-core HLA typing pipeline [44, 45] was used to type HLA alleles. We then used Python scripts to combine HLA types into the required format for HIBAG, CookHLA, and SNP2HLA.

We used the ready-made HLA Multi-ethnic reference panel [35] and customized four others (1kg-All, 1kg-Afr, 1kg-Gwd, and H3Africa) using HLA types and SNP genotypes for each imputation tool. For CookHLA, we first generated a genetic map using the “MakeGeneticMap” module, then trained the reference panel using the “MakeReference” module. The “MakeReference” module and the “hlaAttrBagging” function were used to train the SNP2HLA- and HIBAG-specific reference panels, respectively. For Minimac4, reference panels were generated using SNP genotypes and HLA alleles typed using the HLA-LA tool [46] instead of OptiType, matching the method used to create the HLA Multi-ethnic reference panel and thus enabling comparison.

HLA alleles were then imputed from SNP data. For SNP2HLA, the “SNP2HLA” script was used with the window size set to the default of 1000; for HIBAG, the “hlaPredict” function was used; and for CookHLA, the “CookHLA.py” script was used. For Minimac4, HLA alleles were imputed by calling the Minimac4 tool. For the HLA Multi-ethnic reference panel, the sample datasets were submitted to the Michigan imputation server [47], and HLA imputation was conducted using the Minimac4 imputation tool.

Imputation accuracy assessment

We used concordance rate as the primary assessment metric, defined as the proportion of imputed best-guess alleles that match the true HLA alleles. The true HLA alleles were obtained by typing HLA alleles from GGVP WGS data using the OptiType tool, which has been shown to type HLA Class I alleles at 99% accuracy [48]. The “hlaCompareAllele” function in HIBAG was used to calculate the concordance rate for HIBAG, while the “measureacc” module in the CookHLA package [42] was used to calculate the SNP2HLA, CookHLA, and Minimac4 concordance rates.
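To make the metric concrete, a concordance rate for a single locus could be computed along the following lines. This Python sketch is an illustration only and is not the implementation in HIBAG or CookHLA: the function names, the dictionary layout, the sample IDs, and the allele calls are all hypothetical, and the sketch assumes that the two alleles of an individual are compared as an unordered pair.

```python
from collections import Counter


def matched_alleles(true_pair, imputed_pair):
    """Count alleles shared by two unordered allele pairs (0, 1, or 2)."""
    overlap = Counter(true_pair) & Counter(imputed_pair)
    return sum(overlap.values())


def concordance_rate(true_calls, imputed_calls):
    """Proportion of imputed alleles that match the typed alleles.

    Both arguments map a sample ID to a pair of allele names at one locus.
    Only samples present in both dictionaries are scored.
    """
    shared = sorted(set(true_calls) & set(imputed_calls))
    if not shared:
        raise ValueError("no samples in common")
    correct = sum(matched_alleles(true_calls[s], imputed_calls[s]) for s in shared)
    return correct / (2 * len(shared))


# Hypothetical sample IDs and allele calls.
typed = {
    "S1": ("A*02:01", "A*23:01"),
    "S2": ("A*30:01", "A*30:01"),
    "S3": ("A*68:02", "A*74:01"),
}
imputed = {
    "S1": ("A*23:01", "A*02:01"),  # both correct, order swapped
    "S2": ("A*30:01", "A*30:02"),  # one correct
    "S3": ("A*01:01", "A*03:01"),  # none correct
}

print(concordance_rate(typed, imputed))  # 0.5
```

Using a multiset intersection avoids counting a homozygous typed allele twice when only one imputed copy matches it.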

The accuracy of results can also be assessed using HLA allele error rates. HLA allele frequency, which reflects the genetic diversity in a population, can also be used to evaluate the accuracy of imputed HLA alleles. HLA allele frequencies were computed using the PyPop [49] package and compared with concordance rates.
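For readers unfamiliar with the calculation, allele frequencies at one locus can be obtained by direct counting, as in the short Python sketch below. It does not use or reproduce the PyPop interface; the function name, data layout, sample IDs, and allele calls are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter


def allele_frequencies(calls):
    """Return a dict of allele -> frequency for one locus.

    `calls` maps a sample ID to its two alleles, so each diploid sample
    contributes two allele copies to the denominator.
    """
    counts = Counter(allele for pair in calls.values() for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical calls for four samples (eight allele copies).
calls = {
    "S1": ("B*53:01", "B*35:01"),
    "S2": ("B*53:01", "B*15:03"),
    "S3": ("B*35:01", "B*58:01"),
    "S4": ("B*53:01", "B*15:03"),
}

freqs = allele_frequencies(calls)
print(freqs["B*53:01"])  # 0.375
```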

Reproducibility

For reproducibility, we automated the pipeline in the Nextflow workflow language, packaged and deployed the tools using Docker and Singularity containers, and used GitHub for documentation and version control [50].

A summary of the workflow used for the analysis is presented in Fig 2. Matching markers from the GGVP WGS datasets were chosen to produce the target datasets for the Illumina Omni 2.5 and the H3Africa arrays. The datasets were then imputed on 5 reference panels using 4 imputation tools, and HLA imputation accuracy was assessed using concordance rate.
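The experimental design can be enumerated with a few lines of Python. This is a sketch under stated assumptions, not an excerpt from the Nextflow pipeline: the labels are shorthand for the arrays, panels, and tools named in the text, and the sketch assumes that every tool was run on every customized panel, with the HLA Multi-ethnic panel paired with Minimac4 only.

```python
from itertools import product

ARRAYS = ["Omni2.5", "H3Africa"]
PANELS = ["1kg-All", "1kg-Afr", "1kg-Gwd", "H3Africa", "HLA Multi-ethnic"]
TOOLS = ["HIBAG", "SNP2HLA", "CookHLA", "Minimac4"]


def planned_runs():
    """List (array, panel, tool) combinations, skipping unavailable ones."""
    runs = []
    for array, panel, tool in product(ARRAYS, PANELS, TOOLS):
        # Assumption from the text: the HLA Multi-ethnic panel is only
        # reachable through the imputation server, which runs Minimac4.
        if panel == "HLA Multi-ethnic" and tool != "Minimac4":
            continue
        runs.append((array, panel, tool))
    return runs


print(len(planned_runs()))  # 34
```

Under these assumptions the grid has 2 × (4 × 4 + 1) = 34 array, panel, and tool combinations.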

Results

Sample data

The target dataset was obtained from the GGVP WGS dataset and used to select matching markers on the H3Africa and Illumina Omni 2.5 arrays. Of the 1,731,033 SNP markers on the H3Africa array, 13,436 MHC SNPs matched those in the GGVP WGS dataset, while 1,717,596 were unique to the H3Africa array. Of the 2,314,963 SNP markers on the Illumina Omni 2.5 array, 13,850 MHC SNPs matched those in the GGVP WGS dataset, while 2,301,113 were unique to the Illumina Omni 2.5 array. The 13,436 H3Africa array SNPs and 13,850 Illumina Omni 2.5 array SNPs were used as the sample datasets.

Table 3 describes the intersection between markers in the reference panel compared to the target array data. For example, of the 223,229 markers in the 1kg-All reference, 13,016 matched those in the Illumina Omni 2.5 array, while 210,213 were unique to the 1kg-All reference.

Table 3. Intersection of target datasets with reference datasets.

https://doi.org/10.1371/journal.pone.0291437.t003

Imputation concordance

Table 4 shows the concordance rate for the different imputation tools, genotyping arrays, and reference panels. Compared to HLA typing, the overall concordance rate of the imputed data was 0.837 for HIBAG, 0.769 for Minimac4, 0.584 for SNP2HLA, and 0.173 for CookHLA. The HLA Multi-ethnic panel was the best performing reference panel, with a concordance rate of 0.873, followed by the H3Africa panel at 0.619, then 1kg-Afr at 0.609, 1kg-All at 0.604, and 1kg-Gwd at 0.531. For the array comparison, data from the Omni 2.5 array were more accurate than data from the H3Africa array. The Omni 2.5 array contained a few more Gambian SNPs than the H3Africa array, which likely contributed to this difference. The averages exclude the HLA Multi-ethnic reference panel due to missing values.
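A quick consistency check on these summary figures is possible. If every tool is run on every customized panel in a balanced grid, the mean of the per-tool averages and the mean of the per-panel averages must both equal the grand mean. The Python sketch below applies this to the values reported above, leaving out the HLA Multi-ethnic panel as stated; the equally weighted, complete tool-by-panel grid is an assumption of the sketch and is not confirmed by the text.

```python
# Per-tool and per-panel average concordance rates as reported in the text.
tool_means = {"HIBAG": 0.837, "Minimac4": 0.769, "SNP2HLA": 0.584, "CookHLA": 0.173}
panel_means = {"H3Africa": 0.619, "1kg-Afr": 0.609, "1kg-All": 0.604, "1kg-Gwd": 0.531}


def grand_mean(values):
    """Unweighted mean of a collection of averages."""
    values = list(values)
    return sum(values) / len(values)


by_tool = grand_mean(tool_means.values())
by_panel = grand_mean(panel_means.values())
print(round(by_tool, 3), round(by_panel, 3))  # 0.591 0.591
```

The two marginal means agree, which is what one would expect if the reported averages were taken over the same set of runs.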

There was no comparison of SNP2HLA, HIBAG, and CookHLA on the HLA Multi-ethnic panel, as the Michigan imputation server that hosts the reference panel provides Minimac4 only. Across the analysis, HLA-C allele imputation was the most accurate (0.668), followed by HLA-A (0.618) and, lastly, HLA-B (0.551), as shown in Table 5.

Imputation accuracy based on reference panels

The H3Africa reference panel had the highest concordance with HLA typing using HIBAG (0.889) and CookHLA (0.212). The 1kg-All was the best performing reference panel for SNP2HLA (0.656), while the HLA Multi-ethnic had the highest concordance rate when using Minimac4 (0.873) (Fig 3).

Comparison of allele frequency and accuracy of HIBAG

HLA alleles imputed by HIBAG, the best performing imputation tool, were used for allele frequency and accuracy rate comparison (Fig 4). HLA imputation accuracy dropped when the frequency of HLA alleles increased across all the reference panels, especially for the HLA-B alleles.

Fig 4. Allele frequency vs. accuracy of HIBAG.

Accuracy tended to decrease with increasing frequency, especially for HLA-B alleles.

https://doi.org/10.1371/journal.pone.0291437.g004

Imputation accuracy based on error rates

Overall, HLA-B alleles had higher error rates (0.449) compared to HLA-A (0.382) and HLA-C (0.332), showing they were imputed less accurately. CookHLA imputed HLA alleles with the highest error rates (Fig 5A). An interesting observation was that Minimac4, a general imputation tool, imputed HLA-B alleles more accurately than any HLA-specific imputation tool.
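The per-locus error rates quoted here are consistent with the error rate being the complement of the concordance rate reported for each locus. The Python check below illustrates this with the published figures; the relationship error = 1 − concordance is inferred from the numbers and is an assumption of this sketch, not a definition given in the text.

```python
# Per-locus concordance and error rates as reported in the text.
concordance = {"HLA-A": 0.618, "HLA-B": 0.551, "HLA-C": 0.668}
reported_error = {"HLA-A": 0.382, "HLA-B": 0.449, "HLA-C": 0.332}

# Assumed relationship: error rate = 1 - concordance rate.
derived_error = {locus: round(1 - rate, 3) for locus, rate in concordance.items()}

for locus in concordance:
    print(locus, derived_error[locus], derived_error[locus] == reported_error[locus])
```

All three loci agree with the reported values to three decimal places.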

Fig 5. Imputation accuracy comparison based on error rates.

Results from HIBAG (Fig 5B) showed that HLA-B had the highest error rate, followed by HLA-A and HLA-C. For SNP2HLA (Fig 5C), HLA-B imputation was the least accurate, followed by HLA-C and HLA-A. For Minimac4 (Fig 5D) and CookHLA (Fig 5A), HLA-A had the highest error rates, followed by HLA-B and HLA-C alleles.

https://doi.org/10.1371/journal.pone.0291437.g005

Discussion

We provide a detailed comparison of five reference panels, four imputation tools, and two genotyping arrays used for HLA imputation in a West African population. HIBAG and the H3Africa reference panel were the best performing imputation tool and reference panel, respectively.

The high performance of HIBAG is expected, as shown in a previous study [51]. Furthermore, HIBAG is robust for populations with complex linkage disequilibrium blocks [10]. Unlike Minimac4, SNP2HLA, and CookHLA, HIBAG uses unphased genotype data, which eliminates variation introduced by phasing software and shortens the computational workflow. Regarding computational burden, HIBAG takes a long time to run when the reference panel needs to be customized. For instance, training on the 1kg-All reference panel, which was the largest, took approximately 20 days on 32 threads with HIBAG, compared to a few hours on 9 threads with SNP2HLA. SNP2HLA provides an added advantage over HIBAG, since it imputes HLA SNPs, amino acids, and alleles, whereas HIBAG imputes only HLA alleles.

In general, the H3Africa reference panel outperformed the other reference panels due to its larger sample size and its relationship to the target population. Generally, the size of the reference panel [18] and the population specificity [51] substantially affect the accuracy of the HLA allele imputation. As expected, increased accuracy was achieved with a more extensive HLA Multi-ethnic reference panel, but we could not compare it with other tools as the server only provides the Minimac4 tool. Specifically, the H3Africa panel outperformed the other panels when using HIBAG, while the 1kg-All reference performed better with SNP2HLA, implying that the performance of HIBAG was based on population specificity and sample size [51], while the performance of SNP2HLA was based only on sample size [14]. The decrease in HLA imputation accuracy with increased frequency is comparable to a study by Karnes et al. (2017), who demonstrated that most low frequency HLA alleles had high concordance rates in African Americans and European Americans [26].

The performance of the Illumina Omni 2.5 array was slightly better than that of the H3Africa array because it has more SNPs in the target population (13,850 compared to 13,436). However, this difference was not statistically significant, suggesting that the choice of genotyping array has little influence on the accuracy of HLA imputation. Note, however, that the two arrays overlap substantially in their content, which may explain the similarity; more diverse arrays need to be compared to fully assess the impact of array content. Verlouw et al. (2021) showed that genome-wide coverage of genotyping arrays correlates with the number of SNPs on the array but does not correlate with imputation quality [52]. Therefore, the choice of genotyping array should be based on additional array content, such as pharmacogenetic or HLA variants, and not only on the extent of genome coverage.

Imputation of HLA-B (0.551) was less accurate compared to HLA-A (0.618) and HLA-C (0.668) imputation. Accurately typing alleles in the HLA-B region is problematic due to high polymorphism [53]. According to Robinson et al. (2015), over 3000 allelic variants exist in the HLA-B region [54]. However, accurate imputation of HLA-B alleles is important, as they play a crucial role in the progression of acquired immune deficiency syndrome. A slow progression of the disease has been associated with individuals expressing HLA-B*57 and HLA-B*27, while rapid progression has been associated with individuals expressing HLA-B*35 alleles [55]. Minimac4 showed improved imputation accuracy of HLA-B alleles, suggesting that a general imputation tool can be used for studies targeting HLA-B alleles.

Conclusions

The most effective software for HLA allele imputation in this study was HIBAG. However, it has a long run time and high memory requirement during the training of the reference panel. A recommendation is to use HIBAG with the latest kernel version, 1.5, as it has GPU support. Another important observation is that reference panel sample size and population content influence HLA allele imputation accuracy.

This study identified factors to consider when selecting an imputation tool and reference panel to inform association studies focusing on the HLA region and West African populations. The results highlight the best tools and panels for accurately imputing HLA genotypes.

We recommend testing additional African populations other than the Gambian population to better assess imputation accuracy in specific African populations. Such an assessment was done recently for imputation across the genome [56], and we encourage more studies, especially within the HLA region. Reference panels of comparable size should be used to reduce the bias that arises when a single large panel outperforms smaller ones.

Building large African-specific reference panels will enable high-quality imputations, especially for studies that cannot afford the cost of next-generation sequencing, thus generating more data that can be used for genome-wide association and fine-mapping studies in African populations.

Acknowledgments

We thank the Computational Biology Division at the University of Cape Town, South Africa, for providing access to the Ilifu cloud server.

References

  1. Cao H, Wu J, Wang Y, Jiang H, Zhang T, Liu X, et al. An Integrated Tool to Study MHC Region: Accurate SNV Detection and HLA Genes Typing in Human MHC Region Using Targeted High-Throughput Sequencing. PLoS One. 2013 Jul 24;8(7):e69388. pmid:23894464
  2. Trowsdale J, Knight JC. Major histocompatibility complex genomics and human disease. Annu Rev Genomics Hum Genet. 2013;14:301–23. pmid:23875801
  3. Xie M, Li J, Jiang T. Accurate HLA type inference using a weighted similarity graph. BMC Bioinformatics. 2010 Dec 14;11(SUPPL. 11):1–10. pmid:21172045
  4. Marsh SGE. Nomenclature for factors of the HLA system, update January 2006. Tissue Antigens. 2006 May;67(5):438–9. pmid:16671955
  5. Human Leukocyte Antigen (HLA) System—Immunology; Allergic Disorders—MSD Manual Professional Edition [Internet]. Available from: https://www.msdmanuals.com/en-gb/professional/immunology-allergic-disorders/biology-of-the-immune-system/human-leukocyte-antigen-hla-system.
  6. Elahi S, Bernard N, Armas JB, Bettencourt BF, Yan WH, Lin A. The Emerging Roles of Human Leukocyte Antigen-F in Immune Modulation and Viral Infection. Frontiers in Immunology. 2019;1:964.
  7. Stavropoulos-Giokas C, Dinou A, Papassavas A. The Role of HLA in Cord Blood Transplantation. Bone Marrow Res. 2012. pmid:23097706
  8. Dunne MR, Phelan JJ, Michielsen AJ, Maguire A, Dunne C, et al. Characterising the prognostic potential of HLA-DR during colorectal cancer development. Cancer Immunology, Immunotherapy. 2020;69:1577–88. pmid:32306077
  9. Shiina T, Hosomichi K, Inoko H, Kulski JK. The HLA genomic loci map: Expression, interaction, diversity and disease. Journal of Human Genetics. 2009;54:15–39. pmid:19158813
  10. Khor SS, Yang W, Kawashima M, Kamitsuji S, Zheng X, Nishida N, et al. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references. The Pharmacogenomics Journal. 2015 Feb 24;15(6):530–7. pmid:25707395
  11. Dendrou CA, Petersen J, Rossjohn J, Fugger L. HLA variation and disease. Nat Rev Immunol. 2018;18(5):325–39. pmid:29292391
  12. Meyer D, Nunes K. HLA imputation, what is it good for? Hum Immunol. 2017 Mar 1;78(3):239–41. pmid:28317600
  13. Naj AC. Genotype Imputation in Genome-Wide Association Studies. Curr Protoc Hum Genet. 2019;102(1):1–15. pmid:31216114
  14. Jia X, Han B, Onengut-Gumuscu S, Chen WM, Concannon PJ, Rich SS, et al. Imputing Amino Acid Polymorphisms in Human Leukocyte Antigens. PLoS One. 2013 Jun 6;8(6):e64683. pmid:23762245
  15. Leslie S, Donnelly P, McVean G. A Statistical Method for Predicting Classical HLA Alleles from SNP Data. The American Journal of Human Genetics. 2008 Jan 10;82(1):48–56. pmid:18179884
  16. Dilthey A, Leslie S, Moutsianas L, Shen J, Cox C, Nelson MR, et al. Multi-Population Classical HLA Type Imputation. PLoS Comput Biol. 2013;9(2):e1002877. pmid:23459081
  17. Bryll R, Gutierrez-Osuna R, Quek F. Attribute bagging: Improving accuracy of classifier ensembles by using random feature subsets. Pattern Recognit. 2003;36(6):1291–302.
  18. Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2008;84(2):210–23.
  19. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR. MaCH: Using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol. 2010 Dec;34(8):816–34. pmid:21058334
  20. Lonjou C, Zhang W, Collins A, Tapper WJ, Elahi E, Maniatis N, et al. Linkage disequilibrium in human populations. Proc Natl Acad Sci U S A. 2003 May 5;100(10):6069. pmid:12721363
  21. Choudhury A, Aron S, Botigué LR, Sengupta D, Botha G, Bensellak T, et al. High-depth African genomes inform human migration and health. Nature. 2020 Oct 29;586(7831):741–8. pmid:33116287
  22. Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, et al. The African Genome Variation Project shapes medical genetics in Africa. Nature. 2014 Dec 3;517(7534):327–32. pmid:25470054
  23. Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A. Bayesian inference of ancient human demography from individual genome sequences. Nat Genet. 2011;43(10):1031–5. pmid:21926973
  24. Nielsen R, Akey JM, Jakobsson M, Pritchard JK, Tishkoff S, Willerslev E. Tracing the peopling of the world through genomics. Nature. 2017;541(7637):302–10. pmid:28102248
  25. Choo SY. The HLA System: Genetics, Immunology, Clinical Testing, and Clinical Implications. Yonsei Med J. 2007 Feb 20;48(1):11–23. pmid:17326240
  26. Karnes JH, Shaffer CM, Bastarache L, Gaudieri S, Glazer AM, Steiner HE, et al. Comparison of HLA allelic imputation programs. PLoS One. 2017;12(2):1–12. pmid:28207879
  27. Degenhardt F, Wendorff M, Wittig M, Ellinghaus E, Datta LW, Schembri J, et al. Construction and benchmarking of a multi-ethnic reference panel for the imputation of HLA class I and II alleles. Hum Mol Genet. 2019;28(12):2078–2092. pmid:30590525
  28. Schurz H, Müller SJ, Van Helden PD, Tromp G, Hoal EG, Kinnear CJ, et al. Evaluating the accuracy of imputation methods in a five-way admixed population. Front Genet. 2019;10(february):1–9.
  29. Levin AM, Adrianto I, Datta I, Iannuzzi MC, Trudeau S, McKeigue P, et al. Performance of HLA allele prediction methods in African Americans for class II genes HLA-DRB1, -DQB1, and -DPB1. BMC Genet. 2014 Jun 16;15:72. pmid:24935557
  30. Gross M. African genomes. Current Biology. 2011 Jul 12;21(13):R481–4.
  31. Szolek A, Schubert B, Mohr C, Sturm M, Feldhahn M, Kohlbacher O. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics. 2014 Dec 1;30(23):3310–6. pmid:25143287
  32. GGVP GRCh38 | IGSR data collection [Internet]. Available from: https://www.internationalgenome.org/data-portal/data-collection/ggvp-grch38.
  33. Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res. 2020 Jan 1;48(D1):D941–7. pmid:31584097
  34. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Bentley DR, Chakravarti A, et al. A global reference for human genetic variation. Nature. 2015 Sep 30;526(7571):68–74. pmid:26432245
  35. Luo Y, Kanai M, Choi W, Li X, Sakaue S, Yamamoto K, et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response. Nat Genet. 2021 Oct 1;53(10):1504–16. pmid:34611364
  36. Hirata J, Hosomichi K, Sakaue S, Kanai M, Nakaoka H, Ishigaki K, et al. Genetic and phenotypic landscape of the major histocompatibilty complex region in the Japanese population. Nat Genet. 2019 Mar 1;51(3):470–80. pmid:30692682
  37. Okada Y, Momozawa Y, Sakaue S, Kanai M, Ishigaki K, Akiyama M, et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nature Communications. 2018 Apr 24;9(1):1–10. pmid:29691385
  38. Nelis M, Esko T, Mägi R, Zimprich F, Toncheva D, Karachanak S, et al. Genetic Structure of Europeans: A View from the North–East. PLoS One. 2009 May 8;4(5):e5472. Available from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0005472 pmid:19424496
  39. NHLBI Trans-Omics for Precision Medicine WGS-About TOPMed. [cited 2022 Nov 22]. Available from: https://topmed.nhlbi.nih.gov/.
  40. H3Africa array annotations [Internet]. Available from: https://chipinfo.h3abionet.org/.
  41. Zheng X, Shen J, Cox C, Wakefield JC, Ehm MG, Nelson MR, et al. HIBAG—HLA genotype imputation with attribute bagging. Pharmacogenomics Journal. 2014;14(2):192–200. pmid:23712092
  42. Luo Y, Cook S, Choi W, Lim H, Kim K, Jia X, et al. Accurate imputation of human leukocyte antigens with CookHLA. Nat Commun. 2021;12(1):1–11.
  43. Van Leeuwen EM, Kanterakis A, Deelen P, Kattenberg MV, Slagboom PE, De Bakker PIW, et al. Population-specific genotype imputations using minimac or IMPUTE2. Nat Protoc. 2015;10(9):1285–96. pmid:26226460
  44. nf-core/hlatyping: Precision HLA typing from next-generation sequencing data [Internet]. Available from: https://github.com/nf-core/hlatyping.
  45. Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar 1;38(3):276–8. pmid:32055031
  46. Dilthey AT, Mentzer AJ, Carapito R, Cutland C, Cereb N, Madhi SA, et al. HLA*LA—HLA typing from linearly projected graph alignments. Bioinformatics. 2019 Nov;35(21):4394–6. pmid:30942877
  47. Michigan Imputation Server [Internet]. Available from: https://imputationserver.sph.umich.edu/index.html#!
  48. Yi J, Chen L, Xiao Y, Zhao Z, Su X. Investigations of sequencing data and sample type on HLA class Ia typing with different computational tools. Brief Bioinform. 2021 May 20;22(3):1–6. pmid:32662817
  49. PyPop User Guide: User Guide for Python for Population Genomics [Internet]. Available from: https://www.researchgate.net/publication/271852987_PyPop_User_Guide_User_Guide_for_Python_for_Population_Genomics.
  50. nanjalaruth/MHC-Imputation-Accuracy: A project on evaluating the accuracy of genotype imputation in the human MHC region in selected African populations. [Internet]. Available from: https://github.com/nanjalaruth/MHC-Imputation-Accuracy.
  51. Ritari J, Hyvärinen K, Clancy J, Partanen J, Koskela S. Increasing accuracy of HLA imputation by a population-specific reference panel in a FinnGen biobank cohort. NAR Genom Bioinform. 2020;2(2):1–9.
  52. Verlouw JAM, Clemens E, de Vries JH, Zolk O, Verkerk AJMH, am Zehnhoff-Dinnesen A, et al. A comparison of genotyping arrays. European Journal of Human Genetics. 2021;29(11):1611–24. pmid:34140649
  53. Raghavan M, Geng J. HLA-B polymorphisms and intracellular assembly modes. Mol Immunol. 2015;68(2):89–93. pmid:26239417
  54. Robinson J, Halliwell JA, Hayhurst JD, Flicek P, Parham P, Marsh SGE. The IPD and IMGT/HLA database: Allele variant databases. Nucleic Acids Res. 2015;43(D1):D423–31. pmid:25414341
  55. Carrington M, Walker BD. Immunogenetics of spontaneous control of HIV. Annual Review of Medicine. 2012;63:131–45. pmid:22248321
  56. Sengupta D, Botha G, Meintjes A, Mulder N, Ramsay M, Choudhury A. Performance and accuracy evaluation of reference panels for genotype imputation in sub-Saharan African populations. Cell Genom. 2023 May 23;3(6):100332. pmid:37388906