Allele and haplotype frequencies of human leukocyte antigen-A, -B, -C, -DRB1, -DRB3/4/5, -DQA1, -DQB1, -DPA1, and -DPB1 by next generation sequencing-based typing in Koreans in South Korea

Allele frequencies and haplotype frequencies of HLA-A, -B, -C, -DRB1, -DRB3/4/5, -DQA1, -DQB1, -DPA1, and -DPB1 have rarely been reported in South Koreans using unambiguous, phase-resolved next generation DNA sequencing. In this study, HLA typing of 11 loci in 173 healthy South Koreans was performed using next generation DNA sequencing with long-range PCR, the TruSight® HLA v2 kit, the Illumina MiSeqDx platform, and Assign™ for TruSight™ HLA software. Haplotype frequencies were calculated using the PyPop software. Direct counting methods were used to investigate the association with DRB1 for samples with only one copy of a particular secondary DRB locus. We compared these allele types with the ambiguous allele combinations of the IPD-IMGT/HLA database. We identified 20, 40, 26, 31, 19, 16, 4, and 16 alleles of HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1, respectively. The numbers of HLA-DRB3, -DRB4, and -DRB5 alleles were 4, 5, and 3, respectively. The frequencies of the most common haplotypes were as follows: A*33:03:01-B*44:03:01-C*14:03-DRB1*13:02:01-DQB1*06:04:01-DPB1*04:01:01 (2.89%), A*33:03:01-B*44:03:01-C*14:03 (4.91%), DRB1*08:03:02-DQA1*01:03:01-DQB1*06:01:01-DPA1*02:02:02-DPB1*05:01:01 (5.41%), DRB1*04:05:01-DRB4*01:03:01 (12.72%), DQA1*01:03:01-DQB1*06:01:01 (13.01%), and DPA1*02:02:02-DPB1*05:01:01 (30.83%). We thus resolved 10 allele ambiguities in HLA-B, -C (each exon 2+3), -DRB1, -DQB1, -DQA1, and -DPB1 (each exon 2) of the IPD-IMGT/HLA database. In multidimensional scaling (MDS) plots of genetic distance, the Korean population lay close to the Japanese and Han Chinese populations. The information obtained by HLA typing of the 11 extended loci by next generation sequencing may be useful for more exact diagnostic tests in various transplantation settings and for studies of genetic population relationships in South Koreans.


Introduction
It is widely known that human leukocyte antigen (HLA) matching reduces morbidity and mortality in patients after hematopoietic stem cell transplantation (HSCT) [1]. Traditionally, Sanger sequencing-based typing (PCR-SBT) has been the standard method for high-resolution HLA typing [2], as the likelihood of accepting HLA mismatches for sensitized organ transplant candidates should be determined at high resolution, not at the antigen level [3]. However, SBT is unable to accurately phase heterozygous alleles and provides only limited sequence information. Typically, SBT protocols cover only exons 2, 3, and 4 for HLA class I genes and exons 2 and 3 for class II genes. The resulting problem of ambiguity, the presence of two or more genotypes compatible with the same unphased sequence generated by SBT, reflects the complexity of the HLA region in the human genome, for which more than 28,000 alleles have been identified in the IPD-IMGT/HLA database [4]. Resolving ambiguities requires time-consuming and costly additional testing that can delay patient care, and the list of ambiguities is still growing.

Sample preparation
DNA was collected from the blood of 173 genetically unrelated healthy Korean adults, who consisted mainly of students and staff from the Medical College of the Catholic University of Korea in Seoul, South Korea. The Korean people originally derive from one ethnic group of Mongolian origin that migrated to the Korean peninsula about five thousand years ago and preserved its unique physical and anthropological characteristics. South Korea developed rapidly after the Korean War (1950 to 1953), and fast-paced industrialization accelerated urbanization. These changes brought a huge population influx into Seoul from different rural areas. In this way, Seoul became a metropolis, and features of specific geographic origins have been diluted. Moreover, genetic homogeneity has been shown across the Korean peninsula, except for Jeju [37]. The population of the Seoul Capital Area (Seoul, Incheon, and Gyeonggi) amounted to 25.89 million persons in 2019, which accounted for 50.0% of the total population of South Korea. The 173 Korean adults from the medical school, the subjects of this study, were used as a group representing Koreans because the group includes Koreans from diverse localities (http://kostat.go.kr/portal/eng/pressReleases/8/7/index.board?bmode=download&bSeq=&aSeq=386088&ord=1). Genomic DNA was freshly extracted from 4 mL of peripheral blood mixed with ethylenediaminetetraacetic acid (EDTA) using the TIANamp Genomic DNA Extraction Kits (Tiangen Biotech Corporation, Beijing, China), according to the manufacturer's instructions. Extracted DNA was adjusted to a concentration of 50 ng/μL in Tris-ethylenediaminetetraacetic acid (TE) buffer [35], and DNA was quantified using a QuBit fluorometer (Life Technologies, Carlsbad, CA). After quantification, sample DNA was diluted to 10 ng/mL. All subjects provided written informed consent to participate in genetic studies.
This research protocol was carried out in accordance with the Declaration of Helsinki with the approval of the Catholic University Institutional Review Board (IRB) (IRB number: MC13SISI0126).

HLA gene amplification
Freshly extracted genomic DNA was used to amplify each HLA locus according to the manufacturer's instructions for the TruSight® HLA v2 Sequencing Panel. PCR amplicons were confirmed using 2% agarose gel electrophoresis prior to preparing the NGS libraries. Twenty-four samples (192 HLA loci) were run in a single NGS experiment. The samples were of sufficient quality to ensure library preparation, data quality, and analysis, as well as the correct HLA typing by NGS.

Genotyping of HLA alleles by Assign™ for TruSight™ HLA software
Data analysis was performed using Assign™ for TruSight™ HLA software (version 2.1.0.943, Illumina Inc., San Diego, CA). Sequencing data were interpreted using the IPD-IMGT/HLA database 3.42.0 [4]. We compared the genotypes obtained with next-generation sequencing with previous results acquired with Sanger sequencing, which allowed us to estimate the accuracy of NGS [35]. The Assign™ for TruSight™ HLA software was designed for the genotyping of HLA-A, -B, -C, -DRB1, -DRB3/4/5, -DQA1, -DQB1, -DPA1, and -DPB1 from the fastq sequencing reads provided by the Illumina MiSeqDx platform with the Illumina Pipeline software. The methods implemented within the Assign™ for TruSight™ HLA software utilize a large number of reads per sequence.

Statistical analysis
Allele and haplotype frequencies. Allele frequencies were determined by direct counting. Haplotype frequencies were estimated using the iterative Expectation-Maximization (EM) algorithm [38,39] implemented in PyPop-Win32-0.7.0 (www.pypop.org) for HLA-A, -B, -C, -DRB1, -DQA1, -DQB1, -DPA1, and -DPB1 (S1-S5 Tables) [40]. The PyPop-Win32-0.7.0 distribution includes a minimal configuration file called "sample.ini". The presence of an [Emhaplofreq] section in this file enables haplotype estimation, and the 'lociToEstHaplo' option in that section lists the multi-locus haplotypes that the program should estimate (S6 Table).
Genotyping data were further investigated for predicted haplotypes using the PyPop-Win32-0.7.0 software. This analysis was not performed on all 11 HLA loci together because of the limited number of samples in this study. The association of DRB1 with DRB3/4/5 was analyzed by direct counting.
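The EM estimation described above can be illustrated with a minimal two-locus sketch. This is not PyPop's implementation: the function name `em_two_locus`, the uniform starting frequencies, and the convergence tolerance are assumptions made for illustration, and PyPop handles more loci, missing data, and multiple starting conditions.

```python
from collections import defaultdict


def em_two_locus(genotypes, n_iter=100, tol=1e-9):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of ((a1, a2), (b1, b2)) tuples, one per individual.
    Returns a dict mapping (a, b) haplotypes to estimated frequencies.
    Hypothetical helper for illustration only.
    """
    def phasings(genotype):
        # The two possible ways to pair alleles across loci; they collapse
        # to a single phasing when either locus is homozygous.
        (a1, a2), (b1, b2) = genotype
        return list({
            tuple(sorted([(a1, b1), (a2, b2)])),
            tuple(sorted([(a1, b2), (a2, b1)])),
        })

    compatible = [phasings(g) for g in genotypes]
    haplotypes = {h for pairs in compatible for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start

    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in compatible:
            # E-step: probability of each phasing under the current frequencies.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq
```

With two double homozygotes and one double heterozygote as input, the ambiguous individual is progressively assigned to the phasing supported by the homozygotes, which is the behaviour that lets EM recover phase information from unphased data.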
Association of DRB1 with secondary DR loci (HLA-DRB3, -DRB4, and -DRB5). Although not all samples carry a secondary DR locus (HLA-DRB3, -DRB4, or -DRB5), direct counting methods were used to investigate the association with DRB1 for samples with only one copy of a particular secondary DRB locus. The association of DRB1 with the DRB3/4/5 loci was also investigated by comparison with previously reported DRB structures [41]. We referred to the two-field nomenclature found in volunteers from the US registry with European backgrounds and in a prior population study of the Netherlands, consistent with the more recent high-resolution haplotype assignments [14,22,23,25,26,34,42].
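The direct counting step can be sketched as follows. The mapping of DRB1 first-field families to secondary loci is the commonly described DR52/DR53/DR51 grouping and is an assumption here, not a table taken from this study; the function names and the sample layout are likewise hypothetical.

```python
from collections import Counter

# Assumed grouping of DRB1 families with secondary loci (illustrative only).
EXPECTED_DRB1_FAMILIES = {
    "DRB3": {"03", "11", "12", "13", "14"},
    "DRB4": {"04", "07", "09"},
    "DRB5": {"15", "16"},
}


def drb1_family(allele):
    """Return the first field of an allele name, e.g. 'DRB1*04:05:01' -> '04'."""
    return allele.split("*")[1].split(":")[0]


def count_drb1_secondary(samples):
    """Count DRB1-secondary allele pairs in samples with one secondary copy.

    samples: list of dicts with a "DRB1" key (two alleles) and optional
    "DRB3", "DRB4", "DRB5" keys (lists of alleles actually present).
    A pair is counted only when exactly one secondary allele is present and
    exactly one DRB1 allele belongs to the expected family, so the
    assignment is unambiguous.
    """
    counts = Counter()
    for sample in samples:
        for locus, families in EXPECTED_DRB1_FAMILIES.items():
            secondary = sample.get(locus, [])
            if len(secondary) != 1:
                continue
            partners = [a for a in sample["DRB1"] if drb1_family(a) in families]
            if len(partners) == 1:
                counts[(partners[0], secondary[0])] += 1
    return counts
```

Dividing each count by the total number of chromosomes (2n) would give a haplotype frequency comparable to those reported for DRB1-DRB3/4/5.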
Multidimensional scaling (MDS) analysis. MDS analysis (in two dimensions, based on the Euclidean distance matrix computed from allele frequencies) was performed with the ALSCAL procedure in the SPSS 27 software package (SPSS Inc., Chicago, IL, USA). ALSCAL options were set to 'interval' for measurement level, 'Symmetric' for data matrix shape, 'Dissimilarity' for type, and 'Leave Tied' for approach to ties. For HLA-A, -B, -DRB1, -DQA1, and -DQB1, we used the HLA data reported by Johansson et al [43]. For HLA-C, -DPA1, and -DPB1, we used HLA data collected from a worldwide selection of populations in the Allelefrequencies.net database (http://www.allelefrequencies.net). The Japanese and Han Chinese populations were included because they were genetically close to the study population in all analyses, except for HLA-DPA1. Additionally, HLA data from different geographic regions were used in this study: HLA-C (13 populations): Japanese, Han Chinese, Australian, Southeast Asian, South Asian, West Asian, Oceanian, European, South American, North American, North African, and Sub-Saharan African; HLA-DPA1 (10 populations): Japanese, Southeast Asian, South Asian, Oceanian, European, Brazilian, South American, North American, and Sub-Saharan African; HLA-DPB1 (13 populations): Japanese, Han Chinese, Australian, Southeast Asian, South Asian, West Asian, Oceanian, European, South American, North American, North African, and Sub-Saharan African. The population groups were defined by the 'Region' field of the HLA-Allele Frequency Search-Classical section of Allelefrequencies.net. Additionally, we selected each country under 'Country' and '2 field' under 'Level of resolution' (e.g., the Han Chinese population was selected with 'North-East Asia' for 'Region', 'China' for 'Country', and '2 field' for 'Level of resolution').
The HLA allele frequency of each population group was recalculated using the following formula from Allelefrequencies.net: 'Allele Frequency: Total number of copies of the allele in the population sample (Alleles / 2n) in decimal format.' The genetic distances at the 8 HLA loci were analyzed using the second-field allele frequencies of this study and the estimated allele frequencies of the population groups.
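The allele frequency formula (copies of the allele divided by 2n) and the Euclidean distance matrix that serves as MDS input can be sketched as below. The function names are hypothetical, and the MDS step itself (ALSCAL in SPSS) is not reproduced; the sketch stops at the dissimilarity matrix.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Direct counting: copies of each allele divided by 2n.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    two_n = 2 * len(genotypes)
    return {allele: c / two_n for allele, c in counts.items()}


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two allele frequency profiles.

    Alleles absent from one population are treated as frequency 0.
    """
    alleles = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(x, 0.0) - freq_b.get(x, 0.0)) ** 2
                         for x in alleles))


def distance_matrix(populations):
    """Symmetric dissimilarity matrix for a dict of population -> frequencies."""
    names = sorted(populations)
    matrix = [[euclidean_distance(populations[p], populations[q]) for q in names]
              for p in names]
    return names, matrix
```

The returned matrix is symmetric with a zero diagonal, which matches the 'Symmetric' and 'Dissimilarity' settings described for the ALSCAL procedure.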

Hardy-Weinberg equilibrium
Hardy-Weinberg equilibrium tests were performed on the eight HLA loci. The P values and the observed and expected numbers of homozygotes and heterozygotes are given in S8 Table. All P values were greater than 0.05, which indicates that the population is consistent with Hardy-Weinberg equilibrium [44]; no deviation was detected at any of the eight loci.
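The comparison of observed and expected homozygotes and heterozygotes can be illustrated with a simplified chi-square check. This is a stand-in for illustration, not the test used to produce S8 Table: it collapses all genotypes into two classes and treats the statistic as approximately chi-square with one degree of freedom. The function name is hypothetical.

```python
import math
from collections import Counter


def hwe_homozygosity_check(genotypes):
    """Compare observed and expected homozygote/heterozygote counts.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    Returns observed and expected counts, the chi-square statistic, and an
    approximate P value (1 degree of freedom).
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]

    expected_hom = n * sum(p * p for p in freqs)  # sum of p_i^2 under HWE
    expected_het = n - expected_hom
    observed_hom = sum(1 for a, b in genotypes if a == b)
    observed_het = n - observed_hom

    if expected_het == 0:
        chi2 = 0.0  # a single allele: no heterozygotes are possible
    else:
        chi2 = ((observed_hom - expected_hom) ** 2 / expected_hom
                + (observed_het - expected_het) ** 2 / expected_het)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "observed_hom": observed_hom, "expected_hom": expected_hom,
        "observed_het": observed_het, "expected_het": expected_het,
        "chi2": chi2, "p_value": p_value,
    }
```

A P value above 0.05 from such a check would be read, as in the text, as no detectable deviation from Hardy-Weinberg equilibrium. For highly polymorphic HLA loci with many rare alleles, exact tests are generally preferred over this approximation.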

Haplotype analysis
The 6-locus haplotypes of HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 are listed in Table 2. The haplotypes of the HLA class I genes HLA-A, -B, and -C with frequencies >1% are shown in S11 Table, except for those that overlap with Table 2.

Expected PCR-SBT ambiguities (described in the IPD-IMGT/HLA database) solved by the NGS assay
Analysis of the data obtained from NGS with the TruSight® HLA v2 kit and Assign™ for TruSight™ HLA software resolved 10 allele ambiguities in HLA-B, -C (each exon 2+3), -DRB1, -DQB1, -DQA1, and -DPB1 (each exon 2) of the IPD-IMGT/HLA database (Release version 3.42.0) (S14 Table). These are ambiguities that the PCR-sequence based typing method leaves unresolved, as listed in the IPD-IMGT/HLA database.
Compared with the Japanese population, in the 6 HLA loci [20], 7 kinds of the haplotypes with frequencies higher than 1% have been found in both populations, which were  Table 2 was lower than that of the top ten kinds of South Asian (2.17%) [16]. The top 10 haplotypes of HLA-A, -B, and -C were the same at the 2nd field as those in the previous study (S11 Table) [24]. In HLA class II, the 3rd-field haplotypes of HLA-DRB1, -DQA1, -DQB1, -DPA1, and -DPB1 (>1%) were not the same as the top ten most frequent haplotypes of other ethnic groups: European, South Asian, and African or Caribbean Black (S12 Table) [16]. Twenty-five DRB1-DRB3/4/5 haplotypes in S13 Table were inferred from the DRB1 and DRB3/4/5 genotyping data by comparison with the previously reported DRB structures [41,42]. Only samples containing one copy of the secondary locus could also be evaluated against the previous study [14]. HLA genotyping data from the long-range PCR-based NGS method are needed to understand the detailed DR haplotype structure and how its polymorphism was generated. We compared the frequencies of the 2-locus haplotypes of HLA class II genes in S13 Table, HLA-DQA1-DQB1 and HLA-DPA1-DPB1 (>1%), with those in Fukuoka, Japan [21]. The two populations shared 8 HLA-DQA1-DQB1 haplotypes and 9 HLA-DPA1-DPB1 haplotypes.
The ambiguities that the PCR-sequence based typing method leaves unresolved are listed in the IPD-IMGT/HLA database (Release version 3.42.0) (S14 Table). To increase the success rate of solid organ or hematopoietic stem cell transplantation, more extensive and higher-resolution HLA typing is required. For the selection of solid organ donors for hypersensitized patients, HLA typing needs to move from traditional serologically defined HLA antigens to high-resolution HLA types [3]. Typing of a more extended HLA region is needed to improve the success rate of unrelated hematopoietic stem cell transplantation [36].
In a mitochondrial DNA study, the MDS plot and unrooted phylogenetic tree showed no significant differences [37]. The collected samples come from the same ethnic group in South Korea and can be said to have genetic homogeneity. We were able to describe the genetic distances to other ethnic groups with MDS plots (Fig 1). This highlights the need to use multiple loci to study genetic population relationships such as the genetic distance of HLA genes [43,48]. The genetic distance of South Koreans was measured for a more extended set of HLA loci. The genetic distances at the HLA-A, -B, -C, -DRB1, -DQA1, -DQB1, -DPA1, and -DPB1 loci in the Korean samples support the theory that Koreans are primarily of Northeast Asian origin. Our MDS analyses placed the South Koreans in close vicinity to the Japanese and Han Chinese populations, whereas some analyses indicate a similarity to other Northeast Asian populations.
In conclusion, we analyzed the allele and haplotype frequencies of 11 extended HLA loci and the genetic distances to other ethnic groups using MDS plots. These data may be useful for more exact diagnostic tests in various transplantation settings and for studies of genetic population relationships.