Diversity of Extended HLA-DRB1 Haplotypes in the Finnish Population

The Major Histocompatibility Complex (MHC, 6p21) codes for traditional HLA and other host response related genes. The polymorphic HLA-DRB1 gene in MHC Class II has been associated with several complex diseases. In this study we focus on MHC haplotype structures in the Finnish population. We explore the variability of extended HLA-DRB1 haplotypes in relation to the other traditional HLA genes and a selected group of MHC class III genes. A total of 150 healthy Finnish individuals were included in the study. Subjects were genotyped for HLA alleles (HLA-A, -B, -DRB1, -DQB1, and -DPB1). The polymorphism of TNF, LTA, C4, BTNL2 and HLA-DRA genes was studied with 74 SNPs (single nucleotide polymorphism). The C4A and C4B gene copy numbers and a 2-bp silencing insertion at exon 29 in C4A gene were analysed with quantitative genomic realtime-PCR. The allele frequencies for each locus were calculated and haplotypes were constructed using both the traditional HLA alleles and SNP blocks. The most frequent Finnish A∼B∼DR -haplotype, uncommon in elsewhere in Europe, was A*03∼B*35∼DRB1*01∶01. The second most common haplotype was a common European ancestral haplotype AH 8.1 (A*01∼B*08∼DRB1*03∶01). Extended haplotypes containing HLA-B, TNF block, C4 and HLA-DPB1 strongly increased the number of HLA-DRB1 haplotypes showing variability in the extended HLA-DRB1 haplotype structures. On the contrary, BTNL2 block and HLA-DQB1 were more conserved showing linkage with the HLA-DRB1 alleles. We show that the use of HLA-DRB1 haplotypes rather than single HLA-DRB1 alleles is advantageous when studying the polymorphisms and LD patters of the MHC region. For disease association studies the HLA-DRB1 haplotypes with various MHC markers allows us to cluster haplotypes with functionally important gene variants such as C4 deficiency and cytokines TNF and LTA, and provides hypotheses for further assessment. Our study corroborates the importance of studying population-specific MHC haplotypes.


Introduction
The Major Histocompatibility Complex located on chromosome 6p21 has a complex allelic structure with extended linkage disequilibrium (LD) and polymorphism. The traditional human leukocyte antigen (HLA) genes encode the cell-surface antigenpresenting proteins, the HLA molecules, and fall into two major MHC classes; Class I (HLA-A, -C, and -B) and Class II (HLA-DRB1, -DQB1, and -DPB1). Many non-HLA genes related to immune responses e.g. tumor necrosis factor (TNF), lymphotoxin-alpha (LTA), complement C4 genes (C4A and C4B) and butyrophilin-like protein 2 (BTNL2), are located in the MHC class III region that resides between the MHC Class I and II regions [1][2][3]. The location of recombination hotspots and the length of LD blocks (genomic fragments inherited together) are population specific [1,4,5]. Interestingly, it has been shown that the Finnish population has distinctive population substructure compared with other Europeans [6][7][8].
The HLA genes play a critical role in hematopoietic stem cell transplantation, and HLA mismatching has been associated with graft failure and graft-versus-host disease [9,10]. In addition to the traditional HLA genes, it has been suggested that the HLA-SNP haplotypes influence on the outcome of the transplantation [11]. The association between MHC and diseases has been known for decades. The polymorphic HLA-DRB1 has been associated with several complex and infectious diseases, such as acute coronary syndrome [12], HSV2 related meningitis [13], type I diabetes, celiac disease and multiple sclerosis (reviewed in [14]). Recently, genome wide association analyses (GWA) have increased the number of MHC gene associations with several drug-induced hypersensitivity reactions, autoimmune, infectious and inflammatory disorders [15][16][17][18][19][20] (GWAS Integrator, http://genome.ucsc. edu). However, the predisposing HLA-alleles are common in the healthy population. Other genetic and environmental triggers are required for disease susceptibility [15,21]. Thus, the complexity of MHC makes the variant(s) responsible for the causal effect difficult to pinpoint. As the traditional HLA typing is considered rather expensive and laborious and the analysis of MHC regions is complex, many studies do not probe MHC associations more closely. Thus the functionality of the variants remains uncertain although identification of a specific risk HLA allele could help to understand a disease and its causality further.  At present, for transplantation and many diagnostics proposes, traditional HLA typing still remains as the most used method [14]. De Bakker [22] presented a tag-SNP based HLA-typing method as an alternative solution for traditional HLA typing. It has promoted the screening of certain disease related HLA markers [21,23]. However, not only HLA allele and haplotype frequencies [8,24] but also the SNP content of haplotypes differ in ethnically diverse populations complicating the imputation process. It is clear that more information on the precise content of HLA haplotypes is needed for transplantation, disease association, anthropological, and epidemiological studies [8,25].
In this study we focused on studying HLA-DRB1 haplotype structures in a Finnish population. The allelic diversities of other traditional HLA-genes and selected group of MHC class III genes were included and extended haplotypes inferred. We aimed to interpret the haplotype diversities in relation to HLA-DRB1 locus because of its higher amount of polymorphism when compared to the other MHC Class II genes (IMGT database, [26]) and its wellknown associations with several complex diseases. The selected MHC genes TNF, LTA, C4, BTNL2, and HLA-DRA all have been associated with several autoimmune or infectious diseases, as well. We hypothesized that the use of longer MHC blocks, rather than single alleles, could be advantageous when studying the polymorphisms in the MHC region.

Study Population
The study consists of one hundred and fifty (150) healthy Finnish individuals who were randomly selected. Description of the sample set and the DNA extraction has previously been published by Seppä nen et al. [17]. Briefly, samples from 49 males and 101 females with mean age of 33.

Genotyping of HLA Genes
The HLA genotyping and/or analysis was carried out in an EFI (European Federation for Immunogenetics) accredited HLA Laboratory. The genotyping of HLA-A, -B and -DPB1 genes were performed using sequence specific primers (SSP: Olerup SSP AB, Stockholm, Sweden). The HLA-DRB1 alleles were detected using sequence based typing (SBT; InvitrogenTM, Life Technologies, Carlsbad, CA, USA). HLA-DQB1 alleles were detected with a panel of lanthanide-labeled oligonucleotide probes [27]. The reactions were performed according to manufacturers' instructions giving at least four-digit resolution (for example, HLA-DRB1*01:01) for HLA-DRB1 and -DPB1, and at least two-digit resolution (for example, HLA-A*02) for HLA-A, -B and -DQB1. For resolving ambiguities, high resolution SSPs or SBT HARPs (heterozygous ambiguity resolving primers) were used. The HLA alleles were assessed using HLA nomenclature release 3.5.0 (IMGT/HLA database) and carefully interpreted by two persons.

SNP Selection and Genotyping
We aimed to genotyped LTA, TNF, BTNL2 and HLA-DRA genes with SNPs. All the selected genes were covered entirely including 59 and 39-flanking regions. SNPs were chosen from the HapMap database [28] and/or from the public dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP). Information about the validation status, tagging quality, minor allele frequency Table 1. Cont.  (.0.01) and gene structure were used for selecting the SNPs. Altogether seventy-four SNPs were chosen from the two gene regions, TNF,LTA, (here referred as TNF block) and BTNL2,HLA-DRA (here referred as BTNL2 block). SNP genotyping was performed using the Sequenom MassArray iPLEX system (Sequenom, San Diego, CA, USA). In the iPLEX assay, the SNP alleles are separated based on the differences of the single base extension (SBE) products. Manufacture's instructions were used to design the assays (AssayDesign software) and to perform the multiplex PCR and the iPLEX reaction using 9-10 ng of DNA as a template.
The complement C4A and C4B gene copy numbers and a 2-bp silencing insertion at exon 29 (CT) in C4A gene were analysed by quantitative genomic realtime-PCR Rotor-Gene 6000 (Corbett Research, Sydney, Australia) according to Paakkanen et al. [29]. The C4 allotypes were determined by immunofixation [17]. One subject was excluded from the analysis.

Statistical Analysis
Allele, phenotype and haplotype frequencies were calculated by direct counting. To detect significant departure from Hardy-Weinberg equilibrium (p,0.001), Haploview (SNPs) [30] or ARLEQUIN 3.11 (HLA genes) [31], were used. SNP haplotypes were constructed using Haploview [30]. Multilocus haplotype frequencies and recombination rates were estimated from allele data using the Bayesian method with PHASE v. 2.1.1 [32]. The haplotypes were constructed using all the selected markers simultaneously. Taken into account the small sample size and to exclude unreliable haplotypes, only haplotypes greater than 1% (observed more than 3 times) were used in the analysis [24,[33][34][35].
The LD measures (D9 and r 2 ) were determined using either the Haploview software (biallelic markers) [30] or the ARLEQUIN 3.11 software (multiallelic markers) [31]. At this step, only one SNP from a LD block was chosen for further analysis. It has been shown that for multiallelic loci, D9 estimates the strength of LD (.0.80 strong LD, 20.5 moderate LD, 20 weak LD) better than r 2 [36]. The R-package 'ape' was used to perform a Neighborjoining algorithm according to the method of Saitou and Nei [37]. The HapMap data was used for illustrating the recombination hotspots in the MHC region [28]. The proxy SNPs (r 2 .0.9) for genotyped SNPs were detected using software SNAP [38].

Genotyping of HLA-alleles
The HLA alleles and their frequencies are given in parallel with the phenotype frequency in Table 1. HLA allele distributions followed the Hardy-Weinberg equation. A total of 91 HLA alleles (15,21,26,11 and 18 alleles in HLA-A, -B, -DRB1, -DQB1 and -DPB1, respectively) were noted. In the Finnish population, two HLA-A alleles accounted for .66% and five HLA-DPB1 alleles for .91% of the variation at the loci. Allele frequencies of other HLA loci were more equally distributed. These observations are consistent with the previous Finnish studies [39,40].

HLA Haplotypes
Three-locus haplotypes between HLA-DRB1 and MHC Class I (HLA-A and -B; Table 2) and MHC Class II (HLA-DQB1 and -DPB1; Table 3) were constructed. Haplotypes having frequencies higher than 1.0% are presented in Table 2 Table S1.The linkage between HLA-DRB1 and HLA-DQB1 was stronger than between HLA-DRB1 and HLA-DPB1 (Table S2).

SNP Analysis
Altogether 74 SNPs were successfully genotyped. The average success rate was 99% and no discrepancies were observed. Fourteen SNPs were excluded due to minor allele frequency (,0.01) or HWE (,0.001) and five SNPs were excluded as they were in total LD (r 2 = 1) with another SNP. There can be found several proxy SNPs (SNPs in strong LD) for our SNPs (examples shown in Table S3) [38]. Many of the proxy SNPs have been previously associated with diseases (http://www.snp-nexus.org/).
A summary of the accepted SNPs (n = 55) is given in Table S4. The allele frequencies of our genotyped SNPs did not differ significantly from the HapMap (CEU population, European decent) [28], except the twelve SNPs in BTNL2 or HLA-DRA genes (Table S4). The LD structure of the SNPs (TNF or BTNL2 block) is presented in the Figure S1. As expected, due to the SNP selection criteria the pairwise LD between SNPs was always r 2 ,1.
Previously using HapMap data [28], high recombination rates have been observed between HLA-A and -B loci, upstream and downstream of TNF, in the BTNL2 region and between HLA-DQB1 and -DPB1 loci ( Figure 1A). Here, the recombination rate estimation was performed using the HLA alleles (HLA-A, -B, -DRB1, -DQB1, -DPB1), C4 gene copy numbers and SNPs (n = 55). Thirty-one SNPs were common in both HapMap and in our study. The location of the highest recombination rate in the Finnish sample was observed in the BTNL2 promoter region corresponding to HapMap data (Figures 1 B and C).

Haplotypes of TNF, BTNL2 and C4
SNP haplotypes were constructed. The rare haplotypes (observed less than 3 times i.e. frequency ,1%) were excluded at this point. Nine 13-SNP haplotypes of TNF block (Table 4) and twelve 42-SNP haplotype of BTNL2 block (Table 6) were observed with frequencies ranging from 3% to 21% and 1.3% to 15%, respectively. In both regions, there seemed to be four to five common haplotypes of almost equal frequency in addition to the less common haplotypes. C4A and C4B gene copy numbers were constructed into haplotypes and the results are shown in Table 8. As opposed to TNF and BTNL2 haplotypes, there was one haplotype, which covered more than half of the observed haplotypes (C4_1 haplotype, 59%). The frequency of the next common haplotype was less than 20%, showing a sharp decrease. The significant pair-wise LD [(D9.0.80, r 2 .0.35, P value (P) ,0.05)] between HLA-DRB1 alleles and other markers is presented in Table 10. The HLA-DRB1 allele distribution (%) presented in Table S2 shows how HLA-DRB1 alleles are clustered with MHC class I, II and III alleles and MHC blocks.
To summarize the LD and haplotype analysis and to the polymorphism of the MHC region and its relation to HLA-DRB1, an additional four-locus haplotype (HLA-DRB1, TNF block, BTNL2 block and C4; Table 11) and a six-locus haplotype (HLA-DRB1, HLA-B, TNF block, BTNL2 block, C4A and C4B allotypes; Table S5) were performed. The results showed that the extended HLA-DRB1 haplotypes were broken down when HLA-B, C4 allotypes and TNF block were taken into account.

Discussion
To our knowledge, no extensive study exist that combines information from HLA-A,-B,-DRB1,-DQB1 and -DPB1 alleles with TNF, BTNL2 and complement C4 blocks. In this study, we addressed (i) the diversity of extended HLA-DRB1 haplotypes covering the MHC class I, II and III regions, (ii) the shared MHC or SNP markers in extended HLA-DRB1 haplotypes, and (iii) the challenge in detecting causal variants in the HLA data.
The most common Finnish A,B,DR -haplotype was A*03,B*35,DRB1*01:01. Also other HLA-DRB1*01:01 haplotypes with variable levels of LD were observed. According to The Allele Frequency Net Database [41], only in a few populations e.g. the Swedish Sami [44] and Russia [46] the A*03,B*35,DRB1*01:01 was found with the frequency .2%. In Finland, the second most common haplotype was A*01,B*08,DRB1*03:01 in high LD. This haplotype is not so common in Finns as in other Europeans and has been previously referred as the ancestral haplotype AH 8.1 or autoimmune  TNF_1  TNF_2  TNF_3  TNF_4  TNF_5  TNF_6  TNF_7  TNF_8  TNF_9 rs2009658  haplotype [47,48]. Another conserved haplotype with strong LD, but rare in the Finnish population, was HLA-DRB1*13:02 reaching from TNF to HLA-DQB1.
The enrichment or loss of certain HLA haplotypes (Table 2) reflects the characteristics of the Finnish population structure, which has evolved through multiple genetic bottlenecks [6,7]. Detailed information of population substructures was presented by the 16th International HLA and Immunogenic Workshop IHIW project (''Analysis of HLA Population Data'' [8]. Most importantly, the multidimensional scaling (MDS) of HLA-DRB1 revealed  BTNL2_1 BTNL2_2 BTNL2_3 BTNL2_4 BTNL2_5 BTNL2_6 BTNL2_7 BTNL2_8 BTNL2_9 BTNL2_10 BTNL2_11 BTNL2_12   rs28362678  T  C  C  C  C  T  C  C  C  C  C  C   rs2076530  C  T  C  T  T  C  T  T  T  T  T  T rs9268480 T  T  T  T  T  T  T  T  T  C  C  T   rs2395166  C  C  T  T  T  T  T  T  T  T  T  C   rs3135365  T  G  T  T  T  T  T  T  T  T  T  T   rs3135363  T  T  T  T  that the Finns and the Sami are closer to the North-East Asians than to other European populations [8]. In general, the HLA variation in Europe follows the North to Southeast axis corresponding to the previous principal component analysis (PCA) based results that utilized genome-wide SNP data [7]. Interestingly in Finland, the prevalence of certain HLA alleles have shown regional differences [40] e.g. HLA-B*35 being highest in the Eastern parts of Finland. Also GWASs have shown similar trends [7]. The HLA haplotype deviations in relation to the Finnish population substructure warrant replication studies. The population stratification is important e.g. for control selection [34]. We found that the majority of HLA-DRB1 alleles were inherited as extended blocks from BTNL2 to HLA-DQB1. In spite of the observed recombination rate in the BTNL2 promoter region, most DRB1,BTNL2 blocks appeared to be conserved. Including HLA-B, TNF, C4 and HLA-DPB1 the number of extended HLA-DRB1 haplotypes strongly increased. The positions of the MHC recombination sites vary between populations [49] explaining partly the non-replication of disease associations between populations (e.g. [19,50,51]).
Overall, analysis of immunogenomic data is challenging. Resolving HLA allele ambiguity, phasing and LD calculation warrant particular expertise, and the traditional software tools (e.g. Haploview and PLINK) are not suitable for multiple loci polymorphic data like HLA [34,54]. Haplotypes rather than single markers were used to decrease phasing errors [55]. Due to the strong LD, multiple SNPs may have corresponding statistical    proof of association making the search for possible causal variants exceptionally difficult. To clarify the complexity, the known tag-SNP for HLA-DRB1*15:01 (rs3135388) (see Figure S3), was shown to have at least 20 proxy SNPs (r 2 .0.9) with variable function and in different gene regions [14,22,38]. The SNPs' allele frequency might be also population specific (rs2213585 ; Table S4), and thus the tag-SNPing (i.e. imputation) should not be used unless the ethnic background is known [56]. Here in this material, except the tag-SNP for HLA-DRB1*15:01 (rs3135388) [14,22], we did not detect any single tag-SNP for a specific HLA-DRB1 allele. Indeed, large population specific cohorts and dense SNP genotyping is needed for detecting HLA tag-SNPs. We acknowledge that the multiple ambiguous alleles, limited sample size, and rare HLA alleles can influence the haplotype phasing and LD leading to false positive results [34]. In case of small sample size, the study of the rare MHC haplotypes is challenging. Thus, we presented only frequent MHC haplotypes (.1%) and interpreted the LD between markers carefully [34,35,54,55,57]. The HLA allele distributions were consistent with the previously published Finnish registry studies [39,40] suggesting that the Finnish HLA profile can be estimated with a sample set containing 150 individuals.
One of the limitations of this study was the lack of genome-wide SNP data. Hence, we were not able to use HLA*IMP [58] for allele imputation or analyse the HLA tagging SNPs in the Finnish populations [22]. Because of the differences between populations, population specific validation is highly recommended before using either the HLA*IMP or the HLA tagging SNPs [7,14,23].
Taken together, we stress the importance of understanding the population specific MHC haplotypes and the analysis of immunogenetic data. The study of extended HLA-DRB1 haplotypes indicates the functionality of the implicated genes and provides hypotheses for further assessment of HLA-DRB1. The results presented here assist for disease association studies focusing in chronic inflammatory, autoimmune and infectious diseases. Figure S1 The LD (r 2 ) structure of TNF and BTNL2 blocks using SNPs. (TIF) Figure S2 Phylogenetic trees based on the genetic distance of TNF and BTNL2 blocks with bootstrap values. (TIF) Figure S3 A known tag-SNP for HLA-DRB1*15:01 is in strong LD with many other SNPs in the MHC Class III region. The known proxies are taken from the HapMap [28] and using the software SNAP [38].