The complex pattern of genetic associations of leprosy with HLA class I and class II alleles can be reduced to four amino acid positions

Leprosy is a chronic disease caused by Mycobacterium leprae. Worldwide, more than 200,000 new patients are affected by leprosy annually, making it the second most common mycobacterial disease after tuberculosis. The MHC/HLA region has been consistently identified as carrying major leprosy susceptibility variants in different populations at times with inconsistent results. To establish the unambiguous molecular identity of classical HLA class I and class II leprosy susceptibility factors, we applied next-generation sequencing to genotype with high-resolution 11 HLA class I and class II genes in 1,155 individuals from a Vietnamese leprosy case-control sample. HLA alleles belonging to an extended haplotype from HLA-A to HLA-DPB1 were associated with risk to leprosy. This susceptibility signal could be reduced to the HLA-DRB1*10:01~ HLA-DQA1*01:05 alleles which were in complete linkage disequilibrium (LD). In addition, haplotypes containing HLA-DRB3~ HLA-DRB1*12:02 and HLA-C*07:06~ HLA-B*44:03~ HLA-DRB1*07:01 alleles were found as two independent protective factors for leprosy. Moreover, we replicated the previously associated HLA-DRB1*15:01 as leprosy risk factor and HLA-DRB1*04:05~HLA-DQA1*03:03 as protective alleles. When we narrowed the analysis to the single amino acid level, we found that the associations of the HLA alleles were largely captured by four independent amino acids at HLA-DRβ1 positions 57 (D) and 13 (F), HLA-B position 63 (E) and HLA-A position 19 (K). Hence, analyses at the amino acid level circumvented the ambiguity caused by strong LD of leprosy susceptibility HLA alleles and identified four distinct leprosy susceptibility factors.


Introduction
Leprosy is a chronic human disease of the skin and peripheral nerves that results from an infection with Mycobacterium leprae. Permanent nerve damage and disabilities can occur in leprosy patients, mainly due to delayed diagnoses and exacerbated inflammatory episodes known as type-1 reactions (T1R). Human genetic factors strongly influence susceptibility to leprosy and more than 30 loci throughout the genome have been associated with leprosy phenotypes (rev in [1]). Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) within the Major Histocompatibility Complex (MHC) on chromosome 6p21 as the most significantly leprosy-associated genetic variants [2][3][4][5][6][7]. Divided into three classes, the MHC region harbors hundreds of genes including the classical Human Leukocyte Antigen (HLA) genes of the MHC class I and class II regions. These genes encode transmembrane receptors that present short antigenic peptides to T-cells, and, in the case of class I molecules, to NK-cells and specialized cells of the monocyte lineage (rev in [8]). The highly polymorphic class I/II genes present several bi-allelic and multi-allelic amino acid variants that impact on the strength and specificity of peptide binding. Immunologically defined HLA alleles are, on the molecular level, multi-variant haplotypes of specific genes. Molecular defined HLA alleles are now generally used for HLA genetics studies since they provide a higher resolution over immunologically-defined alleles.
Historically, a large number of HLA class I and class II alleles have been associated with leprosy or its clinical subtypes in different ethnicities (rev in [9]). Association analyses of HLA genes are challenging due to the difficulty of high-resolution allele genotyping and the complex linkage disequilibrium (LD) pattern among HLA alleles. Since earlier studies were typically conducted employing low-resolution candidate gene approaches that did not allow to adjust on the impact of complex LD pattern across the HLA region, no unifying picture of class I and class II leprosy risk alleles emerged. More recently, using HLA allele imputation, the HLA-DRB1 � 15:01 allele was identified as the major driver of the MHC association signal for leprosy per se in the Chinese population [3,4]. In addition to HLA-DRB1, variants in HLA-DQA1 and HLA-C were also associated with leprosy in the Chinese population [10]. However, population-specific effects of HLA genes were suggested by two association scans conducted in Vietnamese and Indian populations that identified association signals that were independent of HLA-DRB1 alleles and implicated a class I HLA-C allele [11,12]. Hence, while the major genetic effects of leprosy susceptibility were associated with class II genes in Chinese leprosy patients it seemed possible that in Non-Chinese patients, class I alleles had a stronger effect on leprosy susceptibility.
In the present study, we used accurate HLA typing by next-generation sequencing (NGS) of three class I (HLA-A, HLA-C and HLA-B) and eight class II HLA genes (HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, HLA-DQA1, HLA-DQB1, HLA-DPA1 and HLA-DPB1) in a Vietnamese case-control sample. Gene-centric analyses revealed an intricate pattern of associated class I and class II alleles in line with the complex historical results. However, by taking into account allele correlations across the HLA region and by considering individual amino acid variants we showed that the complexity of associations could be reduced to the presence of specific amino acids at only four positions in the HLA-DRB1, HLA-B and HLA-A genes.

Ethics statement
The study received approval from the regulatory authorities of Ho Chi Minh City, Vietnam (So3813/UB-VX and 4933/UBND-VX) and the Research Ethics Board of the Research Institute at McGill University Health Centre in Montreal, Canada (REC98-041). Written informed consent was obtained from all participants in the study. For children, assent was given by subjects, and written informed consent was obtained from parents or guardians.

Population sample
For the present study, we enrolled 1,164 Vietnamese individuals recruited from Ho Chi Minh City in Vietnam, as previously described [7,13,14]. After quality control, 1,155 participants were included in the association study for leprosy per se, corresponding to 687 leprosy cases and 468 healthy controls (Table 1). Healthy controls were subjects living in the same districts as leprosy cases with no family history of leprosy up to third-degree relatives [7]. Due to low prevalence of leprosy, it is possible that the inclusion of genetically susceptible subjects into the group of controls will result in an underestimate of the strength of the genetic effect. The present WHO classification recognizes two leprosy subtypes: paucibacillary (PB) and multibacillary (MB) leprosy [15]. In the present dataset, 491 (71.5%) and 196 (28.5%) cases were classified as MB and PB leprosy respectively. All leprosy cases were included in the analysis of leprosy subtype using the WHO clinical classification as phenotype (Table 1). Finally, a subset of 460 borderline leprosy cases was selected for the analysis of T1R and was previously described [16]. This population sample included 230 T1R-free and 230 T1R-affected leprosy patients that were matched for gender, Ridley and Jopling subtypes and age at leprosy diagnosis (Table 1).

HLA typing by next-generation sequencing
In the present study we aimed to accurately type HLA alleles up to the second field resolution (HLA proteins). Hence, we conducted HLA typing by NGS in all 1,164 individuals for 11 HLA genes, including three class I genes-HLA-A, HLA-C and HLA-B-and eight loci from the class II region-HLA-DRB1/3/4/5, HLA-DQA1, HLA-DQB1, HLA-DPA1 and HLA-DPB1. We used Holotype HLA 24/11 and 96/11 kit v2 (Omixon Biocomputing Ltd, Budapest, Hungary) to amplify the genes and generate sequencing libraries [17]. The libraries were sequenced on a MiSeq platform (Illumina, San Diego, CA, USA) in a standard flow cell using MiSeq Reagent Kit v2, 500 cycle (Illumina). The sequencing data composed of 250 bp paired-end reads was analyzed for HLA typing using Omixon HLA Twin software v2.1.4 (Omixon) with default settings and using the IPD-IMGT/HLA release 3.29 reference [18,19]. For typing of HLA-DRB3/ 4/5, LD option was turned on for the software to use LD information in publicly available databases for determining if these loci were hemizygous. Precision of the employed method has been reported to range from 95.6% in HLA-DQB1 to 99.53% in HLA-A for four-digit alleles based on data from 424 samples originated from 197 reference cell lines [20]. In 253 samples, the NGS-based HLA genotype calls have been reported to be 97.4% concordant with the highresolution genotypes derived from a combination of Sanger-based typing and sequence-specific primer technology [17]. In our dataset, HLA typing failed in nine out of the 1,164 sequenced samples (0.77%) and those were excluded from the study. We obtained call-rates higher than 99% for HLA class I genes, while the call-rates for class II loci ranged from 90.6% in HLA-DQB1 to more than 98% in HLA-DRB1/3/4/5 genes. In total, we identified 104 and 252 alleles at the first (two-digit) and second field (four-digit) resolution, respectively.

HLA alleles and amino acid markers
For the association analyses, binary (biallelic) markers based on the presence (P) / absence (A) of each specific HLA allele or single amino acid were generated as described elsewhere [21]. These binary markers presented three possible genotypes: homozygous for the presence of the allele/amino acid (PP), heterozygous (AP) and homozygous for the absence of the allele/amino acid (AA). In the analysis of HLA alleles, binary markers were created for each two-digit and four-digit allele from the 11 HLA loci that were sequenced. In addition, we created binary markers for the presence/absence of HLA-DRB3/4/5 genes regardless of their alleles. Gene and allele markers for HLA-DRB3/4/5 were created using R software v3.4.4, while markers for the remaining HLA loci were generated using MakeReference command from SNP2HLA software v1.0.3 [21,22].

PLOS PATHOGENS
HLA amino acid variants and leprosy susceptibility HLA amino acid markers corresponded to the minor/major amino acids in biallelic protein positions (one marker per position) and binary markers for the presence/absence of each amino acid residue in multiallelic positions (three or more binary markers for single amino acid residues per multiallelic position). The amino acid markers were generated with MakeReference command from SNP2HLA software using an HLA protein reference panel with protein sequences from IPD-IMGT/HLA release 3.34 [19]. The reference panel was manually curated to include new HLA alleles detected by NGS-based typing and to complete gaps in a few known alleles from regions for which we had obtained sequencing data. The positions that we were unable to complete were coded as missing data in the MakeReference output file. A total of 491 single amino acid markers were created for the polymorphic positions in HLA-A, HLA-C, HLA-B, HLA-DRβ1 and HLA-DQα1 proteins.
HLA alleles or amino acids with call-rate lower than 90%, with minor allele frequency (MAF) lower than 1% in the 1,155 individuals or with deviations from Hardy-Weinberg Equilibrium in the control group with P < 0.01 were excluded from the analysis. In total, 198 binary markers for HLA alleles were used in the association analyses, including three markers for HLA-DRB3/4/5 genes, 72 two-digit alleles and 123 four-digit HLA alleles. In the amino acid analysis, 424 markers passed filtering and were included in the association analyses. The tested amino acid markers corresponded to 192 biallelic amino acid changes and 232 binary markers for single residues in 71 multiallelic protein positions. The genotypes of these alleles and amino acid binary markers are available in the S1 Data.

Statistical analyses
To account for the strong correlation caused by linkage disequilibrium (LD) among HLA alleles and among HLA amino acid markers, GEC (Genetic Type I Error Calculator) v0.2 was used to find the effective number of independent tests [23]. For that, the genomic positions of all markers were coded as being consecutives to avoid exclusion of markers that were at the same position. Although efficiency of programs for estimation of the effective number of independent tests has not been demonstrated specifically for HLA alleles/amino acids binary markers, we selected to use GEC as it has been shown to be robust to variable LD patterns across the genome [23]. Using GEC, the 198 HLA alleles markers were determined to be equivalent to 121.41 independent tests in the total population sample. Bonferroni correction for multiple testing was applied using the effective number of allele markers. This yielded a significance P cut-off of 4.12 × 10 -4 (0.05/121.41), which was used in the univariable association analysis of HLA alleles. The effective number of independent amino acid tests was determined as 178 using GEC software. Hence, P < 2.81 × 10 −4 (0.05/178) was used as multiple-testing cut-off in the univariable analysis of single amino acids. Univariable association analyses of HLA alleles or amino acids and leprosy susceptibility were done by logistic regression under an additive model as implemented in PLINK v1.9 [24]. For that, the three genotypes of the biallelic/binary markers (e.g. AA, AP and PP of an HLA allele marker) were coded as 0, 1 and 2 to reflect the minor allele dosage. In this model, the direction of the regression coefficient [presented as odds ratio (OR)] represented the effect of each additional minor allele. Forward conditional association analysis and pairwise reciprocal conditional analysis were also done in the additive model using PLINK, by the inclusion of the conditioned marker(s) into the model as covariate (s). In the conditional analyses, the OR conditional represented the effect of each additional minor allele of the tested marker whilst controlling for the covariate(s). P conditional < 0.01 was used as cut-off for independent association signals in the multivariable analyses. When the tested marker and a covariate presented collinearity due to complete LD, results were reported as not available (NA). Gender distribution between cases and controls was tested using Pearson's Chi-squared test with Yates' continuity correction. We included gender as a covariate in the univariable and multivariable association analyses for leprosy per se, since it is a known leprosy risk factor that was differently distributed between cases and controls (P Chisquare = 3.84 × 10 −7 , Table 1). To assess the distribution of HLA alleles among leprosy cases based on two disease endophenotypes, we used logistic regression under an additive model with no covariate included, where MB were compared to PB cases and T1R-affected patients were compared to T1R-free leprosy cases [15].
We used Akaike Information Criterion (AIC), a model selection method, to compare the allele-based and amino acid-based models and select the model that best represented the data [25]. AIC was calculated for a subset of 1,136 samples with no missing genotypes among six tag allele and amino acid markers (98% of the samples), using the LOGISTIC procedure in SAS v 9.4 (SAS Institute, Cary, North Carolina, USA). The model presenting the lowest AIC (AIC min ) was selected as presenting the best support among the tested models. Delta (Δ) AIC was calculated by subtracting the AIC of each model by the AIC min to compare the two models. Models with Δ AIC of 0-2, 3-9 and �10 were considered to present substantial support, considerably less support and no additional support regarding the best model, respectively [26].

Haplotype structure and LD
For LD analysis of associated markers, pairwise correlation coefficient r 2 and D' were calculated in healthy controls using PLINK v1.9. Phased haplotype between HLA amino acids from the same gene was determined by the NGS-based HLA typing method. For markers in different loci, we used Beagle v 3.0.4 to estimate the phased haplotypes [27]. OR and P-values for the haplotypic analyses were calculated using epitools R package v 0.5-10.1 [28]. Finally, Disentangler software was used for the haplotype structure visualization of the HLA genes presenting at least one four-digit allele associated with leprosy per se [29,30].

3D visualization of HLA molecules
To identify the location of the four key leprosy-associated HLA amino acids in the protein structure, the three-dimensional structure of the HLA molecules was analyzed using UCSF Chimera v 1.13.1. [31]. HLA-DR, HLA-B and HLA-A structures were based on Protein Data Bank (PDB): 3PDO, 1SYS and 3UTQ, respectively [32][33][34]. These entries correspond to HLA-DR1, HLA-B � 44:03 and HLA-A � 02:01 alleles respectively.

HLA allele association with leprosy
We conducted HLA genotyping by NGS of 11 HLA genes in 687 leprosy cases and 468 healthy controls from Vietnam (Table 1). In the first part of our study, 198 HLA alleles were tested for association with leprosy per se (S1 Table). We found 20 HLA alleles in the HLA-A, HLA-C, HLA-B, HLA-DRB3, HLA-DRB1, HLA-DQA1, HLA-DQB1 and HLA-DPB1 genes to be significantly associated with leprosy per se under an additive model ( Table 2). As previous studies had reported HLA alleles to be associated with leprosy clinical subtypes (rev in [9]), we analyzed the allele distribution between MB and PB leprosy patients and between T1R free and affected cases (Table 1). We observed no significant differences of HLA alleles for MB-PB leprosy clinical subtype or T1R state (S1 Table).
Among the leprosy per se associated HLA alleles, HLA-DQA1 � 01:05 presented the most significant evidence of association (OR = 3.11, P = 7.61 × 10 −10 ) followed by HLA-DRB1 � 10:01 (OR = 3.08, P = 1.02 × 10 −9 , Table 2). Of note, the HLA-DRB1 � 10:01 and HLA-DQA1 � 01:05 alleles were statistically equivalent due to complete LD (r 2 = 1). The small difference in their strength of association was due to differences in missing genotypes. To test for leprosy associations independent of HLA-DRB1 � 10:01~HLA-DQA1 � 01:05, we conducted a forward conditional association analysis of the 20 significantly associated alleles until markers presented P conditional � 0.01. When HLA-DQA1 � 01:05 entered as first independent genetic variable in the model, all HLA leprosy risk alleles lost significance (S2 Table). This indicated that the associations of risk alleles were almost entirely captured by HLA-DRB1 � 10:01~HLA-DQA1 � 01:05. We analyzed the haplotype structure of the four-digit HLA alleles and found that the HLA class I and class II leprosy risk alleles belonged to a long-range haplotype spanning from HLA-A to HLA-DPB1, with a frequency of 4.37% in cases and 1.5% in controls (haplotype LA-DQB1 � 05:01~HLA-DPB1 � 104:01, Fig 1 and S1 Fig).
After adjusting on the effect of HLA-DQA1 � 01:05 in the forward conditional analysis, significant associations were still observed for the alleles associated with protection from leprosy and the lowest P-value was found for the protective class I allele HLA-C � 07:06 (OR conditional = 0.33, P conditional = 2.35 × 10 −4 , S2 Table). When HLA-C � 07:06 entered in the forward conditional analysis, HLA-B � 44:03 and HLA-B � 44 lost significance while a residual effect was still observed for the remaining protective alleles (S2 Table). HLA-B � 44:03 and HLA-C � 07:06 were in strong LD (r 2 = 0.97) and statistically equivalent in pairwise multivariate analysis (S3 Table). Hence, the second independent association signal was due to HLA-C � 07:06~HLA-B � 44:03

HLA amino acid associations with leprosy
To disentangle the HLA association signals and assess the contribution of individual HLA amino acid polymorphisms, we tested if specific amino acid substitutions in HLA proteins, rather than the multi-amino acid HLA alleles, were associated with leprosy. A total of 424 single amino acids belonging to the HLA-A, HLA-C, HLA-B, HLA-DRβ1 and HLA-DQα1 proteins were selected for this analysis (S4 Table). In univariable analyses, 64 amino acids at 55 protein positions were significantly associated with leprosy per se (Fig 2A and S4 Table). The presence of aspartic acid at HLA-DRβ1 position 57 (HLA-DRβ1 57D, OR = 1.81, P = 1.81 × 10 −10 ) was the most significant association (Fig 2A). In pairwise reciprocal conditional analysis for HLA-DRβ1 57D versus each of the remaining 63 amino acids, a significant residual effect was consistently detected for HLA-DRβ1 57D (P conditional � 5.47 × 10 -3 ). This observation indicated a unique contribution of this amino acid to leprosy risk.
To test for leprosy associations independent of HLA-DRβ1 57D among the amino acids with study-wide significance, we used forward conditional regression until amino acids reached P conditional � 0.01 (Fig 2B-2D). When we adjusted on HLA-DRβ1 57D, the strongest remaining risk signal was the presence of phenylalanine at HLA-DRβ1 position 13 (HLA-DRβ1 13F, OR conditional = 1.69, P conditional = 6.49 × 10 −6 , Fig 2B). However, HLA-DRβ1 13F and phenylalanine at HLA-DRβ1 position 31 (HLA-DRβ1 31F) were in complete LD (r 2 = 1) since presence of HLA-DRβ1 13F was always in haplotype with absence of HLA-DRβ1 31F  (Table 2 and S1 Table). HLA alleles that belong to the same association signal based on the forward and pairwise conditional analyses are connected by lines (S2 and S3 Tables). The signals indicated common haplotypes of the corresponding alleles (S1 Fig). https://doi.org/10.1371/journal.ppat.1008818.g001

PLOS PATHOGENS
HLA amino acid variants and leprosy susceptibility  Table), B) conditioned on HLA-DRβ1 57D (forward step 1), C) conditioned on HLA-DRβ1 57D and HLA-DRβ1 13F (forward step 2) and D) conditioned on HLA-DRβ1 57D, HLA-DRβ1 13F and HLA-B 63E (forward step 3). A schematic representation of the MHC/HLA region highlighting the genes sequenced in the present study is shown on top. In the graphs, the y axes indicate the negative log10 of the P-value for association of amino acids with leprosy per se and the x axes correspond to the amino acid position in the mature HLA proteins. The results for each HLA gene are shown by column and the dots represent single amino acids. The HLA loci and amino acid markers are ordered based on their genomic positions. The red line corresponds to the significance P-value threshold for multiple-testing of amino acid markers (P = 2.81 × 10 −4 ) in the univariable analysis (A) and the blue lines represent a P conditional of 0.01 in the multivariable analyses (B-D). Dots shown in black reached the multiple testing P-value threshold in the univariable analysis while grey dots were not significant (A). Only amino acids significantly associated in the univariable step (A) were included in the and vice-versa. As a consequence of the inverse relationship between phenylalanines at positions 13 and 31, presence of 31F was associated with protection from leprosy (OR conditional = 0.59, P conditional = 6.49 × 10 −6 ). When we adjusted for risk HLA-DRβ1 57D and HLA-DRβ1 13F, all remaining class II amino acid markers in HLA-DRβ1 and HLA-DQα1 lost evidence of association ( Fig 2C). Conversely, significant residual associations were still observed for HLA-C and HLA-B amino acids. At this step, glutamic acid at HLA-B position 63 displayed the lowest P-value (HLA-B 63E, OR conditional = 0.72, P conditional = 2.50 × 10 −4 ). Inclusion of HLA-DRβ1 57D, HLA-DRβ1 13F and HLA-B 63E in the third step of the forward conditional analysis accounted for the full association signal of the remaining 61 HLA amino acid markers at P conditional � 0.01 (Fig 2D).

Comparison of HLA allele-based and amino acid-based models
Next, we tested to what extent the amino acids captured the leprosy association of the HLA alleles. Except for HLA-A � 11:02, the association signals of the study-wide significant and borderline-significant HLA alleles were accounted for when the analysis was adjusted on the HLA-DRβ1 57D, HLA-DRβ1 13F and HLA-B 63E amino acids (S5 and S6 Tables). However, to capture the association of HLA-A � 11:02 it was necessary to include HLA-A 19K in the conditional analysis (S5 Table). HLA-A 19K is located outside the protein-binding groove (S2C Fig) and is found exclusively in the HLA-A � 11:02 allele. Taking together, these results suggested that the four amino acids offered a good explanation for the observed HLA allele associations. To further validate this conclusion, we used the AIC to select the best multivariate model from the HLA allele and amino acid analyses. The four-amino acid model with gender, HLA-DRβ1 57D, HLA-DRβ1 13F, HLA-B 63E and HLA-A 19K presented the lowest AIC among the tested models (model A in Table 3). Conversely, the models including the three significant amino acids (gender + HLA-DRβ1 57D + HLA-DRβ1 13F + HLA-B 63E, model C) and the three significant HLA alleles (gender + HLA-DQA1 � 01:05 + HLA-C � 07:06 + HLA-DRB1 � 12, model D) presented considerably less support (Δ AIC of 5 and 12, respectively; Table 3). Indeed, even after inclusion of HLA-A � 11:02 in the four-allele model (model B), the multivariable analyses (B-D). The most significant association signal at each step is indicated by the name of the amino acid(s) shown in blue. Borderline-associated HLA-A 19K is presented in black in step (A). The locations of the markers used for stepwise conditioning are indicated by red arrows. https://doi.org/10.1371/journal.ppat.1008818.g002

PLOS PATHOGENS
HLA amino acid variants and leprosy susceptibility four-amino acid model still provided the best explanation of the data (Δ AIC = 4, Table 3). Hence, the four-amino acid model was the model best fitting the data.

Haplotype analysis of HLA amino acids
To better understand how the HLA-DRβ1 amino acid residues contributed to the association of the HLA-DRB1 alleles, we analyzed the phased haplotype between HLA-DRβ1 13F and 57D based on the protein sequence of the four-digit HLA alleles (Fig 3A). HLA-DRβ1 position 13 and 57 were multi-allelic in the studied population with five and three additional amino acids besides 13F and 57D, respectively. Interestingly, the haplotype of the two risk residues was only present on the risk allele HLA-DRB1 � 10:01 and two rare alleles, HLA-DRB1 � 01:01 and HLA-DRB1 � 01:02 (HLA-DRβ1 13F present~57D present haplotype in Fig 3A). Due to their low frequency, the HLA-DRB1 � 01 alleles were not included in the association analyses (see Methods). However, we noted that the HLA-DRB1 � 01 group was more frequent in leprosy cases than in healthy controls (frequency of 0.7% and 0.3% respectively). On the other hand, both risk amino acids were absent in 13 HLA-DRB1 alleles including the protective HLA-DRB1 � 12 (HLA-DRβ1 13F absent~57D absent haplotype in Fig 3A). This suggested the haplotypes HLA-DRβ1 13F present~57D present and HLA-DRβ1 13F absent~57D absent as main causes of the risk and protective effects of leprosy associated HLA-DRB1 alleles.
In the HLA-B protein, position 63 was biallelic in our dataset with the risk asparagine (HLA-B 63N) as the major allele among cases and the protective glutamic acid (HLA-B 63E) as the major allele among controls (HLA-B 63E had frequencies of 46.3% in cases and 52.1% in controls). We analyzed the haplotypic effect between the amino acid at HLA-B position 63 with the major risk HLA-DRβ1 13F present~57D present and the major protective HLA-DRβ1 13F absent~57D absent haplotypes. For that, we estimated the phased haplotype between HLA-B and HLA-DRB1. We observed that the strength of association of the protective HLA-DRβ1 13F absent~57D absent was increased by the presence of the protective HLA-B 63E on the same haplotype, while HLA-DRβ1 13F present~57D present had a stronger risk effect when on the same haplotype with HLA- B 63N (Fig 3B).

PLOS PATHOGENS
HLA amino acid variants and leprosy susceptibility by reciprocal conditional analysis we found that the HLA-DRB1 � 07:01 suggestive signal was dependent on the significant HLA-B � 44:03 allele (S3 Table) and was included in signal #2 (Fig  1). Finally, haplotypes including the protective HLA-DRB1 � 12:02 allele from signal #3 in the protective HLA-B 63E~HLA-DRβ1 13F absent~57D absent haplotype group consisted of different less common and non-significant HLA-B alleles with glutamic acid at position 63 ( Fig  3B). These results indicated that the class I HLA-B amino acid change contributed to the association signal of the class II variants and illustrates how protective and susceptibility effects of classical HLA alleles are dependent on the presence/absence of specific amino acids. In the left and middle columns, the HLA alleles and amino acids with P < 0.01 are colored in orange or green if their presence corresponded to risk or protection from leprosy, respectively (S1 and S4 Tables). Markers with multiple-testing significance in the univariable analysis are indicated by a darker orange/green. In the right column, HLA alleles are grouped based on the haplotype of the presence/absence of HLA-DRβ1 13F and 57D. B) Contribution of the HLA-B 63N/E biallelic amino acid polymorphism to the association of HLA-DRβ1 13F present~57D present and HLA-DRβ1 13F absent~57D absent with leprosy. The results of the four estimated phased haplotypes are presented by columns. Odds ratios (OR), 95% confidence intervals (95% CI) and P Chi-square were determined for the four haplotypes using the pool of remaining haplotypes as reference (44.7% in cases and 40.4% in controls), except haplotypes with missing data (0.7% in cases and 0.3% in controls). Four-digit HLA-B~HLA-DRB1 haplotypes belonging to each group and presenting counts higher or equal to ten chromosomes in cases and/or controls are listed at the bottom. HLA alleles and single amino acids that as binary markers presented P < 0.01 in the genotypic univariable analysis are colored in red and green when associated with risk or protection from leprosy, respectively. Markers that reached the multiple-testing thresholds are indicated in bold (S1 and S4 Tables). https://doi.org/10.1371/journal.ppat.1008818.g003

PLOS PATHOGENS
HLA amino acid variants and leprosy susceptibility

Discussion
The MHC region has been consistently identified as chromosomal location of strong genetic leprosy risk factors, including in our recent Vietnamese GWAS where three independent association signals were identified in the MHC region [7]. While genetic studies have implicated non-classical MHC genes like LTA and MICA in leprosy susceptibility, the main effects have been assigned to classical class I and class II HLA genes [11,35]. Employing candidate gene approaches, these genes have long been the object of genetic studies of leprosy, mainly trying to identify clinical subtype specific effects (rev in [9]). However, the extreme polymorphic nature of class I and class II HLA genes combined with extensive, population-specific LD have provided a major challenge for the replication of reported allele associations across studies. HLA imputation using GWAS data has been demonstrated to be an effective method for MHC fine mapping. However, this approach also presents some limitations. Imputation accuracy of low frequency HLA alleles is dependent on the similarity of the ethnicities between the pre-built reference panel and the studied population sample since alleles not present in the panel cannot be imputed. HLA imputation is limited to the genes present in the reference panel. Moreover, HLA allele resolution of the imputed data is defined by the resolution of the HLA typing method used to generate the reference panel. Hence, in the present study, we elected to perform high-resolution HLA typing by NGS to obtain accurate HLA genotypes (low ambiguity rate) independently to our GWAS data and pre-build reference panels. The use of NGS to genotype classical HLA genes in the Vietnamese population allowed us to gain a comprehensive understanding of the alleles implicated in leprosy susceptibility. Based on HLA allele analysis, we detected independent signals of association with risk and protection from leprosy of class I and class II alleles (Fig 1). The strongest leprosy susceptibility factors in our study were the HLA-DRB1 � 10:01 and HLA-DQA1 � 01:05 alleles which are in complete LD in the Vietnamese population. These two alleles are tagging an extended haplotype which spans from HLA-A to HLA-DPB1, a distance of approximately 3.15 Mb (signal #1 in Fig 1). These findings are strongly supported by a study of three Indian leprosy cohorts and our family-based leprosy GWAS in Vietnam [5,7]. In the Indian population samples, the authors reported a strong risk effect for leprosy of MHC class II SNP rs1071630 in two case-control samples from New Delhi and Kolkata [5]. A significant but weaker effect was seen in a family-based population sample from Southern India for the same SNP [5]. In the Vietnamese leprosy GWAS, SNP rs3187964 was identified as the most significant independent signal in the MHC region [7]. Employing a subset of imputed data, we found that both rs1071630 and rs3187964 were in complete LD with HLA-DQA1 � 01 in our sample (r 2 = 1 and r 2 = 0.98 respectively). HLA-DQA1 � 01 was significantly associated with leprosy and presented frequencies of 43.1% and 32.8% in cases and controls, respectively (OR = 1.56, P = 8.34 × 10 −7 , Table 2). HLA-DQA1 � 01 is composed of five four-digit alleles that share specific amino acid residues ( � 01:01 to � 01:05), including the major risk allele HLA-DQA1 � 01:05 (Table 2). Conditioning on HLA-DQA1 � 01 did not abolish the association of HLA-DQA1 � 01:05 (P conditional = 1.25 × 10 −6 ). Conversely, conditioning on HLA-DQA1 � 01:05 accounted for most of the signal of HLA-DQA1 � 01 (P conditional = 0.0103). Together, these data showed that rs1071630 and rs3187964 are tag SNPs for the major risk signal in our sample.
Similarly, we had shown SNP rs2394885 to be a major leprosy risk factor in two large samples from Vietnam and India [12]. We had also shown that this SNP tags the HLA-C � 15:05 allele. Here, we replicated this previous observation (r 2 = 0.63) and found the same SNP also to be in strong LD with HLA-B � 07:05 (r 2 = 0.63) and HLA-A � 29:01 (r 2 = 0.58). In the Vietnamese leprosy GWAS, we found that rs2394885 belonged to a large bin of SNPs in strong LD that included the most significant independent class I region SNP rs114598080 of the GWAS [7].
All three tagged class I alleles (HLA-C � 15:05, HLA-B � 07:05, HLA-A � 29:01) are part of signal #1 in our study (Fig 1). Taken together the combined results of the present and previous studies strongly support the role of HLA-DRB1 � 10:01~HLA-DQA1 � 01:05 as major leprosy risk factor in independent Indian and Vietnamese samples.
The strongest protective signal was associated with the HLA-C � 07:06/HLA-B � 44:03 alleles. The two alleles were mostly found on a haplotype with HLA-DRB1 � 07:01 (D' = 0.77), and collectively made up signal #2 in our study (Fig 1 and S1 Fig). The univariable borderline association of HLA-DRB1 � 07:01 with leprosy was fully explained by HLA-C � 07:06/HLA-B � 44:03 by multivariable analysis. Hence, signal #2 is consistent with previous results in a large case-control sample from Brazil that did not test class I alleles but detected HLA-DRB1 � 07 as protective factor [38]. The second resistance signal (signal #3) in our study was associated with allele HLA-DRB1 � 12:02. This finding was consistent with the reported protective effect of HLA-DRB1 � 12 in Brazilian and Indonesian leprosy patients [38,39]. In large studies in Chinese patients, the strongest protective effect was assigned to HLA-DQB1 � 04:01 [3]. In both the Vietnamese and the Chinese populations, HLA-DQB1 � 04:01 is part of a haplotype which includes HLA-DQA1 � 03:03 and HLA-DRB1 � 04:05 (S1 Fig), and in a meta-analysis of studies in Chinese patients HLA-DQA1 � 03:03 was identified as main leprosy resistance factor [10]. While in our study both HLA-DQA1 � 03:03 and HLA-DRB1 � 04:05 did not pass the significance threshold required by multiple testing, we did observe a strong protective effect for both alleles independently of the three significant HLA allele signals (OR conditional = 0.50, P conditional = 2.65 × 10 −3 and OR conditional = 0.49, P conditional = 3.02 × 10 −3 , respectively) that can be considered as replication of the Chinese studies (S5 Table). Hence, while significance of the protective effect differed among Chinese and Vietnamese patients the effect sizes obtained in Vietnamese patients were consistent with those obtained for Chinese patients.
The leprosy protective signal #2 is dominated by alleles that are part of a HLA-C � 07:06H LA-B � 44:03~HLA-DRB1 � 07:01 haplotype (Fig 1). Of note, HLA-DRB1 � 07:01 had previously been implicated in two main inflammatory bowel diseases: Crohn's disease (CD) and ulcerative colitis (UC). Interestingly, HLA-DRB1 � 07:01 was protective for UC while it was a risk marker for CD [40]. A meta-analysis of HLA-DRB1 alleles and risk of tuberculosis (TB) in Asian patients found HLA-DRB1 � 07:01 associated with protection from TB [41]. The same meta-analysis also identified HLA-DRB1 � 12:02 (protective signal #3 in Fig 1) as TB protective marker with borderline significance [41]. In addition, HLA-DRB1 � 12 and � 12:02 had also been associated with protection from periodontal oral infections and recurrent typhoid fever [42,43]. An allele that needs to be considered is HLA-DRB1 � 04:05 which was associated with protection from leprosy in the present as well as other studies [38,44,45]. Previously, HLA-DRB1 � 04:05 had been found protective for enteric fever in Vietnamese patients [46]. In addition, HLA-DRB1 � 04:05 had been reported as associated with protection from UC and risk for CD in the Japanese population [47]. Finally, the leprosy risk allele HLA-DRB1 � 15:01 was a major risk factor for multiple sclerosis in earlier studies and has been associated with risk for UC and Parkinson's disease [40,[48][49][50]. The combined results showed that HLA alleles implicated in leprosy susceptibility have a wider impact on infectious and inflammatory diseases. Importantly, these results also further support the genetic overlap between leprosy, inflammatory bowel disease and Parkinson's disease [1,16,51].
Our study demonstrated the challenges posed by the very strong LD across the HLA complex. While independent risk and protective signals could be defined, it was in general not possible by genetic means to implicate a single class I/II allele as cause of the observed associations. To disentangle the complex pattern of correlated alleles, we investigated the association of single polymorphic amino acids within class I and class II proteins. We identified four amino acids in HLA-DRβ1 (phenylalanine and aspartic acid in positions 13 and 57), HLA-B (glutamic acid in position 63), and HLA-A (lysine in position 19) that fully explained the associations of class I and class II alleles with leprosy in our sample (S5 and S6 Tables). Model selection metrics, such as AIC, have been useful to identify the combination of independent HLA variants that represent the best-fitting model for the data in previous studies [52,53]. Hence, we used AIC to compare the allele-based and amino acid-based models in our dataset. We showed that the model based on only four amino acids better explained the HLA effect on leprosy than the model based on HLA alleles that tag a number of highly correlated alleles ( Table 3). Three of the four amino acids are located within the peptide-binding groove of the HLA-DRβ1 and HLA-B proteins (S2 Fig). Amino acid changes in the peptide-binding groove can impact on the HLA molecule interaction with the antigen peptide. In the HLA-DR molecule, amino acid changes at the two HLA-DRβ1 polymorphic positions 57 and 13 are predicted to impact on the peptide-binding repertoire size [54]. Moreover, aspartic acid at position 57 of HLA-DRβ1 interacts with a conserved residue in HLA-DRα and this interaction appears lost in the absence of aspartic acid [55]. Interestingly, this HLA-DR interaction is the same as in HLA-DQ for the well-known Type 1 diabetes-protective HLA-DQβ1 57D [56][57][58]. In HLA-B, an in silico analysis has indicated that a single substitution in position 63 of the protein can have a strong impact on the protein repertoire that bind the molecule, suggesting this position as crucial for peptide binding [59]. Taken together, these findings suggest that the strong impact on leprosy susceptibility found in our study can be traced to changes in the HLA molecule structure as well as to specific HLA molecule-antigen peptide interactions. Additional studies will be required to fully comprehend the exact range of HLA-M. leprae antigen interactions as well as to investigate the non-additive and interactive risk effect of HLA variants in leprosy. Functional studies analyzing the impact of HLA-DRβ1 57D and 13F, HLA-B 63E and HLA-A 19K in the context of the specific protein structure of the associated HLA alleles will provide additional insight in the role of HLA molecules in leprosy susceptibility. Given the ongoing efforts of developing a leprosy vaccine, it will be useful to assure that vaccine antigens can overcome this specific genetic restriction of leprosy susceptibility.
Supporting information S1 Fig. Haplotype structure of the four-digit alleles for class I HLA-A, -C and -B and class  II HLA-DRB1, -DQA1, -DQB1 and -DPB1 in the Vietnamese sample. The two panels show the HLA haplotype structure in A) healthy controls (N = 468) and B) leprosy cases (N = 687). The columns represent observed alleles of seven HLA class I and class II genes, where each box corresponds to a specific four-digit HLA allele. The height of the box is relative to the observed HLA allele frequency. Alleles associated with risk or protection from leprosy per se are shown in black and green boxes, respectively (see Fig 1). Each grey line connects two HLA alleles in consecutives HLA genes, where the thickness of the line is based on the frequency of the haplotype. Haplotypes between associated alleles in consecutives genes are highlighted in darker grey.  1 and 2, respectively). Genotypes are provided in PLINK binary format (bed/bim/fam files) from PLINK v1.9 (http://www.cog-genomics.org/ plink/1.9/). (GZ)