Deciphering the genetic control of gene expression following Mycobacterium leprae antigen stimulation

Leprosy is a human infectious disease caused by Mycobacterium leprae. A strong host genetic contribution to leprosy susceptibility is well established. However, the modulation of the transcriptional response to infection and the mechanism(s) of disease control are poorly understood. To address this gap in knowledge of leprosy pathogenicity, we conducted a genome-wide search for expression quantitative trait loci (eQTL) that are associated with transcript variation before and after stimulation with M. leprae sonicate in whole blood cells. We show that M. leprae antigen stimulation mainly triggered the upregulation of immune related genes and that a substantial proportion of the differential gene expression is genetically controlled. Indeed, using stringent criteria, we identified 318 genes displaying cis-eQTL at an FDR of 0.01, including 66 genes displaying response-eQTL (reQTL), i.e. cis-eQTL that showed significant evidence for interaction with the M. leprae stimulus. Such reQTL correspond to regulatory variations that affect the interaction between human whole blood cells and M. leprae sonicate and, thus, likely between the human host and M. leprae bacilli. We found that reQTL were significantly enriched among binding sites of transcription factors that are activated in response to infection, and that they were enriched among single nucleotide polymorphisms (SNPs) associated with susceptibility to leprosy per se and Type-I Reaction, and seven of them have been targeted by recent positive selection. Our study suggested that natural selection shaped our genomic diversity to face pathogen exposure including M. leprae infection.


Introduction
Leprosy is a human infectious disease caused by Mycobacterium leprae. Although curable, leprosy remains a major public health problem in sub-national regions of endemic countries [1]. The extremely low strain variability of M. leprae makes it a perfect model to access inter-individual differences in host responses to the bacillus since all leprosy patients are infected by virtually the same strain. Indeed, a strong host genetic contribution to leprosy susceptibility has been well established through the identification of leprosy susceptibility genes by both positional cloning (PARK2, LTA, HLA-C, MRC1) and candidate gene approaches (e.g. TLR1, TNF, CUBN and NEBL) [2][3][4][5][6][7][8]. Independently, genome-wide association studies (GWAS) allowed the identification of genes and pathways playing a crucial role in leprosy susceptibility such as genes of the HLA system and genes in the TNF pathway [9][10][11]. Most of the initially identified associations were replicated by subsequent studies [12][13][14][15][16]. Interestingly, two of the loci tagged by GWAS-TNFSF8/TNFSF15, LRRK2 -were later identified as risk factors for Type-I Reaction (T1R) which is the most frequent type of leprosy reaction, often leading to permanent disability [17][18][19]. Independently, a GWAS in Vietnamese and Brazilian populations identified a lncRNA as T1R risk factor [20]. Despite these advances in our understanding of the host genetic contribution to leprosy susceptibility, the modulation of the transcriptional response to infection and the mechanism of disease control are still poorly understood.
Immune responses against a particular infectious agent vary considerably between individuals and populations. Although a substantial proportion of these differences can be attributed to the environment, the contribution of host genetic factors is increasingly documented [21,22]. Indeed, the contribution of host genetics to inter-individual differences in innate immune responsiveness has been recently demonstrated using expression quantitative trait loci (eQTL) mapping. This approach allows to identify associations between genotypes and variation of gene expression levels at baseline and in cells exposed to different immune stimuli or live infectious agents [21][22][23][24][25][26]. Employing human pathogens as trigger, these eQTL studies have identified a large number of host genetic variants that underlie variation in the host innate immune response to infection, including eQTL who can only be detected in either stimulated or non-stimulated conditions (i.e., response-cis-eQTL (reQTL)). Such reQTL reflect the direct interaction of genetic control elements with the pathogen and provide a model for gene-environment interactions [21][22][23][24][25][26].
Here, we used this approach to identify genetic variants that contribute to inter-individual variation in immune responses to M. leprae sonicate. To maximize the power for detection of eQTL, we enrolled subjects of a single ethnicity (Kinh) and susceptible background (leprosy patients) to decrease inter-individual variability. Since the genuine immune cell(s) involved in the host response against M. leprae infection are still unknown, we decided to focus on whole blood, which is a robust tissue for eQTL analysis [27][28][29]. In addition, the multitude of genes and pathways that play a role in leprosy susceptibility strongly suggested the involvement of multiple cell types further supporting the tissue approach to eQTL analysis [30]. By stimulating whole blood with soluble M. leprae antigen, we identified 318 genes harbouring cis-eQTL including 66 with reQTL. Hence, we provide a comprehensive scheme of the human host gene cis-regulation mechanisms in whole blood cells in response to M. leprae sonicate. To facilitate the use of our data by the research community, we integrated our results in a publicly available browser (http://immunpop.com/manry/eQTL), developed by [21].

Immune response processes are enriched among upregulated genes
We stimulated whole-blood cells from 51 Vietnamese subjects with M. leprae sonicate for 26-32 hours. We then extracted RNA from untreated and treated samples and characterized the genome-wide expression profiles in all samples leading to the analysis of 12,043 genes. Using a t-test, we found 6,675 differentially expressed genes after Bonferroni correction (P < 4.2x10 -6 ), 2,338 being up-regulated, 4,337 being down-regulated after stimulation. Of note, 1,858 of those genes displayed an absolute log 2 fold-change (log 2 |FC|) > 0.5 ( Fig 1A). Gene ontology enrichment analysis confirmed that after stimulation the majority of up-regulated genes (74.5%) were related to immunity-related gene ontology (GO) terms such as "inflammatory response", "response to molecule of bacterial origin" (false discovery rate (FDR) < 10 −26 ). We found 177 GO terms with an FDR < 10 −6 , most of them (85.3%) were immune-related GO terms (Fig 1B, S1 Table). Prominently, among the upregulated genes were the Mendelian Susceptibility to Mycobacterial Disease genes of the IFNG pathway, including IFNG, IFNGR2, STAT1, IRF8 and IL12B ("interferon-gamma-mediated signaling pathway", FDR < 4.5x10 -7 ). Conversely, there was no observed GO term enrichment with an FDR < 10 −6 for down-regulated genes. Therefore, stimulation with M. leprae sonicate induced strong changes in the gene expression profile of whole blood cells, mainly dominated by the overexpression of genes involved in immune system processes and immune related functions.

Identification and characterization of expression quantitative trait loci
After having identified differentially expressed genes, our goal was to identify genetic variants impacting gene expression levels. Following genome-wide genotyping and high quality imputation of 4,348,666 variants, we selected variants with a frequency higher than 10% in our sample that were located within a 200kb window of gene transcription start sites (TSS). Using these criteria, a total of 1,722,978 variants were tested for association with gene expression levels. To identify cis-eQTL, we used a linear regression model to assess association between the expression level of 12,043 autosomal genes and SNP genotypes independently in stimulated and non-stimulated cells. We identified a total of 318 genes displaying cis-eQTL at an FDR of 0.01 in either stimulated or non-stimulated samples (S1 Fig and S2 Table). Relaxing the FDR to 0.05 allowed us to identify 546 genes with cis-eQTL (S3 Table).
In the context of the cellular response against M. leprae sonicate, cis-eQTL that are present only before or only after stimulation have a more direct role in modulating the host response against M. leprae because they are impacted by the mycobacterial sonicate. To determine the proportion of eQTL specific to each condition, we used a continuous measure π 1 [31], which allowed us to estimate the proportion of eQTL that are shared in stimulated cells and nonstimulated cells. When considering only the best eQTL signal per gene (the variant with the lowest P value, S2 Table), we obtained π 1 = 0.905 and 0.935 for stimulated and non-stimulated cells, respectively, suggesting that although most eQTL are shared, a non-negligible fraction are only detected in stimulated or non-stimulated cells. To identify which variants are responsible for such specific eQTL, we searched for "response cis-eQTL" (reQTL) as defined by Barreiro et al., using stringent criteria to minimize the probability of false positives (see Material & Methods) [23]. We found a total of 66 genes displaying reQTL. A total of 20 genes were associated with an eQTL only in stimulated cells and 46 only in non-stimulated cells (Figs 2 and 3, S4 Table). Among them, ADCY3 is among the most upregulated genes after stimulation with M. leprae antigens and has been identified as part of the T1R gene set signature identified by Orlova et al. [32] (Fig 3A). To assess whether the reQTL identified here were preferentially located in transcription factor binding sites, we used the annotation provided by HaploReg v4 Black dots correspond to genes that are not differentially expressed, grey dots correspond to differentially expressed genes with a log 2 FC < |0.5|, blue dots correspond to differentially expressed genes with a log 2 |FC| > 0.5. (B) Gene ontology enrichment analysis for up-regulated genes (FDR q-value < 10 −6 ) with a log 2 FC > 0.5. Each dot corresponds to a gene ontology (GO) term. For example, the observed number of up-regulated genes belonging to the GO term "immune system process" is compared to the expected number of genes belonging to this same GO term among the genes which expression has been successfully measured. Only significant enrichments at an FDR q-value < 10 −6 are included. Darker dots correspond to larger -log 10 (FDR q-values). The 10 GO terms displaying the largest -log 10 (FDR q-values) and the 5 GO terms displaying the highest enrichments are labelled. Of note, there is no gene ontology enrichment regarding down-regulated genes at this threshold. https://doi.org/10.1371/journal.pgen.1006952.g001 Genetic control of leprosy host response and resampled the position of all reQTL across the tested regions (200kb around TSS). We found a significant enrichment of reQTL located in binding sites for 54 transcription factors (Resampling P values between 5x10 -6 and 0.042 and a fold-change ranging from 1.66 to 4.64). Four transcription factors tagged by this analysis are involved in immune-mediated diseases such as systemic lupus erythematosus, inflammatory bowel disease and multiple sclerosis (CFOS, ELF1, ETS1 and NFKB; S5 Table). In addition, we evaluated the effect of reQTL on TFBS using the SNP2TFBS tool (http://ccg.vital-it.ch/snp2tfbs) [33], which predicts a change in TF binding based on position weight matrices (PWM). This approach allowed us to identify seven TF (Mafb, E2F4, Zfx, E2F1, PPARG_RXRA, ELK1 and NR3C1) for which binding sites altered by a reQTL were significantly enriched above genome-wide expectation (S6 Table). Among those, NR3C1 (also know as the glucocorticoid receptor GR) is involved in inflammatory responses [34]. Of note, E2F1 and NR3C1 binding sites were enriched among reQTL in both methods.
Leprosy per se and Type-I Reaction GWAS hits are enriched for reQTL Our group recently conducted two GWA scans for leprosy per se (Cobat et al, in preparation) and leprosy T1R [20] in the Vietnamese population. To investigate a possible contribution of cis-eQTL to the genetic control of these phenotypes, we searched for an enrichment of cis-eQTL displaying GWAS P values below 0.05. When focusing on association under the best genetic model for each SNP (recessive, additive or dominant), the leprosy per se and the T1R GWAS exhibited 8.2% (951,685 SNP out of a total of 11,614,395 tested SNPs) and 7.7% (876,126 SNP out of a total of 11,342,181 tested SNPs) of SNPs with P value < 0.05, respectively. An enrichment of cis-eQTL among those nominally associated SNPs would be indicative of a role for the genetic control of transcript levels in both leprosy and T1R. To avoid any bias due to linkage disequilibrium between eQTL for each gene, we only considered the "best" eQTL per gene, i.e. the one displaying the lowest P value. We thus extracted the GWAS P values for each "best" cis-eQTL from both GWAS studies and identified the proportions of cis-eQTL with GWAS P values < 0.05. Using such a stringent criterion, a total of 9.1% of cis-eQTL (1.1-fold enrichment, resampling P value = 0.24) displayed a P value below 0.05 for association with leprosy per se. However, 10.4% of cis-eQTL (1.3-fold enrichment, resampling P value = 0.03) met this condition for association with T1R. Likewise, reQTL were enriched among SNPs associated with low GWAS-P values for both GWAS (1.7-fold enrichment, P value = 0.04 for leprosy per se and 2.4-fold enrichment, P value = 0.001 for T1R). To confirm these results considering all the cis-eQTL and reQTL, we used the GARFIELD software [35], which integrates the LD structure between each eQTL or reQTL, their frequencies, and their distance to the nearest TSS. Under these more stringent conditions, we observed a trend of association for reQTL among T1R and leprosy per se GWAS hits (P = 0.07 and P = 0.11, respectively). Hence, cis-eQTL identified in The gene identity is indicated above each pair of graphs. The gene expression level in log 2 scale (y-axis) is plotted for each genotype (x-axis). Of note, reQTL for the ADCY3 and DNAAF1 genes have been found by other studies using distinct pathogens or molecules as stimuli, while the reQTL for ZNF517 is a newly identified reQTL [21,22,24,26]. ADCY3 is among the most upregulated genes after stimulation with M. leprae antigens and has been identified as part of the T1R gene set signature identified by Orlova et al. [32]. The reQTL for DNAAF1 displays the strongest P value among the reQTL we identified.
https://doi.org/10.1371/journal.pgen.1006952.g003 our study are candidates for affecting susceptibility to T1R, while reQTL are candidates for affecting both susceptibility to leprosy per se and T1R.

Recent positive selection targeted reQTL
Since eQTL have been shown to be targeted by recent selection in humans, we sought to determine the extent to which natural selection targeted cis-eQTL and reQTL detected in our study [36]. As the individuals recruited for our study were all Kinh in Ho Chi Minh City, Vietnam (KHV) people, we used whole-genome sequences of the 1,000 Genomes Project Phase III of the same KHV population (101 samples) [37]. All samples clustered with the KHV population from the 1000 genomes project (S2 Fig). We calculated the normalized Derived Intra-allelic Nucleotide Diversity test (DIND), which is used to detect ongoing selective sweeps within a population, on each SNP found in the KHV population of the 1,000 Genomes Project and obtained ranked P values for this test [38][39][40]. This test is based on the rationale that a derived allele (i.e. the allele specific to the human lineage) under positive selection present at high frequency in the population should display lower levels of nucleotide diversity at linked sites than expected. We then extracted the ranked P value for each "best" eQTL and reQTL of our study. We compared our results with genome-wide expectations by resampling and found a significant enrichment of DIND ranked P values below 0.05 among cis-eQTL and a similar trend for reQTL (Resampling P values = 0.025 and 0.168 for an enrichment of 1.7-fold and 1.85-fold, respectively). These results are consistent with two recent studies using a similar approach [21,22].
We also calculated the global F ST, which is a measure of population differentiation between all the populations of the 1000 Genomes Project, for each cis-eQTL and reQTL. High F ST values (those closer to 1) are indicative of positive selection. We found 17 eQTL (including two reQTL) displaying an F ST value above the 95 th percentile of the F ST distribution against the minor allele frequency. Overall, considering both DIND and F ST , we found 35 cis-eQTL (including 7 reQTL for the DUSP16, CABLES2, ODF2L, UBA7, GFM1, CORO1C, CEP192 genes) targeted by recent positive selection (S7 Table). Finally, we searched for cis-eQTL exhibiting both a signal of recent positive selection and a GWAS P value < 0.05. Interestingly, we found three reQTL controlling the expression of the GFM1, CORO1C and CEP192 genes that also displayed significant DIND values (S7 Table). While the reQTL found for CEP192 exhibits a P value for association below 5% in both GWAS, CORO1C exhibits such a P value for the leprosy GWAS only, and GFM1 for the T1R GWAS only, suggesting these reQTL as targeted by distinct selective pressures and as important players during recent human evolution (for a summary flowchart of the results, see S3 Fig).

Discussion
In this study, we demonstrated the impact of cis-eQTL and in particular of reQTL during M. leprae antigens stimulation of human blood. While reQTL have been found for other infections [21][22][23][24][25][26], the newly described reQTL directly reflect the interaction of the genetic background of the human host with M. leprae antigens. The variants composing reQTL represent promising candidates for mediating resistance and/or susceptibility to M. leprae infection. We note, however, that the reQTL for 66 genes found in our study may not be exclusive to M. leprae infection. It is possible that the genes tagged are more broadly involved in host responses to infectious pathogens. For example, when comparing our results with reQTL observed by Barreiro et al. before or after M. tuberculosis infection of dendritic cells, we found an overlap of one reQTL for FOXJ2 [23]. This reQTL had been found after Salmonella typhimurium but not Listeria monocytogenes infection of macrophages, and after Influenza A virus infection of monocytes suggesting this reQTL as being cell type-and stimulus-dependent. In general terms, reQTL are exquisite indicators for host gene-pathogen interactions. Compared to eQTL, reQTL are expected to display an increased pathogen-specificity, which is consistent with the small overlap of reQTL detected here and in M. tuberculosis infected dendritic cells [23]. The latter interpretation is in line with the results of genetic studies for leprosy and tuberculosis susceptibility that detected a limited overlap in the genetic control of both diseases [12]. These findings imply that any evolutionary constraints exerted by TB, a disease with high childhood mortality, on the genetic control of gene expression levels may not be reflected in the human immune response to M. leprae.
Genes that alter their expression levels following stimulation with M. leprae sonicate are part of the host response against M. leprae bacilli. Consequently, cis-eQTL and reQTL for those genes are candidates for being genetic modifiers of human vulnerability against M. leprae and thus possible targets of natural selection. A large number of significantly differentially expressed genes were detected in our study. However, given our sample size (51 samples) and the high dose of M. leprae sonicate used for stimulation (equivalent to an estimated MOI of 50:1), we expected to detect a large proportion of differentially expressed genes in whole blood samples. Additional power calculations confirmed that power was nearly 100% to detect log 2 | FC| larger than 0.2, the threshold employed in our study being 0.5 (S5 Fig). We employed an evolutionary genetics approach to reveal which cis-eQTL and reQTL were targeted by positive selection. Previous studies had suggested that eQTL contributing to inter-individual variation in immune responses to pathogens or immune stimuli were enriched among SNPs targeted by recent selection [21,22]. A possible limitation of our approach was that we focused on the "best" eQTL per gene, which represents a high degree of stringency. Indeed, the "best" eQTL is not more likely than other SNPs of the same locus to be the one targeted by natural selection and/or the causal one. Even if the LD is strong between the SNPs of an eQTL, neutrality tests can vary between those SNPs. Thus, it is likely that we underestimated the number of eQTL targeted by positive selection, and the deduced enrichment represents a conservative approximation of the true action of natural selection on eQTL. Moreover, several variants may control synergistically the expression of a gene and taking only the "best" variant for enrichment analyses may be restrictive. Despite these limitations, we identified an enrichment of cis-eQTL under positive selection in the Kinh in Ho Chi Minh City, Vietnam, population. It is not known if this enrichment reflects the selective action of leprosy or is the result of evolutionary forces acting on host responses shared with other pathogens.
Since reQTL are likely to have a direct role in the response against M. leprae we were particularly interested in signatures of selection for such SNPs. Among the seven genes with reQTL targeted by recent positive selection (DUSP16, CABLES2, ODF2L, UBA7, GFM1, CORO1C, CEP192), only UBA7 displays a clear role in the immune system. Overall, UBA7 is downregulated after stimulation (FC = -0.34). However, the selected alleles are associated with high expression of UBA7 in stimulated cells (P values for the genotype-gene expression association: 0.05 and 1.2x10 -6 in non-stimulated cells and in stimulated cells, respectively). A decrease in expression of UBA7 might result in the diminution of the conjugation with ISG15, which has been shown to be important to fulfil its antiviral activity [41][42][43][44]. More importantly, ISG15 has been shown to be a key effector molecule in anti-mycobacterial immunity, a role that is strongly supported by our present finding [45]. The downregulation of UBA7 after stimulation with M. leprae sonicate might reflect a means of the pathogen to circumvent human immunity while the reQTL possibly illustrates how host evolution is counteracting pathogen manipulation. To what extent changes in UBA7 expression levels impact ISG15-dependent intracellular accumulation of the IFN-α/β regulator USP18 is unknown [46].
In previous studies, a link between positively selected variants and disease resistance has been shown and an enrichment for signals of recent positive selection has been found among SNPs associated with immunity-related phenotypes such as immune-mediated and infectious diseases [38,40,47,48]. While natural selection in humans is acting against infectious diseases and in favour of strong protective immune responses, excessive immune reactivity may cause immune-mediated host tissue pathology [47]. In leprosy, protective and host pathological immunity are temporally separated since nerve damage due to T1R usually occurs after the onset of M. leprae infection and clinical disease. We have previously shown distinct genetic control of T1R and leprosy [17,20]. By dovetailing the results of the eQTL analysis with the genetic control of leprosy and T1R we were able to assign genetic controllers of M. leprae host responses to either the protective (mycobacterial clearance) or the host damaging arm of the immune response (excessive inflammatory reaction characteristic for T1R). Identifying cis-eQTL and reQTL that displayed association with only one of these two phenotypes reinforced the differences in their aetiology and revealed that distinct modifiers of gene expression act on the protective and the host damaging part of the immune response. Interestingly, the integration of the functional genomics approach and GWAS data with the evolutionary genetics approach allowed us to identify reQTL for three genes, GFM1, CORO1C and CEP192, that likely have a major role against infection with M. leprae and/or T1R. Among these three genes, GFM1 is of particular interest. It has been suggested that M. leprae represses mitochondrial genes in Schwann cells [49]. It is plausible that this repression is triggered by a deregulation of GFM1 by M. leprae which might disturb translation of mitochondrial RNAs. Of note, among the GO term enrichments for upregulated genes after M. leprae antigen stimulation, we found the "release of cytochrome c from mitochondria" GO term (FDR of 3x10 -3 ) However, the specific role of GFM1 in leprosy pathogenesis needs to be determined by functional studies. Given their implication by three distinct lines of experimental evidence, the role of these three genes in leprosy pathogenesis deserves more careful scrutiny.

Ethics statement
The study was conducted according to the principles expressed in the declaration of Helsinki. Written informed consent was obtained for all adult subjects participating in the study. All minors assented to the study, and a parent or guardian provided written informed consent on their behalf. This study was approved by the regulatory authorities and ethics committees in Ho Chi Minh City, Vietnam (No. 4933/UBND-VX), and the Research Ethics Board at the RI-MUHC in Montreal QC, Canada (REC98-041).

Subjects
A total of 51 unrelated individuals from Vietnam diagnosed with borderline leprosy were recruited, including 41 males and 10 females, with a mean age of 27 years old (range 9 to 41 years old) [32]. A total of 40 of these samples had been part of the study by Orlova et al. [32]. We performed a principal component (PC) analysis to check for population structure.

Whole blood assay and RNA extraction
A total of 20mL of whole blood were obtained from each subject by venipuncture in EDTA vacutainers. A 10mL aliquot was stimulated with M. leprae sonicate at a concentration of 20ug/mL, the other 10mL remained untreated for 26-32 hours at 37˚C, 5% CO 2 as described by Orlova et al. [32]. Stimulations of whole blood samples were done in triplicate wells for stimulated cells and unstimulated controls. For RNA extraction, each set of wells was combined in a single batch. We decided for a relatively long stimulation time since faster host responses did not reveal pathogen-specificity of their genetic control [50]. Total RNA was then extracted employing a modified protocol of the LeukoLOCK RNA extraction kit (Ambion, CA, USA) and cleaned with the RNAeasy kit (Qiagen, Germany) as described by Orlova et al. [32]. All 102 samples passed quality control by BioAnalyzer (Agilent) and showed RNA Integrity Numbers above 8.5, indicating good RNA quality.

Gene expression data analysis
The 102 RNA samples were labelled using the Illumina TotalPrep RNA Amplification Kit from Ambion and hybridized to Illumina HumanHT-12 v4 Expression BeadChips and screened for 47,323 probes. All samples were randomly assigned to chips for hybridization. Gene expression levels were determined by a single microarray experiment for each sample.
Using the lumi package available in R [51], raw data were subjected to variance-stabilization transformation (VST) and quantile normalization. Only probes in autosomes, uniquely mapped and expressed above background noise (P value < 0.05) in at least 3 individuals were kept. Probes that did not match to any unique Ensembl gene and Hugo ID were excluded. For genes with several probes, to prevent spurious signals, we kept the median expression value. It is worth noting that we decided not to remove probes mapping to regions with SNPs in the KHV population following the recommendations by Schurmann et al. [52]. Supporting a weak effect of SNPs in probes, only 16 eQTL-associated genes were targeted for one probe overlapping a SNP position. As we cannot rule out the possibility that these 16 eQTL associations are spurious, they were flagged in the S2 Table. This led us to the analysis of 12,043 genes. The gene expression data were then analyzed with a linear model with a fixed effect for the treatment, and integrating age, sex, and the duration of stimulation as covariates.

Identifying differentially expressed genes and gene ontology enrichment analysis
With R, we used a linear model to perform a t test to capture the effects of the stimulation with M. leprae sonicate and calculated the log 2 (FC) for each gene. Genes were considered as differentially expressed after Bonferroni correction. The t test has been shown to be as powerful as the moderated t test for sample sizes higher than 15 samples [53]. In accordance, the comparison of our method (t test and Bonferroni correction) with the more standard limma method (moderated t test and Benjamini-Hochberg correction [54]) identified 1858/1860 (99.9%) of the same genes differentially expressed with a log 2 |FC| > 0.5. There was no difference in differentially expressed genes when a paired sample analysis was applied considering differentially expressed genes at |FC| > 0.5 (adjusted P values < 2.8x10 -6 ). Power analysis was performed using the package sizepower implemented in R [55]. A |FC| of 0.5 has been used throughout the manuscript unless stated otherwise. We used GOrilla to test for enrichment of biological processes, molecular functions and cellular components among differentially expressed genes after stimulation considering upregulated genes and downregulated genes separately, against the 12,043 tested genes for association [56]. P values were obtained using a hypergeometric model and the false discovery rate (FDR) was controlled by the Benjamini-Hochberg method [57].
Genotype-phenotype association analysis DNA from the blood donors was extracted using the Nucleon BACC 2 kit (GE HealthCare) and genotyping was performed using Illumina's Human660W-Quad BeadChip array. After standard filtration methods (no missing data, Hardy-Weinberg P value < 10 −4 ), we kept 459,703 SNPs (S1 Dataset). Non-genotyped SNPs were imputed using SHAPEIT and IMPUTE2 and the 1000 genomes Phase I v3 dataset containing 1092 individuals as the reference panel, leading to 3,888,963 variants with an imputation information ! 0.5 [58,59]. Each gene expression value and genotypes at variants located within a 200kb window centered on the gene's transcription start site were tested for association. A total of 1,722,978 variants were thus tested for association. The FDR was estimated by permuting 10 times the phenotypes (expression data), and comparing these 10 distributions to the observed one.
To increase power to detect cis-eQTL as previously described by Barreiro et al., we determined the PCs of the stimulated and non-stimulated expression data and regressed out the first 8 and 7 PCs from the stimulated and non-stimulated data, respectively [23]. The numbers of PCs to regress out were chosen because they led to the identification of the highest number of cis-eQTL (gain of more than 108% and 96% for stimulated and non-stimulated data, respectively), while conserving most of the originally found cis-eQTL (>86% and >88% of the original cis-eQTL set were conserved after PC removal for the stimulated and non-stimulated data, respectively). We found 318 cis-eQTL (composed by 13,825 variants) at an FDR of 0.01 with the 200kb window and 546 (22,449 variants) at an FDR of 0.05 (S2 and S3 Tables). For more relaxed FDR, the reader can access the http://immunpop.com/manry/eQTL website. To identify whether some eQTL were specific to one condition or the two conditions, we used the Storey and Tibshirani q-value approach implemented in the qvalue package available in R to estimate π 0 , the proportion of variants that are truly not eQTL in one condition while eQTL in the other [31]. Concerning the identification of reQTL, a variant was defined as such by using a two-step FDR cut-off as described by Barreiro et al. [23]. Briefly, a cis-eQTL was determined as an reQTL if this cis-eQTL was present at an FDR of 0.01 in one condition and no signal of cis-eQTL was found in the other condition (FDR > 0.5) among all SNPs tested in the entire 200kb region. An alternative approach to identify reQTL is to treat the fold change gene expression after M. leprae stimulation as a quantitative trait and map it. Although this approach has the advantage of avoiding arbitrarily selected cut-off points used in our approach, it fails to inform if the genes associated with reQTL show any significant differences in gene expression levels in infected or noninfected cells derived from individual of different genotypes classes. Importantly, it was shown that both approaches provide similar lists of reQTL [23].

Resampling analysis for testing the enrichment of cis-eQTL and reQTL in GWAS hits
To account for LD between cis-eQTL, we considered only the best eQTL per gene (with the lowest P value). We tested whether the leprosy per se and T1R GWAS hits (P < 0.05) were enriched for cis-eQTL by resampling. We randomly selected 318 variants in each GWAS 10 6 times and each of the 10 6 simulated dataset was analyzed and compared with the actual results. The same procedure was performed for the 66 reQTL. In addition to the enrichment analysis of (r)eQTL among GWAS hits, which was measured without taking the frequencies of the tested SNPs into account and only using the best signal for each gene, we also used the GAR-FIELD software [35]. We performed a more stringent analysis integrating the LD structure between eQTL (or reQTL), their frequencies, and their distance to the nearest TSS.
Resampling analysis for testing the enrichment of reQTL located in transcription factor binding sites We extracted the data from HaploReg v4.0 for each of the 1,123 reQTL we identified (all significant association, i.e. not the best signal per gene only), and for all the variants that were tested for being eQTL in our study (i.e. the variants located in the 200kb interval around each gene). We identified which variants were located in a transcription factor binding site (TFBS). We performed a resampling analysis to measure whether reQTL were preferentially located in TFBS. We proceeded as follows: (i) we randomly selected 1,123 variants among the 1,722,978 tested for being cis-eQTL and marked those that were located in a TFBS; (ii) for each transcription factor, we counted the number of variants among the 1,123 randomly selected, that were located in the corresponding binding site. We then repeated steps (i) and (ii) 200,000 times. To complement this approach, we used the SNP2TFBS software, which uses PWM (position weight matrices) and evaluated the effect of SNPs on TFBS (http://ccg.vital-it.ch/ snp2tfbs) [33]. This tool also performs an enrichment analysis where the enrichment P values are calculated using a binomial distribution of size n = 256 (256 SNPs which matched with at least one TFBS), and probability p which corresponds to the number of SNPs matching a given TF (genome-wide) divided by total number of SNPs matching a TFBS.

Natural selection analysis
To detect signatures of positive selection acting on the detected cis-eQTL we performed the Derived Intra-allelic Nucleotide Diversity (DIND) on populations from the 1,000 Genomes Project Phase III as previously described [39]. Briefly, we extracted sequence data from the 1,000 Genomes Project Phase III from the 26 populations, including sequence data from the KHV population (Kinh in Ho Chi Minh City, Vietnam) which is the same population as the subjects of our study [37]. Phased sequences were obtained from the MaCH website (Center for Statistical Genetics, University of Michigan) [60]. DIND is a haplotype-based test and is population and SNP-specific [61]. It was designed to highlight variants that are targeted by natural selection. The DIND test has been shown to be robust to demography and virtually insensitive to low depth of coverage generated by next generation sequencing data [39]. The DIND test is based on the iπ A /iπ D ratio, where iπ A and iπ D are the levels of nucleotide diversity associated with the haplotypes carrying the ancestral and the derived alleles, respectively. This test was performed on 100kb windows centered on each identified cis-eQTL and normalized [39]. SNPs with derived allele frequencies (DAF) < 20% in the KHV population were excluded from the analysis because of the lack of power to detect signatures of positive selection for the DIND test. P values were then obtained by genome-wide ranking. To access the significance of the DIND test, the same procedure described above for the resampling of GWAS hits was performed but on the entire set of variants found in the 1000 genomes project Phase III, with DAF above 20% with increments of 5% frequency per bin.
We also measured the population differentiation levels with the global F ST taking into account the 26 populations included in the 1,000 Genomes Project Phase III. Since F ST depends exclusively on the frequency of each variant in each population, we stratified the F ST values using sliding windows of frequencies of 0.05 with an interval of 0.005 and drew the 95 th and 99 th percentiles of genome-wide distributions. F ST values above the 95 th percentile were defined as significant.
Supporting information S1 Dataset. Genotyping data. DNA from the 51 blood donors were genotyped using Illumina's Human660W-Quad BeadChip array. After standard filtration methods (no missing data, Hardy-Weinberg P value < 10 −4 ), 459,703 SNPs were kept for analysis. 3 files are provided in the S1_Datasep.zip archive: Leprosy_eQTL_51_samples.bed, Leprosy_eQTL_51_samples.bim and Leprosy_eQTL_51_samples.fam. These files were generated using PLINK 1.9 (www.cog-genomics.org/plink/1.9/). (ZIP) S1 Fig. Quantile-quantile plot. QQ-plot of P values obtained when testing for an association between gene expression estimates and all SNPs located in a 200-kb window centered on each gene's transcription starting site (TSS) (y axis) compared with P values obtained by permuting the gene expression measurement (x axis) in non-stimulated cells with 7 PCs removed (pink), and in stimulated cells with 8 PCs removed (blue). An FDR of 0.01 corresponds to observed P values < 3.06x10 -6 in the non-stimulated condition and to P values < 1.90x10 -6 in the stimulated condition. The left panel represents the PCA of the samples obtained from raw expression data, the right panel from adjusted expression data. Expression data were adjusted by keeping only the effects of the stimulation and the residuals from the following multiple regression: Expression~Stimulation + Duration of stimulation + Stimulation Ã Duration of stimulation + Dose + Age + Gender + residuals. The x axis corresponds to the first PC, the y axis to the second PC, labels correspond to the proportion of variance retained by the corresponding PC. Of note, only differences in dose had a noticeable impact, which was corrected successfully by PC adjustment. (TIF) S5 Fig. Power to detect differentially expressed genes. On the left graph, power to detect differentially expressed genes against fold change is displayed. On the right graph, power to detect a log 2 (FC) as a function of sample size is plotted. (TIF) S1