Genetic Variation in TLR Genes in Ugandan and South African Populations and Comparison with HapMap Data

Genetic epidemiological studies of complex diseases often rely on data from the International HapMap Consortium for identification of single nucleotide polymorphisms (SNPs), particularly those that tag haplotypes. However, little is known about how relevant the African populations sampled for HapMap are to studies conducted elsewhere in Africa. Toll-like receptor (TLR) genes play a key role in susceptibility to various infectious diseases, including tuberculosis. We conducted full-exon sequencing in samples obtained from Uganda (n = 48) and South Africa (n = 48), in four genes in the TLR pathway: TLR2, TLR4, TLR6, and TIRAP. We identified one novel TIRAP SNP (with minor allele frequency [MAF] 3.2%) and a novel TLR6 SNP (MAF 8%) in the Ugandan population, and a TLR6 SNP that is unique to the South African population (MAF 14%). These SNPs were also not present in the 1000 Genomes data. Genotype and haplotype frequencies and linkage disequilibrium patterns in Uganda and South Africa were similar to those of the African populations in the HapMap datasets. Multidimensional scaling analysis of polymorphisms in all four genes suggested broad overlap of all of the examined African populations. Based on these data, we propose that there is enough similarity among African populations represented in the HapMap database to justify initial SNP selection for genetic epidemiological studies in Uganda and South Africa. We also discovered three novel polymorphisms that appear to be population-specific and would only be detected by sequencing efforts.


Introduction
Human genetic studies of diseases with complex inheritance involve analysis of single nucleotide polymorphisms (SNPs) which are present at a range of population-specific frequencies. The proper selection of SNPs for genetic studies requires either discovering the SNPs in the population of interest with de novo sequencing efforts or relying on information from similar populations in public databases. The International HapMap Consortium [1,2] has provided a database of common SNPs in a number of diverse global populations including three from Africa: the Yoruba from Nigeria (YRI), and the Luhya (LWK) and Maasai (MKK) from Kenya. The HapMap project has been instrumental for selection of SNPs for study in a variety of complex diseases in diverse populations. However, the applicability of the data from the African populations described in HapMap to studies in other parts of Africa is less obvious because of the immense genetic diversity on the African continent. Haplotype blocks are shorter in Africans [3], and haplotype and linkage disequilibrium (LD) diversity is abundant [3][4][5]. Thus, studies of genetic variation in other African populations are valuable in understanding how to plan genetic epidemiologic studies in these diverse populations.
Toll-like receptors (TLRs) play a key role in the innate immune response to a variety of pathogens. Mutations and polymorphisms in TLR genes have been associated with susceptibility to various infectious diseases, including Mendelian disorders with mutations in IRAK4, MyD88, TLR3, and Unc93b [6]. In studies of diseases with complex inheritance patterns, TLR polymorphisms have been associated with susceptibility to several infections, including tuberculosis (TB) [7] and leprosy [8]. In addition, previous studies have shown that the TLR1/6/10 region is under natural selection [9][10][11] and that TLR1 has a high degree of population differentiation [8]. Though most studies have focused on common variation in TLR genes, a sequencing study conducted in Houston identified a number of rare variants in TLR genes that were associated with TB [12]. Because of their central role in TB immunity and potential importance for vaccine development, it is of interest to study variants in TLR genes and their association with TB and other infectious diseases in Africa, where they are especially prevalent.
In this study, we conducted full exon sequencing of TLR2, TLR4, TLR6, and TIRAP in samples obtained as part of ongoing studies in Kampala, Uganda, and Cape Town, South Africa. Our objective in this study was first to examine whether there were any novel TLR gene polymorphisms in these study populations, and second, to compare the genotype and haplotype frequencies between these populations and the African HapMap populations.

Results
To understand haplotype structure and population polymorphism diversity, we sequenced the coding region of four candidate genes in two populations and compared it to four populations of African ancestry in the HapMap database. Specifically, we sequenced gene regions from the Kampala Ugandans (UG, n = 48) and Cape Town South Africans (SA, n = 48) and compared findings to four populations composed of unrelated individuals from the HapMap database: the Maasai in Kinyawa, Kenya (MKK, n = 143), the Luhya in Webuye, Kenya (LWK, n = 90), the African Ancestry in Southwest USA (ASW, n = 53), and Yoruba in Ibadan, Nigeria (YRI, n = 114). Four candidate genes were considered in our analysis: TLR2 (chromosome 4q32), TLR4 (9q32-q33), TLR6 (4p14), and TIRAP (11q23-q24). Genotype frequencies for these populations are provided in Table 1. Only those SNPs genotyped across all six populations were considered in the statistical comparisons. There were two novel polymorphisms discovered in the Ugandan population: one TIRAP SNP (G222A (A74A)) and one TLR6 SNP (A1696G (P564P)). In the South Africans, one novel polymorphism was found in TLR6 (T34A (F12I)), which we previously reported [13]. None of these SNPs were present in the 1000 Genomes database [14]. Of note, TLR2 rs5743709, present in the Ugandan population (MAF 5%) but not in the South African or African HapMap populations, was observed in the 1000 Genomes database in the Asian populations (MAF = 8.8%) and in the Hispanic populations from Puerto Rico and South America (minor allele frequency = 0.2%). We observed departures from Hardy-Weinberg proportions in the Ugandan population in one TLR6 SNP (rs3775073; p = 0.001).
Comparison of genotype frequencies across these six populations showed that frequencies often differed among the groups (p < 0.05 by chi-square test) (Table 1). When comparing our sequencing results to the HapMap populations, the most significant differences were seen when comparing UG to the HapMap populations, especially within the TLR6 gene. SA, as a whole, showed fewer significant differences from the HapMap populations. P-values for all pair-wise comparisons are given in Table 1. Heterozygosity values for each SNP in each population are provided in Table S1. The ratio of observed to expected heterozygosity was generally similar across populations. The notable exception to this was TLR6 in the Ugandan population, where the ratio of observed to expected heterozygosity was very low (0.216), which reflects decreased genetic diversity. Also, TLR4 shows reduced genetic diversity in both SA and UG, reflecting the presence of monomorphic SNPs in these populations.
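The ratio of observed to expected heterozygosity reported above can be illustrated for a single biallelic SNP. The following is a minimal sketch, not the authors' code: the function name and the genotype counts are hypothetical, and it does not address how per-SNP values were aggregated into a per-gene ratio, which the text does not specify.

```python
def heterozygosity_ratio(n_aa, n_ab, n_bb):
    """Return (observed, expected, ratio) heterozygosity for one biallelic SNP.

    n_aa, n_ab, n_bb are genotype counts (hypothetical inputs).
    Expected heterozygosity is 2pq under Hardy-Weinberg proportions.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    h_obs = n_ab / n                     # observed fraction of heterozygotes
    h_exp = 2 * p * (1 - p)              # expected fraction under HWE
    # A monomorphic SNP has zero expected heterozygosity; the ratio is undefined.
    ratio = h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, ratio


# Hypothetical counts for 48 individuals with a deficit of heterozygotes.
h_obs, h_exp, ratio = heterozygosity_ratio(30, 4, 14)
print(f"H_obs={h_obs:.3f} H_exp={h_exp:.3f} ratio={ratio:.3f}")
```

A ratio well below 1 indicates fewer heterozygotes than Hardy-Weinberg proportions predict; monomorphic SNPs, such as those noted for TLR4, yield an undefined ratio in this sketch.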
We next examined LD patterns and haplotype structure between the different groups (Table S2). In these and subsequent analyses, the SA population was stratified into its component ethnic groups: Black, Caucasian, and South African Mixed Ethnicity. Both the UG and SA Mixed Ethnicity populations, as well as the HapMap populations, showed low amounts of LD (absolute value of r² < 0.2 for 76.9% of comparisons). Haplotypes were constructed for the SNPs that were common between our sequencing study and HapMap, and haplotype frequencies were compared between the populations (Tables 2, 3, 4, 5). For this analysis, we considered only the African populations, UG, SA Mixed, MKK, LWK, and YRI, choosing not to include the American individuals with African ancestry (ASW) as our goal was to identify which African populations are "similar" to the UG and SA Mixed individuals. Also excluded from the haplotype analyses were the Black and Caucasian South Africans due to limited sample size. When comparing across all five populations, there were significant differences in haplotype frequencies for all genes (p < 0.0001). The most striking result from these analyses was that haplotypes in TLR2 and TLR6 in the UG individuals were absent from the other five populations. Also, some rarer haplotypes (less than 10% frequency) were unique to some HapMap populations. Analyses of TLR6 haplotype frequencies were not feasible as the UG haplotypes did not exist in the other populations (Tables 2, 3, 4, 5).
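For readers less familiar with the LD measures summarized above, the sketch below computes r² and D' for a pair of biallelic SNPs. It assumes phased two-locus haplotype counts are already available, which is a simplification: the study used Haploview on genotype data, and the function name and counts here are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (r2, d_prime) from phased two-locus haplotype counts.

    The four arguments are counts of haplotypes A-B, A-b, a-B and a-b
    (hypothetical inputs; phase is assumed known).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / n              # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n              # frequency of allele B at locus 2
    d = n_AB / n - p_A * p_B             # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:                       # at least one locus is monomorphic
        return float("nan"), float("nan")
    r2 = d * d / denom
    # D' rescales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical counts for 100 phased chromosomes.
r2, d_prime = ld_stats(40, 10, 10, 40)
print(f"r2={r2:.2f} D'={d_prime:.2f}")
```

Under this definition, a pair of SNPs with r² below 0.2 would fall in the "low LD" category used in the comparison above.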
Multidimensional scaling (MDS) analysis, which combined the TLR2, TLR4, TLR6, and TIRAP data, was used to illustrate how UG and SA Mixed Ethnicity populations clustered with the HapMap populations. We plotted the first two dimensions from MDS (Figure 1). Visual examination of this plot shows a great deal of overlap among these African populations as well as with the ASW, with a few outlying points. The SA Mixed Ethnicity population tended to cluster more with the ASW population, as well as with the MKK, LWK, and YRI. The UG population primarily clustered with the MKK, LWK, and YRI. Analysis of Euclidean distances between individuals within different population clusters showed that distances between individuals were not significantly different (all pairwise p-values > 0.27); this suggests that there is overlap among all the population clusters.

Discussion
The primary finding of our study was that the genotype frequency and haplotype structure of Ugandans and South Africans of Mixed ethnicity are similar to those in the HapMap database among the Kenyan and Yoruba Nigerian reference populations. A practical issue for genetic association studies is to determine whether tag SNPs identified using HapMap data adequately capture patterns of variation in other populations [15]. Thus, it is of interest to examine both genotype frequencies and haplotype frequencies between HapMap populations and other global populations. Genetic diversity, measured by the ratio of observed to expected heterozygosity, was also generally similar across these populations. Examination of haplotype frequencies and LD patterns is also informative for identification of the appropriate population(s) for tag SNP selection [16]. Generally, our data showed the most common haplotype in each of the four genes was the same across all the populations, though the less common haplotypes differed and there were some unique haplotypes in the Ugandan population. We observed notable haplotype frequency differences between the Ugandan population and the HapMap populations in TLR6 (discussed in depth below) and TLR2, showing that differences between African populations do exist and tag SNPs should be selected judiciously. Though there were differences in haplotype frequencies across populations, we also observed overlap among populations in our cluster analysis. This latter analysis is only exploratory, since it is based on polymorphisms common to exons and HapMap in four genes.
When the African data are examined as a whole, there is notable similarity, though there are slight differences between pairs of populations. Many studies [9,15] have suggested that genetic similarity between populations is generally predictable based on geographic location. Conrad et al. [15] concluded that HapMap is indeed a valuable resource, and geography could be used to identify the most appropriate HapMap population because haplotype similarity is greatest in nearby populations. However, that study was conducted prior to the release of HapMap Release 3 data.
Another noteworthy finding of our study was that a small number of novel polymorphisms were detected in TLR6 and TIRAP. The relevance of rare variants in complex trait susceptibility is gaining attention [17]. Ma et al. [12] also conducted sequencing of TLR genes and observed that there were more rare non-synonymous polymorphisms in African-American and Caucasian TB cases than in controls. In addition, they found that rare variants were overrepresented in the TLR1/6/10 region. Our findings support a conclusion of Ma et al. that resequencing strategies are valuable in the search for rare and population-specific variants that may be associated with disease, particularly in populations of African descent.
The occurrence of novel polymorphisms, such as in the Ugandan population on TLR6, results in unique haplotypes not seen in other populations, which is consistent with a potential effect of selection [18]. There is additional evidence of positive selective pressure on TLR6 in the Ugandan population. One TLR6 SNP in the Ugandan population is in significant deviation from Hardy-Weinberg proportions. The existence of a unique, common polymorphism (A1696G) and significant shift in genotype frequencies (rs3775073 and rs3821985) are additional indicators [5]. There is also significantly reduced heterozygosity in TLR6 in the Ugandan population, further reflecting selective pressure [19]. The novel TLR6 SNP in the South African population (T34A) is also quite common. Previous studies have shown that the TLR1/6/10 region is under natural selection [9][10][11]. There is also a unique, common TLR2 haplotype in the Ugandan population, suggesting selective effects on TLR2. As suggested by Barreiro and Quintana-Murci [5], complex traits like TB are likely polygenic, so the effects of selection on individual loci are likely weaker. Actual population genetic tests examining effects of selection require full sequence data, so are beyond the scope of this paper.
There are a few limitations with this analysis. First, we restricted our analysis to TLR pathway genes, because of their key role in the innate immune response. Generalizations to the rest of the genome cannot be made based on only four genes, and selective pressure on immunity genes may result in different population genetic parameters than the rest of the genome. Second, our haplotype and LD analyses were restricted to SNPs that were common to both our exonic sequencing efforts and the HapMap. Furthermore, some SNPs were represented in Phase I and II of the HapMap, but not Phase III, and vice versa. Because of the differences seen in haplotype frequencies and LD, some information may have been lost by virtue of this aspect of study design. Finally, our sample size was underpowered to detect small differences between populations. We had 70-80% power to detect a difference of 0.2 in allele frequencies, but only 20-30% power to detect differences of 0.1.
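The power figures quoted above depend on sample size and on the baseline allele frequency. As a rough illustration only, the sketch below uses a normal approximation for a two-sided comparison of two proportions. The method, the baseline frequency of 0.3, and the assumption of 96 chromosomes per group (48 individuals) in both groups are ours and are not taken from the paper, so the output need not match the reported ranges.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two allele frequencies.

    p1, p2 are the true allele frequencies; n1, n2 are numbers of chromosomes.
    Uses the unpooled standard error (a simplifying assumption).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    shift = abs(p1 - p2) / se
    # Probability that the test statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: 48 diploid individuals (96 chromosomes) per group.
for p2 in (0.5, 0.4):
    power = power_two_proportions(0.3, p2, 96, 96)
    print(f"difference={p2 - 0.3:.1f} power={power:.2f}")
```

The qualitative pattern is the point of the example: halving the detectable difference sharply reduces power at a fixed sample size.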
In conclusion, we found that there is more similarity across African populations than there is dissimilarity, though patterns of similarity do not necessarily reflect geographic proximity. Thus, HapMap provides a good starting point for genetic association studies. However, one must be mindful of possible LD differences between specific populations and those represented on the HapMap. Selective pressure by TB and other infectious diseases may have influenced differential LD structure across Africa. For this reason, we suggest using all three African HapMap populations as the reference for tag SNP selection, as has been advocated by other studies [20].

Table 3. TLR4 Haplotype Comparison Among Populations.

A-A-G A-A-T A-G-G A-G-T G-A-G G-G-G G-A-T
Since it is well-known that African populations show such high genetic diversity, unique polymorphisms may exist in those populations that may not be represented in the HapMap panels. Thus, follow-up sequencing of certain genes may be warranted in specific populations. Our findings also have utility for admixture mapping studies, which require data on ancestral populations [21].

Study population
Samples were obtained as part of two ongoing studies in Uganda and South Africa. Ugandan samples were initially collected as part of the Household Contact Study [22] and Kawempe Community Health Study [23], both of which enrolled individuals from urban Kampala, Uganda. For this sequencing study, we selected 48 unrelated healthy individuals who were part of a whole genome scan study [24]. Most of these individuals (87.5%) self-identified their tribe as Baganda; the remaining identified themselves as Rwandese (2 individuals), Zairean, Nubian, Langi, and Acholi. An analysis of substructure using STRUCTURE in our genome scan data showed that there was no substructure within the larger dataset [24], so we analyzed all of the Ugandan individuals together.
The South African samples were collected from healthy adults enrolled at the South African Tuberculosis Vaccine Initiative clinical site near Cape Town in South Africa [13]. Exclusion criteria included HIV or other chronic infections, pregnancy or active tuberculosis. The study population included individuals from different backgrounds, including Black African (n = 8), Caucasian (n = 7) and South African Mixed Ethnicity (n = 33). The latter is a distinct group that emerged more than 300 years ago and received genetic influences from Malaysia, Indonesia, European Caucasoid and Black Africans [21,25].

Ethics statement
The institutional review boards at University Hospitals of Cleveland and the Uganda Council for Science and Technology approved the Ugandan study. All individuals in the Ugandan study provided written informed consent. For the South African study, all protocols for this study were approved by the Research Ethics Committee of the University of Cape Town and the Institutional Review Boards at the University of Washington and University of Medicine and Dentistry of New Jersey. Ethical guidelines of the US Department of Health and Human Services and the South African Medical Research Council were adhered to, including written informed consent from parents of study participants.

Data analysis
Hardy-Weinberg proportions were tested within healthy populations for each SNP using HWSIM (http://krunch.med.yale.edu/hwsim/) with 10,000 iterations. Genotype frequencies were compared across groups using a chi-squared test (or Fisher's exact test when appropriate) in SAS PROC FREQ. LD was assessed using Haploview software, calculating both r² and D'. Haplotypes were estimated using DECIPHER (S.A.G.E. version 6.0), using the most likely phase for each individual. Differences in haplotype frequencies across populations were evaluated using chi-square or Fisher's exact tests, as for the genotype comparison tests.
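A simulation-based test of Hardy-Weinberg proportions can be sketched as a permutation procedure: alleles are shuffled and re-paired at random, and the observed departure is compared with the resulting null distribution. The code below is a generic illustration under that assumption; it is not the HWSIM implementation, whose internal algorithm is not described here, and the function names, seed, and genotype counts are hypothetical.

```python
import random


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing genotype counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_monte_carlo(n_aa, n_ab, n_bb, iterations=10000, seed=0):
    """Permutation p-value for departure from Hardy-Weinberg proportions."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    alleles = [0] * (2 * n_aa + n_ab) + [1] * (2 * n_bb + n_ab)
    observed = hwe_chi2(n_aa, n_ab, n_bb)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(alleles)             # random re-pairing of alleles
        pairs = list(zip(alleles[0::2], alleles[1::2]))
        aa = sum(1 for x, y in pairs if x + y == 0)
        bb = sum(1 for x, y in pairs if x + y == 2)
        ab = len(pairs) - aa - bb
        # Small tolerance guards against floating-point ties.
        if hwe_chi2(aa, ab, bb) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (iterations + 1)


# Hypothetical genotype counts with a strong deficit of heterozygotes.
print(hwe_monte_carlo(30, 4, 14, iterations=2000))
```

With 10,000 iterations, as used in the study, the smallest p-value this sketch can report is about 0.0001.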
To examine how our populations clustered with the HapMap populations, we conducted multidimensional scaling (MDS) using PLINK (http://pngu.mgh.harvard.edu/purcell/plink/). MDS is similar to principal components analysis in that it utilizes SNP genotype data to estimate a matrix of allele sharing identical by state (IBS) and constructs a similarity matrix, then represents each subject by a vector of coordinates [28]. We plotted the first two dimensions to assess how the Ugandan (UG) and South African Mixed (SA Mixed) individuals clustered with the other populations, using MDS in the same way that it is used for identifying population stratification. In order to quantitatively assess the overlap between these populations, we estimated the Euclidean distance between each pair of individuals. These distances were used to construct a distribution, which was approximately normal. Then, we estimated the average distance between members of two populations (UG, SA Mixed, etc.) and assessed if the difference was statistically significant using the normal distribution.
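The starting point for MDS as described above is a pairwise matrix of IBS allele sharing. The sketch below shows one common way to compute an IBS-based distance from genotypes coded as 0, 1, or 2 copies of an allele, plus a helper for the average distance between two groups. It is an illustration only: the coding, the handling of missing data, the function names, and the toy genotypes are assumptions, and the eigen-decomposition step that turns distances into MDS coordinates is omitted.

```python
def ibs_distance(g1, g2):
    """IBS distance between two individuals: 1 minus the proportion of alleles shared.

    Genotypes are coded 0/1/2 (copies of one allele); None marks missing data.
    """
    shared = 0
    usable = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue                     # skip SNPs missing in either individual
        shared += 2 - abs(a - b)         # 0, 1 or 2 alleles shared IBS
        usable += 1
    if usable == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return 1 - shared / (2 * usable)


def ibs_distance_matrix(genotypes):
    """Symmetric matrix of pairwise IBS distances for a list of individuals."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ibs_distance(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def mean_between_groups(matrix, group_a, group_b):
    """Average distance over all pairs with one member in each index list."""
    pairs = [(i, j) for i in group_a for j in group_b]
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)


# Toy genotypes for four hypothetical individuals at four SNPs.
toy = [[0, 1, 2, 2], [0, 2, 0, 2], [0, 1, 2, 1], [2, 1, 0, 0]]
dist = ibs_distance_matrix(toy)
print(mean_between_groups(dist, [0, 1], [2, 3]))
```

The same group-averaging helper could be applied to Euclidean distances computed from MDS coordinates, which is closer to the comparison described in the text.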