No evidence for association between APOL1 kidney disease risk alleles and Human African Trypanosomiasis in two Ugandan populations

Background Human African trypanosomiasis (HAT) manifests as an acute form caused by Trypanosoma brucei rhodesiense (Tbr) and a chronic form caused by Trypanosoma brucei gambiense (Tbg). Previous studies have suggested a host genetic role in infection outcomes, particularly for APOL1. We have undertaken candidate gene association studies (CGAS) in a Ugandan Tbr and a Tbg HAT endemic area, to determine whether polymorphisms in IL10, IL8, IL4, HLAG, TNFA, TNX4LB, IL6, IFNG, MIF, APOL1, HLAA, IL1B, IL4R, IL12B, IL12R, HP, HPR, and CFH have a role in HAT. Methodology and results We included 238 and 202 participants from the Busoga Tbr and Northwest Uganda Tbg endemic areas respectively. Single Nucleotide Polymorphism (SNP) genotype data were analysed in the CGAS. The study was powered to find odds ratios > 2 but association testing of the SNPs with HAT yielded no positive associations i.e. none significant after correction for multiple testing. However there was strong evidence for no association with Tbr HAT and APOL1 G2 of the size previously reported in the Kabermaido district of Uganda. Conclusions/Significance A recent study in the Soroti and Kaberamaido focus in Central Uganda found that the APOL1 G2 allele was strongly associated with protection against Tbr HAT (odds ratio = 0.2, 95% CI: 0.07 to 0.48, p = 0.0001). However, in our study no effect of G2 on Tbr HAT was found, despite being well powered to find a similar sized effect (OR = 0.9281, 95% CI: 0.482 to 1.788, p = 0.8035). It is possible that the G2 allele is protective from Tbr in the Soroti/Kabermaido focus but not in the Iganga district of Busoga, which differ in ethnicity and infection history. Mechanisms underlying HAT infection outcome and virulence are complex and might differ between populations, and likely involve several host, parasite or even environmental factors.

found, despite being well powered to find a similar sized effect (OR = 0.9281, 95% CI: 0.482 to 1.788, p = 0.8035). It is possible that the G2 allele is protective from Tbr in the Soroti/ Kabermaido focus but not in the Iganga district of Busoga, which differ in ethnicity and infection history. Mechanisms underlying HAT infection outcome and virulence are complex and might differ between populations, and likely involve several host, parasite or even environmental factors.

Author summary
Human African Trypanosomiasis (HAT) occurs in two distinct disease forms; the acute form and the chronic form which are caused by microscopically indistinguishable hemoparasites, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense respectively. Uganda is the only country where both forms of the disease are found, though in geographically distinct areas. Recent studies have shown that host genetic factors play a role in HAT resistance and/or susceptibility, particularly by genes involved in the immune response. In this study, we identified single nucleotide polymorphisms in selected genes involved in immune responses and carried out a case-control candidate gene association study in Ugandan participants from the two endemic areas. We were unable to detect any polymorphisms that were robustly associated with either Tbr or Tbg HAT. However, our findings differ from recent studies carried out in the Tbr HAT another endemic area of Uganda that showed the APOL1 (Apolipoprotein 1) G2 allele to be protective against the disease which merits further investigation. Larger studies such as genome wide association studies (GWAS) by the TrypanoGEN network that has >3000 cases and controls covering seven countries (Cameroon, Cote d'Ivoire, DRC, Malawi, Uganda, Zambia) using the H3Africa customized chip reflective of African genetic diversity will present novel association targets (https://commonfund.nih.gov/globalhealth/h3aresources).

Introduction
The tsetse transmitted African trypanosomes are flagellated protozoa, a range of which cause disease in animals (known as Nagana) and humans (Human African Trypanosomiasis, HAT, also known as sleeping sickness). These diseases are responsible for significant morbidity and mortality [1][2][3] and therefore directly impact on public health and animal productivity. Current reports indicate that annual HAT incidence is on the decline, although under reporting is typical, especially in areas where conflicts and civil unrest interrupt control efforts and regular epidemiological surveys [4][5][6].
HAT is caused by two microscopically indistinguishable sub-species: Trypanosoma brucei rhodesiense that causes an acute form of the diseases that develops within a few weeks or months of infection, and Trypanosoma brucei gambiense that causes a chronic form of the disease that can take years to become patent. The acute form of the disease is prevalent in Eastern and Southern Africa while the chronic form of the disease is prevalent in West and Central Africa [4]. Uganda is the only country with active foci for both forms of the disease, though in geographically distinct regions.
Studies in the Democratic Republic of Congo (DRC), Cameroon, Cote D'Ivoire, Guinea and Uganda have found evidence for polymorphisms in HP, IL6 and APOL1 associated with outcome of infection [7][8][9][10][11][12]. In the present study, we investigated the possible association of selected gene polymorphisms with HAT by undertaking a candidate gene association study (CGAS) using case-control samples from the Tbr and Tbg HAT endemic areas of Uganda. The  IL10, IL8, IL4, HLAG, TNFA, TNX4LB, IL6, IFNG, MIF, APOL1, HLAA, IL1B, IL4R, IL12B,  IL12R, HP, HPR, and CFH genes that were selected have protein products that are involved in the HAT immune response. The CGAS approach was used to compare the frequencies of genetic polymorphisms between cases and controls in order to identify risk variants for HAT in the two Ugandan populations.

Ethics statement
This study was approved by the Uganda National Council of Science (UNCST; assigned code HS 1344) following review by the IRB of the Ministry of Health. Participants were identified through community engagement and active field surveys; they gave written informed consent administered in their local language by trained local health workers. In instances where participants were below 18 years of age, consent was sought from a parent or primary guardian. Any individuals for whom it was not possible to obtain consent or blood samples were excluded from the study.

Study population
The Tbr HAT endemic area samples were from the traditional Tbr HAT foci in the South East of Uganda [13]. Samples were collected mainly from Iganga district and included individuals from the predominantly Basoga ethnic group, with a few Baganda, Banyole, Balamogi, Basiginyi, Itesot, and Japadhola ethnicities.
The Tbg HAT endemic area samples were from the traditional Tbg HAT foci in the Northwest of Uganda [13]. Samples were collected from Adjumani, Arua, Koboko, Maracha, and Moyo districts and comprised of individuals from the Kakwa, Lubgbara and Madi ethnicities. In both areas, only individuals who were born and lived in these traditional foci were selected, as they were most likely exposed to HAT for most of their lives.
HAT cases were defined as individuals in whom trypanosomes have been detected by microscopy in at least one of the body tissues including, blood, lymph node aspirates or cerebral spinal fluids. Controls were defined as individuals from the endemic area with no history or any signs/symptoms suggestive of HAT. Controls from the Tbg HAT endemic area were required to have no serological reaction to the CATT or Trypanolysis tests.
Blood was drawn by venipuncture and collected in EDTA/heparin vacutainer tubes (BD). Buffy coats were prepared from the whole blood in field laboratories using centrifugation, aliquoted, and then stored in liquid nitrogen in preparation for DNA extraction that was carried out at the Molecular Biology Laboratory, COVAB, Makerere University. The DNA was quantified using Qubit (Life Technologies).

Study design
This study was one of five studies of populations of HAT endemic areas in Cameroon, Cote d'Ivoire, Guinea, Malawi and Uganda. The studies were designed to have 80% power to detect odds ratios (OR) >2 for loci with disease allele frequencies of 0.15-0.65 and 100 cases and 100 controls with the 96 SNPs genotyped. The study design included an overall total of 462 samples, 239 samples from Tbr HAT endemic regions (120 cases, 119 controls) and 223 samples from Tbg HAT endemic regions (110 cases and 113 controls).
Power calculations were undertaken using the pbsize routine in Genetics Analysis Package gap version 1.1-16 in R [14].

SNP selection
96 SNPs were selected for genotyping using two strategies: 1) SNPs that had been previously reported to be associated with HAT or 2) in the cases of IL4, IL8, IL6, HLAG and IFNG by using sets of SNPs in LD (r 2 >0.5) with each other, such that all bases in the gene were in LD with at least one SNP. The SNPs in this second group of genes were selected using a merged SNP dataset obtained from 10X coverage whole genome sequence data generated from 230 residents living in regions (DRC, Guinea Conakry, Ivory Coast and Uganda) where trypanosomiasis is endemic (TrypanoGEN consortium, sequences at European Nucleotide Archive Study: EGAS00001002602) and 1000 Genomes Project data from African populations. Linkage (r 2 ) between loci was estimated using the PLINK v1.9 package (https://www.cog-genomics. org/plink/1.9/) [25] and sets of SNPs that covered the gene were identified. Some SNP loci were excluded during assay development or failed to genotype and were not replaced.

Genotyping
Approximately 1μg of gDNA per sample were submitted to INRA (Plateforme Genome Transcriptome de Bordeaux, France) for genotyping. A multiplex analysis (two sets of 80 SNPs each) was designed using Assay Design Suite v2.0 (Agena Biosciences). SNP genotyping was achieved with the iPLEX Gold genotyping kit (Agena Biosciences) for the MassArray iPLEX genotyping assay, following the manufacturer's instructions. Products were detected on a MassArray mass spectrophotometer and the data acquired in real time with MassArray RT software (Agena Biosciences). SNP clustering and validation was carried out with Typer 4.0 software (Agena Biosciences). SNPs that failed genotyping at INRA and some additional SNPs were genotyped at LGC Genomics, Hoddesden, UK where SNPs were genotyped using the PCR based KASP assay [26]. A summary of the candidate genes and SNPs is shown in S1 Table.

Statistical analysis
The raw genotypic data were converted to PLINK format and quality control (QC) procedures implemented using the PLINK v1.9 package [25]. PLINK was used to determine the level of individual and genotype missingness, Hardy-Weinberg Equilibrium (HWE), estimate allele frequencies, and linkage disequilibrium (LD). Testing for population stratification and admixture was carried out using Admixture 1.3 [27] and the plot was visualized using StructurePlot2 [28].
Testing for the association of SNPs with HAT was done using a Fisher's exact test [29] implemented in PLINK and producing a 95% confidence interval for the odds ratios. Controlling for multiple testing was implemented using a Bonferroni correction (α Ã = α/n, where α is the level of significance, n is the number of independent SNP association tests and α Ã is the adjusted threshold of significance) [30]. The Bonferroni correction assumes that each of the statistical tests are independent; however, this was not always true since there was some linkage disequilibrium between the SNPs in IL4, IL8, IL6, HLAG and IFNG which were subject to complete linkage scans. Where the assumption of independence is not true, the correction is too strict potentially leading to false negatives. Thus, an alternative correction for multiple testing was also employed. The Benjamini-Hochberg false discovery rate (FDR) estimates the proportion of significant results (p < 0.05) that are false positives [30,31].

Results
Our study population consisted of 239 individuals from Tbr and 223 from the Tbg HAT endemic areas. The former comprised of 120 cases and 119 controls, who had a mean age of 43 ± 5 years, and a male to female ratio of 1:2. The Tbg HAT endemic area participants comprised of 110 cases and 113 controls, who had a mean age of 37 ± 5 years, and a male to female ratio of 1:1.

Genotyping and data quality control
Uganda is the only country where both acute and chronic HAT are endemic [32]. The two forms of the disease however occur in geographically isolated regions [32]. The two samples represented two distinct forms of the disease and regions inhabited by distinct ethnic groups (Nilo-Saharan language speakers in the Tbg region and Bantu language speakers in the Tbr region). The cohorts were analyzed separately including initial quality control. Ninety-six (96) SNPs in 15 genes were genotyped from each of the Tbr and Tbg HAT endemic area samples as shown in S1 Table (the Plink MAP and PED files are available in S1 and S2 Data). Before association testing, histograms of the distribution of missing data both by individual and by locus (Supplementary Figures S1 Fig-S4 Fig) were inspected in order to identify appropriate cut-offs to apply in each population. Individuals with missing data or loci with missing data above the cut-off threshold were removed as were loci that were not in HWE, or those that were poorly genotyped [33,34].
Individuals with more than 20% or 15% missing data were excluded from the Tbr and the Tbg HAT endemic datasets, respectively, resulting in a final dataset of 238 (119 cases and 119 controls, 1:2 male to female sex ratio) individuals from the Tbr HAT endemic sample and 202 (99 cases and 103 controls, 1:1 male to female sex ratio) individuals from the Tbg HAT endemic sample (Supplementary Figures S1 Fig and S2 Fig). Similarly, loci that were missing more than 30% or 40% data were excluded from the Tbr and the Tbg HAT endemic area samples (Supplementary Figures S3 Fig and S4 Fig). We used a HWE p-value cut-off of 1 x 10 −8 and further selection of loci below the HWE cut off was done basing on their genotype scatter plots to see which loci were to be excluded. In order to get a high LD between marker and causal SNPs, loci that were in a five SNP window after a single step with a variance inflation factor (VIF) [VIF = 1/(1-R 2 )] beyond 0.2 were excluded from both sample datasets. This was done because a high LD between marker SNPs increases redundancy and reduces power. After quality pruning, 79 loci from Tbr and 85 loci from the Tbg HAT endemic samples were included in the association testing.

Admixture for population structure
Admixture was used to test for population structure that might confound the association study. Eight values of K ancestral populations from 1-8 were tested to identify which had the lowest coefficient of variations (CV) error. CV error was at a minimum for K = 4, but the CV error was very similar for all values of K (0.42-0.46) providing no persuasive evidence for any particular number of ancestral populations. The Admixture plot showed no clear evidence for any gross population structure and therefore no correction for population structure was applied in the analysis.

Association testing yielded no robust associations
Six SNPs in the Tbr HAT endemic area and four in the Tbg endemic had raw p < 0.05 but none of these remained significant after Bonferroni correction (Table 1). Surprisingly, there was no evidence for association with any SNP in APOL1.

Discussion
In this case-control CGAS, we found no evidence for variants associated with Tbr or Tbg HAT in two Ugandan populations. We tested for association between candidate genes and the disease caused by Tbg and Tbr separately as they present two distinct forms of the disease. Tbr and Tbg parasite resistance to human serum is mediated by different mechanisms which place distinct selective pressures on the host genes [35]. Furthermore, the two populations were from different broad ethnolinguistic groups, and were geographically isolated from each other [13]. Admixture analysis found no evidence of population structure with these SNP which might have reduced the power of the study (S5 Fig). We found no SNP associated with HAT after multiple testing corrections. Our power calculations indicated that we had power to detect odds ratios > 2, however 7 of the 10 SNPs with P <0.05 had odds ratios < 2.0, which the study was not powered to detect. Larger populations would be required to confirm these findings and the data presented could be used to estimate the necessary sample size.
The most striking feature of the data was the absence of any association at APOL1. The APOL1 G2 (deleted allele for indel rs71785313) allele has been shown to be lytic to T. b. rhodesiense in vitro [36] and a recent study in the Soroti and Kaberamaido focus in Eastern Uganda found an association with APOL1 G2 and protection from Tbr HAT with an odds ratio of 0.2 [8]. The present study in the Busoga focus was well powered to discover such a strong effect,  Table). The frequency of the G2 allele in the control population in Kabermaido (14.4%) [8] was higher than in Busoga (8.6%). Although this difference in G2 allele frequency is not significant with the sample sizes that were used (Chisq Test p = 0.12), it may be indicative of real differences between these populations in selection pressure on this allele. Despite their geographical proximity (240km) these populations have divergent ethnic backgrounds; with the Kaberamaido population being Luo speakers which is a Nilotic family language originating in Sudan and Ethiopia and the Busoga population being Niger-Congo-B (Bantu) language speakers with origins in West Africa. These linguistic groups are believed to have diverged over 5,000 years ago giving plenty of time for allele frequencies to diverge. Therefore, despite the well-established function of APOL1 in response to trypanosome infection and the evidence for protection associated with G2 in Kaberamaido [8], the role of APOL1 G2 in response to T. b. rhodesiense infection more generally remains to be clarified. Despite the suggestively significant associations found at nine SNP loci, none of them passed Bonferroni correction for multiple testing [30]. The FDR_BH shows the rate of type 1 errors or false positives, eg for rs9380142 in HLA-G there is an 18% chance that this is a false positive and conversely a 82% chance that it is a true positive. There was a greater than 38% probability for each of these nine SNPs being associated with HAT [30,31]. The finding of suggestive associations in multiple populations would increase the probability that these are genuine associations with disease [37]. For example, our findings suggest that HLA-G variants may be important in both forms of the disease. These observations should be followed up in future studies.
Supporting information S1 Table. Candidate genes included in the study.