Identification of NUDT15 gene variants in Amazonian Amerindians and admixed individuals from northern Brazil

Introduction The nudix hydrolase 15 (NUDT15) gene acts in the metabolism of thiopurine, by catabolizing its active metabolite thioguanosine triphosphate into its inactivated form, thioguanosine monophosphate. The frequency of alternative NUDT15 alleles, in particular those that cause a drastic loss of gene function, varies widely among geographically distinct populations. In the general population of northern Brazil, high toxicity rates (65%) have been recorded in patients treated with the standard protocol for acute lymphoblastic leukemia, which involves thiopurine-based drugs. The present study characterized the molecular profile of the coding region of the NUDT15 gene in two groups, non-admixed Amerindians and admixed individuals from the Amazon region of northern Brazil. Methods The entire NUDT15 gene was sequenced in 64 Amerindians from 12 Amazonian groups and 82 admixed individuals from northern Brazil. The DNA was extracted using phenol-chloroform. The exome libraries were prepared using the Nextera Rapid Capture Exome (Illumina) and SureSelect Human All Exon V6 (Agilent) kits. The allelic variants were annotated in the ViVa® (Viewer of Variants) software. Results Four NUDT15 variants were identified: rs374594155, rs1272632214, rs147390019, and rs116855232. The variants rs1272632214 and rs116855232 were in complete linkage disequilibrium, and were assigned to the NUDT15*2 haplotype. These variants had high frequencies in both our study populations in comparison with other populations catalogued in the 1000 Genomes database. We also identified the NUDT15*4 haplotype in our study populations, at frequencies similar to those reported in other populations from around the world. Conclusion Our findings indicate that Amerindian and admixed populations from northern Brazil have high frequencies of the NUDT15 haplotypes that alter the metabolism profile of thiopurines.


Introduction
The analysis of polymorphisms in Drug Absorption, Distribution, Metabolism, and Excretion (ADME) genes is commonly employed in personalized medicine as a diagnostic tool for selecting the most appropriate drug and/or dosage for the treatment of a range of diseases. A number of recent studies [1,2,3] have found that the frequencies of many genetic polymorphisms vary among genetically distinct populations, which may modulate the prognosis of patients and the therapeutic efficacy of treatments.
One ADME-related gene that represents a major breakthrough in the field of pharmacogenetics is the nudix hydrolase 15 (NUDT15) gene. This gene acts in the metabolism of thiopurine, catabolizing its active metabolite thioguanosine triphosphate (TGTP) into its inactivated form, thioguanosine monophosphate (TGMP) [4]. Studies have shown that genetic variants responsible for the loss of NUDT15 function lead to an increased incorporation of thioguanine nucleotides into the DNA strand, which would result in an exacerbation of the associated cytotoxic effects, culminating in the appearance of hematological toxicities, such as severe myelosuppression [5,6].
Thiopurine-based drugs, such as 6-mercaptopurine (6-MP) and azathioprine (AZA), are widely used to treat Acute Lymphoblastic Leukemia (ALL) and Inflammatory Bowel Disease (IBD), respectively. However, the efficacy or tolerance of these drugs may vary considerably according to the level of activity of the NUDT15 enzyme in the patient [7,8]. The American Food and Drug Administration (FDA) recently recommended specific dose adjustments of both 6-MP and AZA according to the NUDT15 genotype profile of the patient [9,10].
Data available from international databases (the 1000 Genomes Project, the Exome Aggregation Consortium [ExAC], and the Genome Aggregation Database [gnomAD]) show that the frequencies of alternative NUDT15 alleles, especially those that cause a drastic loss of gene function, vary greatly between geographically distinct populations. A C>T transition in the rs116855232 (p.Arg139Cys) variant is assigned to two haplotypes (NUDT15*2 and NUDT15*3), which present an extreme loss of gene function, associated with the presence of the T allele of this polymorphism. The T allele of rs116855232 is found more commonly in East Asian [13], Latin [14] and Native American [15] populations, but is rare in populations from Europe, Africa, and South Asia. The NUDT15*9 haplotype is derived from an in-frame deletion in the sequence responsible for the catalytic activity of NUDT15 (rs746071566), and is associated with extreme loss of enzyme function. This deletion has been found exclusively in patients of European or African descent [16].

PLOS ONE
An in-frame insertion in this same region (rs1272632214; p.Gly17_Val18dup) is assigned to two haplotypes (NUDT15*2 and NUDT15*6), which are both related to a drastic loss of gene function. This insertion has been identified in populations from Guatemala, Singapore, and Japan, and enzymatic assays have shown that this insertion is responsible for a reduction in the catalytic activity of the NUDT15 enzyme [11].
In Brazil, thiopurine-based therapy is part of the standard treatment protocol for ALL. In a recent study developed by our research group on a northern Brazilian population, severe toxicities (grades 3 and 4) were found in 65% of the patients treated with the standard therapeutic protocol for ALL [7]. This toxicity rate is much higher than that observed in other populations (worldwide average = 26%) treated using the same protocol [7,17].
Previous data demonstrated that Brazilian Native American populations have a high frequency of a polymorphism (rs116855232) that drastically reduces NUDT15 activity, and that these ethnic groups contribute 30%, on average, of the genetic makeup of the admixed populations of the Brazilian Amazon region [15,18,19]. Therefore, a possible cause of the severe toxic effects observed in northern Brazil during the standard thiopurine-based treatment protocol is that individuals from this region have higher frequencies of deleterious NUDT15 alleles, resulting from the genomic contribution inherited from Amerindian groups. Our objective was to evaluate the frequency of deleterious alleles of the NUDT15 gene in Amazonian Amerindians and admixed populations of northern Brazil.
To this end, we employed next-generation sequencing (NGS) data obtained by our research group to define the molecular profile of the NUDT15 gene in two samples, one containing 64 individuals from 12 different indigenous groups of the Brazilian Amazon region and the other, 82 individuals representing the admixed population from the same region.

Study population
The study population is composed of 64 Amerindians and 82 admixed individuals from the Amazon region of northern Brazil. The Amerindians represent 12 different Amazonian ethnic groups that were grouped together as the Native American (NAM) group. Details such as the name, location, and number of individuals in each ethnic group are described in S1 Table. The 82 admixed individuals live in Belém, a city located in northern Brazil, and, owing to the colonization history of the region, carry mainly three ancestral genetic components: European, Native American, and African. These admixed individuals were enrolled in a broader project to investigate germline mutations in patients with gastric cancer, and are referred to in the present study as the Brazilian Admixed Population (BAP). The present study was approved by the National Committee for Ethics in Research (CONEP) and the Research Ethics Committee of the UFPA Tropical Medicine Center, under CAAE number 20654313.6.0000.5172. All participants signed a free and informed consent form, as did the tribe leaders, and, when necessary, a translator explained the project and its importance. The recruitment period for participants was from September 2017 to December 2018.
We compared our results with those of populations from other continents obtained from the phase 3 release of the 1000 Genomes database (available at http://www.1000genomes.org). These populations include those of African (AFR), Latin (AMR), East Asian (EAS), European (EUR), and South Asian (SAS) descent. We also compared our findings with genomic data from two databases of genomic variants analyzed in Brazilian populations, specifically individuals from southeast Brazil: the Online Archive of Brazilian Mutations (ABraOM) (available at http://abraom.ib.usp.br) and the Brazilian Initiative on Precision Medicine (BIPMed) (available at https://bipmed.org/).

Extraction of the DNA and preparation of the exome library
The DNA was extracted using the phenol-chloroform method described by Sambrook et al. [20]. The genetic material was quantified using a Nanodrop-8000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA) and the integrity of the DNA was assessed by electrophoresis in 2% agarose gel.
The libraries were prepared using the Nextera Rapid Capture Exome (Illumina) and SureSelect Human All Exon V6 (Agilent) kits following the manufacturers' recommendations. The sequencing reactions were run on the NextSeq 500® platform (Illumina®, US) using the NextSeq 500 High-output v2 300 cycle kit (Illumina®).

Statistical analyses
The allele frequencies of the study populations were obtained directly by gene counting and compared with those of the other populations (AFR, EUR, AMR, EAS, SAS, ABraOM, and BIPMed). Differences in frequencies between populations were analyzed using Fisher's exact test, and results were considered significant when the p-value was ≤ 0.05. The inter-population variability of the polymorphisms was assessed using Wright's fixation index (FST). Linkage disequilibrium was estimated using Haploview v. 4.2. The analyses were run in Arlequin v.3.5 [21] and in RStudio v.3.5.1.
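The two simplest steps described above, gene counting and Fisher's exact test on a 2×2 table of allele counts, can be illustrated with a minimal sketch. This is not the code used in the study (the analyses were run in Arlequin and RStudio); the function names are hypothetical, and the two-sided p-value is assumed to follow the usual convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def allele_frequency(alt_alleles, n_individuals):
    """Alternative-allele frequency by gene counting (2 alleles per diploid individual)."""
    return alt_alleles / (2 * n_individuals)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are alternative and reference allele counts.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```

For example, `allele_frequency(2, 64)` returns the frequency of a variant observed on two of the 128 chromosomes of 64 individuals, and `fisher_exact_two_sided` takes the alternative and reference allele counts of two populations.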

Results
We identified four NUDT15 variants, two INDELs (Insertions/Deletions) and two SNVs (Single Nucleotide Variants), in the 146 individuals analyzed, with a mean coverage of 77X. Table 1 provides details of the characteristics of these variants, including their chromosomal location and genic region, the reference number, ClinVar significance, the in silico prediction of functional impact, and pathogenicity.
The rs374594155 variant corresponds to a deletion of 25 nucleotide pairs located in intron 1 of the NUDT15 gene. No published information is available on the possible phenotypic or functional effects of this mutation.
Two variants (rs147390019 and rs116855232) provoke changes in the amino acid sequence. Both have clinical significance for the response of the organism to thiopurine-based drugs, as indicated by ClinVar. In silico predictions of pathogenicity have also been reported. The rs1272632214 variant is derived from the insertion of six nucleotide pairs into exon 1, which results in the in-frame insertion of two amino acids (Glycine and Valine) into the catalytic activity region of the enzyme.
The Hardy-Weinberg Equilibrium (HWE) was calculated and p-values of less than 0.05 were considered significant. All the polymorphisms reported were in HWE, except for rs374594155 (p-value < 0.001).
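The text does not state which test was used to assess HWE. As an illustration only, the sketch below assumes the common chi-square goodness-of-fit test with one degree of freedom for a biallelic locus; the function name and the genotype counts in the usage note are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square test of Hardy-Weinberg proportions for a biallelic locus.

    Returns (chi2, p_value), with the p-value taken from a chi-square
    distribution with 1 degree of freedom.
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference-allele frequency by gene counting
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, erfc(sqrt(chi2 / 2))
```

A call such as `hwe_chi_square(25, 50, 25)` describes a sample in exact Hardy-Weinberg proportions, whereas a sample with no heterozygotes, such as `hwe_chi_square(50, 0, 50)`, departs strongly from equilibrium. With small samples or rare alleles, an exact test is usually preferred over this approximation.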
The allele frequencies recorded for the four variants in the two study populations are relatively high, in general, in comparison with the five populations obtained from the 1000 Genomes database and the samples of Brazilian populations from the ABraOM and BIPMed databases, as shown in Table 2.
The in-frame insertion of six nucleotide pairs (rs1272632214) has an allele frequency of 9.4% in the Amerindian population analyzed here, and a frequency of 6.8% in the admixed population. These frequencies are slightly higher than those recorded in the AMR (5%) and EAS (6%) populations, and much higher than those observed in the EUR (1%), ABraOM (0.1%) and BIPMed (0.5%) populations. The in-frame mutation is absent from the AFR and SAS populations. The C>T transition that defines rs116855232 was identified only in the individuals that presented the above-mentioned in-frame polymorphism, both in the Amerindian population (9.4%) and in the Amazonian admixed population (6.8%). In both populations, rs1272632214 and rs116855232 are in complete linkage disequilibrium (D′ = 1 and r² = 1). The rs116855232 variant (T allele) is more frequent in Asian (EAS = 9.5% and SAS = 7%) and Latin (AMR = 4.5%) populations, but is very rare in Africans (AFR = 0.1%), Europeans (EUR = 0.2%) and southeastern Brazilians (ABraOM = 1.2% and absent in BIPMed).
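The two linkage disequilibrium measures reported for this pair of variants can be computed from haplotype counts, as in the minimal sketch below. It assumes that haplotype phase is already known, whereas Haploview estimates it from genotype data; the function name and the counts in the usage note are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r^2) for two biallelic loci from phased haplotype counts.

    A/a are the alleles at the first locus and B/b at the second.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = n_AB / n - p_A * p_B  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return abs(d) / d_max, d * d / denom
```

When the two alternative alleles always occur on the same chromosomes, for instance `linkage_disequilibrium(12, 0, 0, 116)`, both measures equal 1, which is the pattern of complete linkage disequilibrium described here.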
The G>A transition that defines rs147390019 was identified in two Amerindians (frequency of 1.6%) and one individual from the admixed population (0.6%). This variant is considered rare (frequency less than 1%) in all other populations from around the world.
Table 3 shows the pairwise comparisons of frequencies between the Amerindians and the admixed Brazilian population of this study, the five populations of the 1000 Genomes Project, and the sample of Brazilians from the ABraOM database.
The rs116855232 variant showed the most discrepant frequencies when the Amerindian and admixed Brazilian populations were compared with the others; each differed significantly from Africans, Europeans, and the Brazilians from ABraOM. When the pairwise differences are considered in terms of the parental populations, the European population is the most distinct from both the Amerindians and the admixed Brazilian population from the Amazon (Table 3).
Table 4 reports the inter-population variability of the three exonic variants found in the Amazonian Amerindian and admixed Brazilian populations, assessed using Wright's fixation index (FST). The greatest differentiation was found between the NAM and AFR groups (0.47507), followed by the NAM and EUR groups (0.39153) and the NAM and ABraOM populations (0.29119).
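For readers unfamiliar with the fixation index, the sketch below shows the textbook two-population form of Wright's FST for a single biallelic locus, based on expected heterozygosities and equal population weights. This is an assumption made for illustration: Arlequin derives FST within an analysis of molecular variance framework, so the values in Table 4 will not necessarily match this simplified estimator. The function name is hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's FST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # the locus is monomorphic in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t
```

The index is 0 when the two populations share the same allele frequency and 1 when they are fixed for different alleles.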

Discussion
The present study is the first to investigate the complete sequence of the NUDT15 gene in Amazonian Amerindians and a highly admixed population with a major Amerindian component, groups that are under-represented, in general, in pharmacogenetic studies. We described four variants of the NUDT15 gene, one of which (rs374594155) is intronic, while the other three (rs147390019, rs1272632214, and rs116855232) are all exonic. The exonic variants have considerable allelic frequencies and well-established clinical impacts in terms of the tolerance of thiopurine-based treatment protocols and the development of severe toxicity. Two of the variants (rs1272632214 and rs116855232) are in complete linkage disequilibrium.
Based on the Pharmacogene Variation Consortium (PharmVar) classification, only two variant haplotypes were observed in our samples [21]. One (NUDT15*4) is represented by the isolated variant rs147390019. The other (NUDT15*2) is defined by the combined presence of rs1272632214 and rs116855232.
More than 20 allelic variants described in the NUDT15 gene are capable of reducing or even deactivating the enzymatic activity of the protein and provoking cytotoxic effects [22,23,24]. However, most of these variants are rare and unique to a specific population or group.
The international population databases show that the frequency of the mutations that reduce the functional activity of the enzyme, such as those described by the PharmVar Consortium, exceeds 1% only in Asian, Latin American, and Amerindian populations [14,15,25]. The most frequent mutations in these groups include the rs116855232 variant, either in isolation or combined with rs1272632214.
The rs147390019 and rs1272632214 polymorphisms were first associated with hematopoietic toxicity in a cohort of children diagnosed with ALL from Japan, Guatemala, and Singapore [11]. That study also described the functional nucleotide phosphatase activity of the NUDT15 enzyme. The evidence indicates that the presence of rs147390019 and rs1272632214 may result in the loss of 75% and 85% of in vitro enzyme function, respectively. The study [11] also confirmed the absence of the catabolizing function of the enzyme when the rs1272632214 and rs116855232 variants were combined. Overall, then, the three variants identified here substantially modify the enzymatic function of NUDT15 and can have a significant impact on the response of the carriers of these genotypes to thiopurine-based treatments.
Our analysis showed that 22% of the Amerindians investigated and 15% of the admixed individuals have haplotypes with known clinical impacts. The NUDT15*2 haplotype is more frequent in both populations (NAM = 9.4%, BAP = 6.8%) than NUDT15*4 (NAM = 1.6%, BAP = 0.7%). One previous investigation evaluated the presence of rs116855232 of the NUDT15 gene in Amerindian populations from southern and midwestern Brazil [15]. In that study, the mean frequency of rs116855232 was 25%, ranging from 5% to 32%, generally higher than the frequency recorded in the present study (9.4%). The differences observed among Amerindian ethnicities may be due to the stochastic effects of genetic drift, which are common in Brazilian Amerindian groups [26,27,28]. In fact, we observed that the frequencies of the variants vary greatly among the Amerindian tribes represented by five or more individuals (S2 Table). For example, NUDT15*2 has a frequency of 3% in the Asurini de Trocará group, whereas in the Zo'é population it is 30%. This reinforces the existence of allelic fluctuations among Amerindian tribes, which arise from evolutionary processes, particularly genetic drift.
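The relationship between the haplotype frequencies and the proportion of carriers quoted at the start of the preceding paragraph can be checked with a short calculation. The sketch below assumes Hardy-Weinberg proportions and treats NUDT15*2 and NUDT15*4 as mutually exclusive haplotypes whose frequencies add; it is an illustrative consistency check with a hypothetical function name, not a reproduction of how the reported proportions were obtained.

```python
def carrier_proportion(haplotype_frequency):
    """Expected proportion of individuals carrying at least one copy of a
    haplotype with the given frequency, under Hardy-Weinberg proportions."""
    return 1 - (1 - haplotype_frequency) ** 2


# Combined frequency of NUDT15*2 and NUDT15*4 in each study population,
# using the frequencies quoted in the text
nam_expected = carrier_proportion(0.094 + 0.016)
bap_expected = carrier_proportion(0.068 + 0.007)
```

Under these assumptions the expected carrier proportions are about 21% for NAM and 14% for BAP, close to the 22% and 15% reported.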
In the case of the admixed population, we compared our data with those available in two public genomic databases from the state of São Paulo, in southeastern Brazil: BIPMed, which comprises 106 individuals, and ABraOM, which comprises 609 individuals. In the BIPMed database, 11 polymorphisms were reported in the NUDT15 gene; however, most of them had a frequency lower than 1%. Of the variants reported in this study, only one, rs1272632214, was also described in that database, at a very low frequency (only one out of 212 alleles) (http://bipmed.iqm.unicamp.br).
In summary, the maximum proportion of individuals with deleterious alleles in the population from southeastern Brazil is much lower than the one we recorded in the admixed population from northern Brazil. One possible reason for this discrepancy is the greater contribution of Amerindian ancestry to the northern (Amazonian) population in comparison with that from southeastern Brazil, since the former has a mean of 30% and the latter a mean of 8.8% Amerindian genetic background [17,18,30].
Amerindian ancestry has also been linked to higher frequencies of NUDT15 deleterious alleles, specifically the rs1272632214 and rs116855232 polymorphisms, in other populations, such as the subpopulations of the Latin population in the 1000 Genomes database (S2 Table). Mexicans (MXL), Colombians (CLM), Puerto Ricans (PUR), and Peruvians (PEL) have approximately 25%, 25%, 13% [31], and 77% [32] Amerindian genetic background, respectively. The frequencies of rs1272632214 and rs116855232 observed in these populations are, respectively, 2.1% and 2.1% for CLM, 3.1% and 4.7% for MXL, 10.6% and 11.8% for PEL, and 0.5% and 0.5% for PUR. These figures suggest that the populations with greater Amerindian genomic ancestry have higher frequencies of the deleterious alleles, i.e., Peruvians, followed by Mexicans.

Conclusions
Our results demonstrate that Amerindian and admixed populations from northern Brazil have a high frequency of two NUDT15 haplotypes (*2 and *4) that significantly alter the thiopurine metabolization profile. Our findings indicate that these deleterious mutations could partly account for the high rates of toxicity found in ALL patients from northern Brazil. It will nevertheless be essential to evaluate the practical effects of these haplotypes in case-control studies with patients undergoing 6-mercaptopurine therapy, in order to establish their potential association with poor clinical outcomes.