Several Different Lactase Persistence Associated Alleles and High Diversity of the Lactase Gene in the Admixed Brazilian Population

Adult-type hypolactasia is a common phenotype caused by lactase enzyme deficiency. The −13910 C>T polymorphism, located 14 kb upstream of the lactase gene (LCT) in the MCM6 gene, was associated with lactase persistence (LP) in Europeans. This polymorphism is rare in Africa, but several other variants associated with lactase persistence were observed in Africans. The aims of this study were to identify polymorphisms in the MCM6 region associated with the lactase persistence phenotype and to determine the distribution of LCT gene haplotypes in 981 individuals from North, Northeast and South Brazil. These polymorphisms were genotyped by PCR-based methods and sequencing. The −13779*C, −13910*T, −13937*A, −14010*C, and −14011*T LP alleles, previously described in the MCM6 gene region that acts as an enhancer for the LCT gene, were identified in Brazilians. The most common LP allele was −13910*T. Its frequency was highly correlated with European ancestry in the Brazilian populations investigated. The −13910*T frequency was higher (0.295) in southern Brazilians of European ancestry and lower (0.175) in the Northern admixed population. LCT haplotypes were derived from the 10 LCT SNPs genotyped. Overall, twenty-six previously described haplotypes were identified in the four Brazilian populations studied. The Multidimensional Scaling analysis showed that Belém, in the north, was closer to Amerindians. Northeastern and southern Afro-descendants were more closely related to Bantu-speaking South Africans, whereas the Southern population with European ancestry grouped with Southern and Northern Europeans. This study shows high variability in the number of LCT haplotypes observed. Due to the highly admixed nature of the Brazilian populations, a diagnosis of hypolactasia in Brazil based only on the investigation of the −13910*T allele is an oversimplification.


Introduction
Adult-type hypolactasia or lactose intolerance (OMIM #223100) is a common phenotype worldwide. It is determined by lactase deficiency, which results from the decline of lactase activity after weaning. Lactase, or lactase-phlorizin hydrolase (EC 3.2.1.23-62), is encoded by the LCT gene and is located in the brush border membrane of small-intestinal enterocytes. The enzyme hydrolyzes lactose, the main carbohydrate in milk [1]. Most intolerant subjects present symptoms such as bloating, flatulence, nausea, and diarrhea after consumption of fresh milk [2][3][4]. Moreover, adult-onset lactase decline appears to be a risk factor for osteoporosis, due either to avoidance of dairy products or to interference of undigested lactose with calcium absorption [5].
The regulation of LCT expression in humans has been studied extensively. No causative differences have been found within the LCT gene sequence. However, a T/C polymorphism at position −13910 and an A/G polymorphism at position −22018 from the start codon of the LCT gene have been identified. Although these nucleotide variants are located in introns 9 and 13 of the neighboring MCM6 gene, the −13910*C allele associates 100% and the −22018*G allele associates approximately 97% with the lactase nonpersistent phenotype [6][7][8]. The region surrounding the −13910 position has been described to function as an enhancer that stimulates LCT promoter activity. The derived allele −13910*T increases promoter activity [8][9][10][11].
The LCT gene (OMIM #603202) was mapped to 2q21 [12]. Several single nucleotide polymorphisms (SNPs) were described across the lactase gene, and these polymorphic sites were used to derive LCT haplotypes [13,14]. The two SNPs associated with the lactase persistence (LP) phenotype are linked to an A-haplotype background in European populations [6]. Mulcare et al. [15] showed that the −13910*T allele cannot be the cause of lactase persistence in most Africans, although it could possibly explain lactase persistence in some Cameroonians. In that study, it was suggested that the presence of the −13910*T allele in Cameroon is due to introgression from outside sub-Saharan Africa.
In nomadic pastoralist and non-pastoralist groups from East and South Africa and in Middle Eastern populations, other polymorphisms in the same enhancer region or in its vicinity were also related to the LP phenotype. For example, −13907C>G (rs41525747) and −13915T>G (rs41380347) were both identified in populations from Ethiopia, Kenya, Saudi Arabia, Sudan, and Tanzania, and −13915T>G was also found in Ethiopian Somali, Morocco, and Jordan [16][17][18][19], whereas the −14010G>C (rs145946881) polymorphism was described in Kenya, Tanzania and Xhosa-speaking South Africans [17,20]. Functional studies demonstrated the role of the −13910*T, −13907*G, −13915*G, and −14010*C alleles in the maintenance of enzyme expression during adulthood [9][10][11].
The Brazilian populations were formed by successive migratory waves. Amerindian people occupied the Brazilian territory when the Portuguese arrived in 1500 and colonized the country. Then, between the 16th and 19th centuries, West and Southwest Africans were brought to Brazil as slaves. In addition to the Portuguese, other migratory waves occurred in the 19th and 20th centuries, mainly from Italy, Germany and Spain [21]. All of these migratory events contributed to the formation of a multi-ethnic and highly admixed population. This heterogeneity was documented in several genetic studies that used uniparental or autosomal markers to demonstrate a typical, although non-uniform, tri-ethnic (European, African and Amerindian) pattern for the Brazilian population. This admixture process occurred in different ways in the various geographic regions of the country. In Northeastern Brazil, the African contribution is high; in the North, the contribution of Native Americans is pronounced; and in the South, Amerindian and African influences are reduced compared with the other geographic regions [22,23].
The aims of this study were (1) to determine the prevalence of LP-related alleles; and (2) to describe the distribution patterns of the LCT haplotypes in the Brazilian population.

Identification of SNPs in the LCT Enhancer Region
The overall −13910*T allele frequency varied from 17.5% in the Northern admixed population to 29.5% in Southern Brazilians of European ancestry (Table 1). The −13910*T allele frequency is higher in the Southern Euro-descendants (p = 1.76 × 10⁻⁵) than in the other Brazilian populations investigated. As LP is a dominant trait [24], the predicted LP phenotype frequency was inferred from the −13910C>T genotypes (Table 1). The majority of the populations from Belém and Recife and of the Afro-descendants from Porto Alegre are lactose intolerant (CC genotype with almost 70% frequency). In the Euro-descendant individuals from Porto Alegre, the lactase persistence frequency is higher than 50%.
Comparisons of the −13910*T allele frequencies of the four Brazilian populations with frequencies available in the dbSNP-NCBI database showed that overall these frequencies differ from those described, although some similarities could also be observed. Subjects with European ancestry from Porto Alegre are similar only to the global frequency from the NIH Polymorphism Discovery Resource (PDR90). The Afro-descendants from Porto Alegre and the Recife and Belém populations did not differ from African Americans from Southwest USA (HapMap-ASW). The Recife population also did not differ from the frequency described by the 1000 Genomes Project. These results are shown in Table S1.
The −13910C>T and −22018G>A polymorphisms are in high linkage disequilibrium (data not shown). The combined frequencies of these two alleles are presented in Table 2. Table 3 shows other variants found by sequencing of the LCT enhancer region in the Brazilian population. Four different polymorphisms were observed: −13937*A (rs4988234), −14010*C (rs145946881), −14011*T (rs4988233) and −13779*C (not included in dbSNP). Their regional distribution is also shown in Table 3. A total of 9 individuals, each heterozygous for one of these alleles, were detected. The Northeastern population was the most variable, with 3 different alleles besides −13910*T, whereas Euro-descendant subjects from the South presented only the −13910*T allele.
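Because LP is dominant, the predicted persistence frequency is the combined frequency of CT and TT genotypes. The values in Table 1 come from observed genotype counts; the minimal sketch below instead assumes Hardy-Weinberg proportions, which is an assumption made here for illustration, and the function name is hypothetical. It shows how the reported T allele frequencies relate to the phenotype frequencies quoted in the text.

```python
def predicted_lp_frequency(t_allele_freq: float) -> float:
    """Expected frequency of T-allele carriers (CT + TT).

    Assumes Hardy-Weinberg proportions (illustrative assumption, not the
    paper's procedure, which used observed genotype counts).
    """
    if not 0.0 <= t_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    c_allele_freq = 1.0 - t_allele_freq
    # Under a dominant model only CC homozygotes are non-persistent.
    return 1.0 - c_allele_freq ** 2


# T allele frequencies reported in the text for two of the populations.
for label, q in [("South, European ancestry", 0.295), ("North, admixed", 0.175)]:
    print(f"{label}: predicted LP frequency = {predicted_lp_frequency(q):.3f}")
```

Under this assumption, an allele frequency of 0.295 gives a predicted persistence frequency just above 50%, and 0.175 gives about 32% (that is, roughly 68% CC), in line with the figures described above.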

LCT Haplotypes
A total of 26 haplotypes were observed in the Brazilian population. The most variable population was Recife, in the northeast, which presented 21 haplotypes. The most frequent haplotype in all populations was A. The haplotypes observed and their frequencies are shown in Table 4. FST values were calculated from the haplotype frequencies of the four Brazilian populations and their parental populations (Amerindians, a Bantu-speaking population from Africa, and Southern and Northern Europeans). The degree of differentiation among populations is not high. The highest FST value is between the Euro-descendants from Porto Alegre and the Bantu-speaking population from Africa (0.202, p < 0.0001, Table 5).
The nonmetric Multidimensional Scaling (MDS) analysis (Figure 1) showed the relationships between Brazilians and their parental populations based on the DA genetic distance. Belém, in the north, was closer to Amerindians. Recife and southern Afro-descendants were more closely related to Bantu-speaking South Africans, whereas the Porto Alegre population with European ancestry grouped with Southern and Northern Europeans. The stress of this model is 0.047.
The −13910*T allele was observed on haplotypes other than A in the Brazilian population. The association of −13910*T with LCT haplotypes in the four Brazilian populations is shown in Tables S2, S3, S4, and S5.
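To make the distance underlying the MDS plot concrete, the sketch below computes Nei's DA distance between two populations from their haplotype frequencies at a single locus, using the standard form DA = 1 − Σ √(x_i · y_i). The function name and the example frequencies are hypothetical and are not taken from Table 4; the study itself generated the matrix with POPTREE.

```python
from math import sqrt


def nei_da_distance(freqs_x: dict, freqs_y: dict) -> float:
    """Nei's DA distance for one locus: 1 - sum_i sqrt(x_i * y_i).

    Each argument maps a haplotype label to its frequency in one
    population; haplotypes absent from a population count as 0.
    """
    haplotypes = set(freqs_x) | set(freqs_y)
    shared = sum(
        sqrt(freqs_x.get(h, 0.0) * freqs_y.get(h, 0.0)) for h in haplotypes
    )
    return 1.0 - shared


# Hypothetical haplotype frequencies, for illustration only.
pop_1 = {"A": 0.5, "B": 0.5}
pop_2 = {"A": 0.5, "C": 0.5}
print(nei_da_distance(pop_1, pop_2))
```

The distance is 0 for identical frequency profiles and 1 when two populations share no haplotypes, so populations that plot close together in Figure 1 are those with similar haplotype frequency profiles.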

Discussion
The −13910*T allele is present in the three Brazilian regions studied. The highest frequency of this allele was found in southern subjects of European descent. The Afro-descendants from Porto Alegre also have a high frequency of this allele (18.4%), probably due to the high proportion of European contribution (43.1%) [23]. The European LP allele −13910*T was also present in Northern and Northeastern Brazil, both populations with high contributions of European ancestry to their gene pool (69.7% and 60.6%, respectively) [23].
The −13910C>T and −22018G>A polymorphisms are in high linkage disequilibrium in the four Brazilian populations studied, as in Northern Europeans and in the African Fulbe population [6,25]. The G→A mutation might have occurred only shortly before the C→T mutation, and it has been suggested that there was not enough time for recombination to break the TA haplotype [25]. Nevertheless, we found the −13910*T allele in combination with the −22018*G allele in southern Brazilians with European ancestry and in the admixed population from the northeast (Tables S2 and S5). This combination could arise in three ways. First, a de novo mutation generating a −13910*T allele could have occurred on a chromosome that carried a −22018*G allele. Second, a recombination event could have occurred; this scenario is more plausible in a highly admixed population such as Brazilians, and the fact that the TG combination is present on four different haplotypes (Tables S2 and S5) reinforces the recombination hypothesis. Third, the finding could reflect the contribution of Amerindian genomic ancestry, since the TG haplotype was also observed in this ethnic group [26].
The CA combination was found on an A-haplotype background (Tables S2, S3, S4, S5). This CA haplotype is considered rare, but it was also reported in populations from London [27], Portugal, São Tomé Island [25], and in the Kaingang [26]. Interestingly, São Tomé Island has a colonization history similar to that of Brazil: Portuguese settlers imported slaves from the Gulf of Guinea and from the Congo and Angola region [28]. The CA combination may be a genomic vestige of the Portuguese settlers who colonized both São Tomé Island and Brazil.
The −13910*T allele was first reported to occur exclusively on an LCT A haplotype background [27]. More recently, it has been shown that −13910*T occurs on two divergent A subhaplotypes, suggesting more than one origin for the lactase persistence allele in Europeans [29]. Ingram et al. [19] described this allele on an F haplotype in this ethnic group. In Brazil the −13910*T allele was found on A, B, J, and K haplotype backgrounds (Tables S2, S3, S4, S5). Recombination was probably the source of this diversity in this tri-ethnic population.
This study reports the presence of substitutions other than −13910C>T in the LCT enhancer region in the Brazilian population. These variants had been previously described in Africans.
The −14010*C allele, observed in one individual with African ancestry from Porto Alegre, is common in East and South Africa. This allele is considered an LP allele because it has been demonstrated in vitro that it increases gene transcription [30]. It occurs in 32% and 39% of Kenyans and Tanzanians, respectively [18]. It has also been reported in the Somali [19]. In Black Xhosa-speaking South Africans the −14010*C allele was observed in 13.3% of the individuals investigated [20]. This allele was also observed at low frequencies (1-6%) in Angola, Southwest Africa [31]. Two explanations for these observations in different parts of Africa were hypothesized: 1) a direct migratory link between East and Southwest Africa; 2) a first contact between East African and South pastoralists, followed by transfer of the −14010*C allele to Southwest pastoralists [31].
In Brazil, the −14010*C allele occurred in an individual heterozygous for the A and P haplotypes. This allele was first described on an F haplotype [17] and later on a B-haplotype background [19]. In this last report the −958C>T (rs56064699) polymorphism, which discriminates between the B and P haplotypes, was not tested; therefore the B haplotype of Ingram et al. [19] might be the same P haplotype observed herein.
Less is known about the other variants detected. The −13779*C allele was described in a lactose non-digester individual (frequency 1/107) from a Somali cohort [19]. However, it was common in an Indian herder sample (0.024) [32]. In the present study this allele was detected in two admixed subjects from Recife. The −13937*A allele was observed in one individual of African ancestry from Porto Alegre and in one individual from Recife. This allele was described at low frequency (0.014) in Black Xhosa-speaking people from South Africa, a population that habitually consumes fermented milk [20]. Functional studies of the role of these two variants in LCT transcription have not yet been performed.
The −14011C>T polymorphism was described in the Estonian and Indian populations [32,33], and its global frequency is 0.006. The functional role of this variant in LCT transcription is unknown. However, its location is a good predictor of functionality, since it is adjacent to −14010G>C, which interacts with transcription factors that increase lactase promoter activity [30]. This variant was observed in four admixed Brazilian subjects, three from the Northeast and one from the North.
This diversity at the LCT enhancer region is not unexpected if we consider the Brazilian roots. The slaves brought to Brazil were mainly from West and Southwest African areas and highly admixed with the European colonizers [23,34].
Although the high admixture rate in Brazil determines a high number of LCT haplotypes, the A haplotype was the most frequent. The prevalence of the A haplotype is explained by the large contribution of European ancestry to the gene pool of the Brazilian population: 69.7% in the North, 60.6% in the Northeast, and 94% in the South [23,34].
The MDS analysis shows the close relation of parental populations to Brazilians. Southern Europeans are closer to Porto Alegre Euro-descendants. The Porto Alegre Afro-descendants and the Recife population are more closely related to the Bantu-speaking South Africans, whereas Brazilian Amerindians have a closer relation with the Belém population.
Two previous studies validated the screening of the −13910C>T polymorphism for the molecular diagnosis of hypolactasia in Brazil [35,36]. A third study concluded that the −22018G>A polymorphism is a better predictor of lactase persistence in Japanese-Brazilians than −13910C>T [37]. In our study we demonstrated that a more comprehensive screening would be needed, since we found four variants in the enhancer region besides −13910C>T. If only the −13910C>T polymorphism were tested in Recife, for example, 6 individuals would be considered lactose intolerant even though they carry LCT enhancer region variants that could cause the lactase persistence phenotype. In heterogeneous populations like Brazilians a single test for the

Ethics Statement
All enrolled subjects provided their written informed consent to participate. The study protocol was approved by the ethics committees of the Federal University of Rio Grande do Sul, Federal University of Pará and of the Instituto Materno Infantil de Pernambuco, the three institutions that participated in blood sample collection.

Subjects
The study cohort consisted of 981 individuals recruited from the North, Northeast, and South regions of Brazil (Figure 2). All individuals in the Southern sample were selected at random at the Clinical Analysis Laboratory of the Pharmacy School of the Federal University of Rio Grande do Sul, Porto Alegre, from among those referred by several city health centers for free routine blood determinations. This sample included 337 individuals of European ancestry and 182 African Brazilians. European and African ancestry were ascertained by visual inspection of skin color and morphological characteristics. These samples have been fully described in previous publications [38][39][40]. The Northeastern sample consisted of 262 healthy adolescents ascertained at the Instituto Materno-Infantil de Pernambuco in Recife, the capital of the Brazilian Northeastern state of Pernambuco. The characteristics of this sample were previously described [41]. The Northern sample consisted of 200 individuals from the city of Belém, in the Brazilian Amazon region, ascertained at the Federal University of Pará. We did not stratify the Northeastern and Northern populations by ethnicity because skin color is not very indicative of genomic ancestry in these populations [23,34]. The LP status of the individuals sampled was not evaluated.

Identification of SNPs in the LCT Enhancer Region
The −13910C>T (rs4988235) polymorphism was genotyped by PCR-RFLP as previously described [35]. In an attempt to identify other polymorphisms in the −14 kb region, a fragment of 427 bp was amplified with the MCM6i13 and LAC-CL2 primers [16] in all non-carriers of the −13910*T allele. PCR products were purified with Exonuclease I and Shrimp Alkaline Phosphatase and then sequenced at MACROGEN (Seoul, Republic of Korea) using the MCM6i13 primer. The −22018G>A (rs182549) polymorphism was genotyped by allelic discrimination using TaqMan assays on a real-time PCR system (StepOne Plus, Applied Biosystems).

LCT Haplotypes
Of the eleven polymorphisms described across the 70 kb LCT gene [42], ten were genotyped in the present study to infer LCT haplotypes. Seven of the 10 polymorphisms investigated mapped between −1099 and −502 in the 5′ flanking region and were identified by sequencing a fragment of 597 bp at MACROGEN as previously described [26]. The other three polymorphisms, 666G>A (rs3754689), 5579T>C (rs2278544), and 6236TG>ΔΔ (rs10552864), reside within the LCT gene and were genotyped by TaqMan assays. The haplotypes were designated according to the nomenclature previously described [14].

Statistical Analysis
Chromatograms were examined using CodonCode Aligner Software v.3.5.7. Allele frequencies were obtained directly by gene counting. Hardy-Weinberg equilibrium was tested by Chi-Square with the WINPEPI software [43]. This software was also used for allele frequency comparisons by the heterogeneity Chi-Square test. LCT haplotypes were inferred using a Bayesian algorithm implemented in Phase v.2.1 software [44,45]. Wright's FST [46,47] was calculated using Arlequin v.3.0 [48]. The DA distance [49] matrix for the LCT haplotype frequencies was generated with POPTREE software version 1. This matrix was used in the nonmetric Multidimensional Scaling analysis performed with SPSS v.18.
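As an illustration of the first two steps, the sketch below estimates allele frequencies by gene counting for a biallelic SNP and computes the Pearson Chi-Square statistic for Hardy-Weinberg equilibrium (1 degree of freedom). It is a minimal stand-in written for this purpose and is not the WINPEPI procedure; the function names and the genotype counts are hypothetical.

```python
def allele_frequencies(n_cc: int, n_ct: int, n_tt: int) -> tuple:
    """Gene counting: each homozygote carries two copies of its allele,
    each heterozygote carries one copy of each allele."""
    n_chromosomes = 2 * (n_cc + n_ct + n_tt)
    if n_chromosomes == 0:
        raise ValueError("no genotypes supplied")
    freq_t = (2 * n_tt + n_ct) / n_chromosomes
    return 1.0 - freq_t, freq_t


def hwe_chi_square(n_cc: int, n_ct: int, n_tt: int) -> float:
    """Pearson Chi-Square statistic comparing observed genotype counts
    with Hardy-Weinberg expectations (1 degree of freedom)."""
    n = n_cc + n_ct + n_tt
    freq_c, freq_t = allele_frequencies(n_cc, n_ct, n_tt)
    if freq_c == 0.0 or freq_t == 0.0:
        return 0.0  # monomorphic sample: nothing to test
    observed = (n_cc, n_ct, n_tt)
    expected = (n * freq_c ** 2, 2 * n * freq_c * freq_t, n * freq_t ** 2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical genotype counts (CC, CT, TT), for illustration only.
freq_c, freq_t = allele_frequencies(50, 40, 10)
print(f"C = {freq_c:.3f}, T = {freq_t:.3f}")
print(f"HWE chi-square = {hwe_chi_square(50, 40, 10):.4f}")
```

A statistic below 3.84 would not reject Hardy-Weinberg equilibrium at the 5% level for 1 degree of freedom.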

Supporting Information
Table S1. P-values from the Chi-square test of the −13910*T allele frequencies among the studied populations from Brazil and the data available in dbSNP-NCBI. (DOC)