Signatures of Natural Selection at the FTO (Fat Mass and Obesity Associated) Locus in Human Populations

Background and aims
Polymorphisms in the first intron of FTO have been robustly replicated for associations with obesity. In the Sorbs, a Slavic population resident in Germany, the strongest effect on body mass index (BMI) was found for a variant in the third intron of FTO (rs17818902). Since this may indicate population-specific effects of FTO variants, we initiated studies testing FTO for signatures of selection in vertebrate species and human populations.

Methods
First, we analyzed the coding region of 35 vertebrate FTO orthologs with Phylogenetic Analysis by Maximum Likelihood (PAML, ω = dN/dS) to screen for signatures of selection among species. Second, we investigated SNP data from human populations (Europeans/CEU, Yoruba/YRI, Chinese/CHB, Japanese/JPT, Sorbs) for footprints of selection using DnaSP version 4.5 and the Haplotter/PhaseII. Finally, using ConSite, we compared transcription factor (TF) binding sites at sequences harbouring FTO SNPs in intron three.

Results
PAML analyses revealed strong conservation in the coding region of FTO (ω<1). Sliding-window results from population genetic analyses provided highly significant (p<0.001) signatures of balancing selection specifically in the third intron (e.g. Tajima’s D in Sorbs = 2.77). We observed several alterations in TF binding sites, e.g. a TCF3 binding site introduced by the rs17818902 minor allele.

Conclusion
Population genetic analysis revealed signatures of balancing selection at the FTO locus with a prominent signal in intron three, a genomic region with strong association with BMI in the Sorbs. Our data support the hypothesis that genes associated with obesity may have been under evolutionary selective pressure.


Introduction
Obesity is a complex disease with an estimated heritability of 40-70% [1,2]. The existence of genetic factors is well supported by a number of polymorphisms identified in recent genome-wide association studies [3]. Single nucleotide polymorphisms (SNPs) in the fat mass and obesity-associated gene (FTO) locus are among the most prominent factors associated with obesity measures such as body mass index (BMI). The associations between FTO variants and BMI have been robustly replicated in populations with different ethnic backgrounds [4][5][6][7]. The human FTO gene maps to 16q12.2 and encodes a 2-oxoglutarate-dependent nucleic acid demethylase [8]. The SNP rs9939609, which represents a cluster of variants strongly associated with BMI and overweight, is located in the first intron [5]. Whereas these associations were initially shown in cohorts of European origin [5], they could not be replicated in an African sample and in Han Chinese [9,10]. On the other hand, in the Sorbs, a population of Slavic origin residing in Eastern Germany, the strongest association with BMI was found for SNPs mapping to intron three, in addition to the association signal in the first intron [11]. These findings indicate specific effects of FTO alleles in the Sorbs and raise the question of whether FTO has been subject to natural selection and, if so, whether it shows population-specific patterns of selection.
Considering the "thrifty genotype" hypothesis, which states that an evolutionarily advantageous increased capacity to store energy may result in obesity and type 2 diabetes (T2D) in Western-lifestyle societies [12], genes associated with obesity and T2D have become attractive targets of evolutionary studies. Until recently, there was no strong evidence for consistent patterns of selection at loci associated with T2D that would conclusively confirm the thrifty genotype hypothesis. More recently, a locus-by-locus study showed that 14 loci associated with T2D and, to a lesser extent, obesity appear to have undergone selection in European, African and East Asian populations; however, no evidence of positive selection was found when all T2D loci were analyzed together [13].
Since comprehensive data regarding FTO evolution are sparse, we initiated studies searching for signatures of selection in vertebrates and human populations. To test for conservation of the protein-coding sequence at the inter-species level, we calculated the ratio of non-synonymous to synonymous substitution rates (ω = dN/dS) of the coding region in 35 vertebrate FTO orthologs with Phylogenetic Analysis by Maximum Likelihood (PAML). Furthermore, we investigated SNP data for footprints of selection using DnaSP version 4.5 and the Haplotter/PhaseII in the following human populations: Yoruba from Ibadan, Nigeria (YRI), Utah residents with Northern and Western European ancestry (CEU), East Asians (ASN), Han Chinese from Beijing, China (CHB), and Japanese from Tokyo, Japan (JPT). Finally, we examined in silico the possible impact of FTO SNPs in intron 3 on transcription factor (TF) binding sites.

Materials and Methods
All studies were approved by the ethics committee of the University of Leipzig. All subjects gave written informed consent.

Phylogenetic Analyses by Maximum Likelihood (PAML)
PAML [14] provides the program CODEML to estimate the level of gene conservation by calculating the dN/dS ratio ω (dN: non-synonymous substitution rate; dS: synonymous substitution rate). In the present study, ω was calculated in PAML version 4.1 [15]. The coding sequences of 35 vertebrate FTO orthologs were extracted from the Ensembl (http://www.ensembl.org) and NCBI (http://www.ncbi.nlm.nih.gov) databases. Species and accession numbers are provided in S1 Table. Subsequently, all coding sequences were aligned with ClustalW [16], a widely used progressive alignment method, within MEGA 6 [17]. A phylogenetic tree was constructed from the aligned coding sequences with the Neighbor-Joining (NJ) algorithm in MEGA 6. Evolutionary distances were computed with the Jukes-Cantor model, which was the best-fitting model according to jModeltest 2.1.5 [18]. A total of 1,000 bootstrap replicates were performed to infer the bootstrap consensus tree. The initial input for the PAML analysis is displayed in Fig. 1.
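As an illustration of the Jukes-Cantor correction named above, the following minimal sketch computes the distance d = −(3/4) ln(1 − 4p/3) from the observed proportion p of differing sites between two aligned sequences. The study computed distances in MEGA 6; the function name and the treatment of gaps (pairwise deletion) are assumptions made only for this example.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance between two aligned nucleotide sequences.

    Hypothetical helper for illustration; not the MEGA implementation.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: skip gaps and ambiguous bases (pairwise deletion).
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC69 distance is undefined")
    # JC69 correction for multiple substitutions at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy sequences, not FTO data: one mismatch in ten sites (p = 0.1).
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance is always at least as large as p, because the model accounts for repeated substitutions at the same site.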
Because the power to detect positive selection is reduced when rates are averaged across sites, several tests were applied following recommendations for real data analyses [19]. The models fitted were the one-ratio model (M0), free-ratio [20], nearly neutral (M1a), positive selection (M2a) [21], discrete (M3) [22], beta (M7), and beta&ω (M8) [23]. Likelihood ratio tests (LRT; 2Δl = 2(l1 − l0), where l1 and l0 are the log likelihoods of the alternative and the null model, respectively) were conducted for each pair of nested models [24] to identify which model fits the data better. The LRTs and nested models are briefly introduced as follows. The M0 model is a plain model in which the same dN/dS ratio is assumed for all branches in the phylogeny [20]. The free-ratio model is the most general model, in which an independent dN/dS ratio is assumed for every branch [20]. The first LRT compares the M0 model with the free-ratio model to test whether ω differs among lineages [20]. The second LRT compares the M0 and M3 models to test whether ω varies among sites. In the discrete model, M3, three site classes are estimated from a general discrete distribution, each with an independently estimated ω, which also allows for sites with ω > 1. For each site class the proportion p is given [22]. The third LRT compares the M1a and M2a models. M1a postulates one class of sites with 0 < ω < 1 and a second class of sites with ω = 1, whereas the M2a model adds a third class of sites with ω > 1 [25,24]. The fourth LRT compares the M7 and M8 models and has more power to detect positively selected sites, as both models allow for sites with 0 < ω < 1 [23]. M7 assumes a beta distribution for ω over sites, which is limited to the interval (0, 1) [23]. M8 adds another site class with ω estimated from the data set, which allows for sites with ω > 1 [23].
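The test statistic and the degrees of freedom of the four comparisons can be written out as below. This is a sketch under stated assumptions: k1 and k0 denote the numbers of free parameters of the alternative and null model, the tree is taken to be unrooted and bifurcating with n = 35 species (2n − 3 branches), and M3 is taken to have three site classes as described above.

```latex
\begin{align}
2\Delta\ell &= 2\,(\ell_1 - \ell_0) \;\sim\; \chi^2_{\mathrm{d.f.}},
  \qquad \mathrm{d.f.} = k_1 - k_0 \\
\text{M0 vs. free-ratio:}\quad \mathrm{d.f.} &= (2n - 3) - 1 = 66 \quad (n = 35) \\
\text{M0 vs. M3 (three classes):}\quad \mathrm{d.f.} &= (3 + 2) - 1 = 4 \\
\text{M1a vs. M2a:}\quad \mathrm{d.f.} &= 4 - 2 = 2 \\
\text{M7 vs. M8:}\quad \mathrm{d.f.} &= 4 - 2 = 2
\end{align}
```

Here M3 contributes three ω values and two free class proportions, M1a and M2a contribute two and four parameters, and M7 and M8 contribute two and four parameters, respectively.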
Additionally, data of the HapMap populations (YRI, CEU, ASN, CHB, JPT) were downloaded from http://www.hapmap.org/ and filtered for the same SNPs genotyped in the Sorbs (S2 Table). As the CEU and YRI comprise parent/child trios, analyses were performed without SNP data of the children. Haplotype reconstruction in all populations was performed with PHASE version 2.1 [29,30].
Fig. 1. The bootstrap consensus tree inferred from 1,000 replicates represents the evolutionary relationship among 35 species based on FTO coding sequences retrieved from Ensembl or NCBI. Accession numbers are listed in S1 Table. The alignment was carried out with ClustalW (1,581 nucleotides retained), and the phylogenetic tree was constructed with the neighbor-joining method in MEGA6. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are not displayed. This tree served as the initial tree for the PAML analysis.

DnaSP provides population genetic measures such as Tajima's D [31], Fu and Li's D*, and Fu and Li's F* [32], all of which detect deviations from the distribution of common or rare alleles expected under neutral evolution. The iHS compares the extent of linkage disequilibrium (LD) surrounding a selected allele with that surrounding the background allele at the same position. Suggestive evidence for natural selection is defined as iHS < −1.5 or > 1.5, and strong evidence as iHS < −2 or > 2 [33]. The fixation index (Fst) measures the extent of variation in allele frequencies between populations [34]. Population differentiation increased by local adaptation may result in larger Fst values [35].
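To make the logic of Tajima's D concrete, the sketch below implements the standard formula, which contrasts the mean number of pairwise differences (π) with the number of segregating sites S scaled by a1. The study used DnaSP; the function name and the input encoding (phased haplotypes as strings of "0" and "1", no missing data) are assumptions made only for this illustration.

```python
import math
from typing import Sequence


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's D for phased biallelic haplotypes encoded as '0'/'1' strings.

    Hypothetical helper for illustration; not the DnaSP implementation.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes (variance is zero below that)")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0  # S: number of segregating sites
    pi = 0.0       # mean number of pairwise differences
    for site in range(length):
        derived = sum(1 for h in haplotypes if h[site] == "1")
        if 0 < derived < n:
            seg_sites += 1
            pi += derived * (n - derived) / n_pairs
    if seg_sites == 0:
        raise ValueError("no segregating sites: D is undefined")

    # Constants from Tajima's variance formula.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Toy data: intermediate-frequency alleles versus singletons only.
    print(tajimas_d(["00", "00", "11", "11"]))
    print(tajimas_d(["100", "010", "001", "000"]))
```

An excess of intermediate-frequency alleles inflates π relative to S/a1 and yields a positive D, the pattern interpreted as balancing selection, whereas an excess of rare alleles yields a negative D.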
For the HapMap populations, standardized iHS and Fst values were also provided by the Haplotter (http://hg-wen.uchicago.edu/selection/haplotter.htm) [36,37].

Transcription factor binding sites
To uncover alleles that may change binding sites in intron three, sequences surrounding eight obesity-associated SNPs (20 bp up- and downstream) within FTO intron three were downloaded from the UCSC genome browser (http://genome.ucsc.edu/). Transcription factor binding probabilities of sequences carrying either the major or the minor allele were compared using ConSite [38]. Sequences included in the analysis are listed in S4 Table. We restricted the analysis to vertebrate transcription factors, as it is well acknowledged that functional transcription factor binding sites are conserved among closely related species, whereas substitutions occur mostly at nonfunctional positions as the evolutionary distance between species increases [39]. ConSite combines transcription factor binding profiles from JASPAR [40] with phylogenetic footprinting algorithms, which impose additional constraints and thereby improve prediction. Compared with other single-sequence analyses, ConSite reduces the noise level by ~85% [38]. The program has been applied in several studies [41][42][43] and has been validated in functional experiments both in vitro and in vivo [42].
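The idea of comparing binding predictions for the major and minor allele can be sketched with a simple position weight matrix (PWM) scan. This is not the ConSite algorithm: the count matrix, the flanking sequences, the pseudocount, the uniform background, and all function names below are hypothetical, and only the forward strand is scanned.

```python
import math
from typing import Dict, List

# Hypothetical 4-bp count matrix (one dict per motif position); not a JASPAR profile.
TOY_COUNTS: List[Dict[str, int]] = [
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert base counts to log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({base: math.log2(((column[base] + pseudocount) / total) / background)
                    for base in "ACGT"})
    return pwm


def best_site_score(sequence: str, pwm: List[Dict[str, float]]) -> float:
    """Highest PWM score over all forward-strand windows of the sequence."""
    width = len(pwm)
    sequence = sequence.upper()
    if len(sequence) < width:
        raise ValueError("sequence shorter than the motif")
    return max(sum(pwm[i][sequence[start + i]] for i in range(width))
               for start in range(len(sequence) - width + 1))


def allele_score_change(flank5: str, major: str, minor: str, flank3: str,
                        pwm: List[Dict[str, float]]) -> float:
    """Best score with the minor allele minus best score with the major allele."""
    return (best_site_score(flank5 + minor + flank3, pwm)
            - best_site_score(flank5 + major + flank3, pwm))


if __name__ == "__main__":
    pwm = log_odds_matrix(TOY_COUNTS)
    # Toy flanks: the minor allele "C" completes a match to the toy motif.
    print(allele_score_change("TTG", "A", "C", "GTCC", pwm))
```

A positive difference indicates that the minor allele improves the best match to the motif, which corresponds to a predicted gain of a binding site; a negative difference corresponds to a predicted loss.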

Results
Phylogenetic Analyses by Maximum Likelihood (PAML)
PAML analysis revealed that the coding region of FTO is highly conserved among all studied species (average ω = 0.1616). The LRT statistic for the lineage-specificity test (M0 vs. free-ratio) was 2Δl = 63.74. Compared with a χ² distribution with d.f. = 66, the difference between the two models was not significant, indicating that ω does not differ among lineages. This suggests no differences in the direction and magnitude of selection acting on the FTO coding region among species [44]. The second LRT compared the M0 and M3 models. The significant result (2Δl = 193.47, d.f. = 4) favored the M3 model. Thus, ω varied among sites rather than being constant, i.e. the ratio of non-synonymous to synonymous substitution rates fluctuated along the FTO coding region. The last two LRTs were not significant, which favored the null models (M1a and M7) [45]. In summary, positive selection cannot be inferred for any site in the coding sequence. All data are summarized in Table 1.
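The two reported LRT statistics can be checked against the χ² distribution with a few lines of code. The sketch below uses only the standard library and the closed-form upper-tail probability that holds for even degrees of freedom, which covers both tests reported here (d.f. = 66 and d.f. = 4); the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even d.f.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!,
    which is exact only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires positive even d.f.")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # LRT statistics and degrees of freedom as reported in the text.
    for label, stat, df in [("M0 vs. free-ratio", 63.74, 66),
                            ("M0 vs. M3", 193.47, 4)]:
        p = chi2_sf_even_df(stat, df)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"{label}: 2*delta_l = {stat}, d.f. = {df}, p = {p:.3g} ({verdict})")
```

For odd degrees of freedom a general implementation of the regularized incomplete gamma function would be needed instead.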

Population genetic measures
The analyses with DnaSP provided strong evidence for non-neutral evolution of the FTO locus. Across the whole gene locus (1 Mb), Tajima's D showed significant deviations from the allele frequency distribution expected under neutrality (summarized in Table 2). The sliding-window analyses further supported these findings (Fig. 2). Interestingly, Tajima's D seemed to be slightly higher in the third intron than in the first intron. Furthermore, the values in the third intron were more consistent across the studied populations than those in the first intron, where Tajima's D was decreased in the Asian populations (Japanese and Chinese; Table 2 and Fig. 2). In line with Tajima's D, Fu and Li's D* and Fu and Li's F* tests also showed significant deviations from neutrality in the investigated populations (Table 2).
In the publicly available data, the Haplotter showed top iHS scores > 2 in the CEU for the FTO region, e.g. for rs7193144 and rs8050136 (S3 Table). All SNPs in the third intron had rather low iHS values. These results were not significant according to the published map of recent positive selection in the human genome [36]. The unstandardized iHS values calculated with the iHS tool supported the publicly available data in the Haplotter (Table 3 and S3 Table). It is noteworthy that in the Sorbs, the iHS for SNPs in the third intron (rs17818902 and rs17818920) was about 2.5 times higher than in the CEU sample (1.468 vs. 0.590). Notably, the strength of association with BMI correlated positively with the unstandardized iHS (Table 3). The iHS values indicate that no long haplotype was observed for variants in FTO. The Fst values for all comparisons were close to zero among variants, which indicated no significant population differences (Table 3).
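The interpretation of Fst values close to zero can be illustrated with Wright's classical definition for a single biallelic SNP, Fst = (HT − HS) / HT. This is a minimal sketch and not necessarily the estimator used by the Haplotter; the function name, the equal weighting of the two populations, and the example allele frequencies are assumptions.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP from allele frequencies in two populations.

    Assumption: both populations are weighted equally and sampling error is ignored.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                         # HT: pooled heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # HS: mean within-population
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic in both populations
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies, not values from this study.
    print(fst_two_populations(0.40, 0.60))  # modest frequency difference
    print(fst_two_populations(0.00, 1.00))  # alternative alleles fixed
```

Identical allele frequencies give Fst = 0, and fixation of alternative alleles gives Fst = 1, so values near zero correspond to little differentiation between populations at that variant.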

Transcription factor binding sites
To elucidate the potential functional mechanisms underlying the strong association of variants in FTO's third intron with BMI in the Sorbs, we investigated in silico the impact of SNPs on predicted transcription factor binding sites (S5 Table). As shown in S5 Table, the minor alleles of rs17818902, which showed the top association signal with BMI in the Sorbs [11], and of rs8053740 predicted novel binding sites for the transcription factors TCF3 and SOX17, respectively.
Further, at rs8053367 the presence of the minor allele led to binding of multiple transcription factors, namely FREAC-2, HNF-3beta, HFH-1, HFH-2 (S5 and S6 Tables), while the binding site of transcription factor Irf-1 seemed to be significantly compromised. Finally, minor alleles of rs17818920 and rs7205213 led to alterations in binding sites for HLF, SOX17 and HNF-3beta (S5 and S6 Tables).

Discussion
Polymorphisms in the FTO gene have been shown to be associated with obesity in different ethnic groups of European and other ancestries [3][4][5]7,11,46]. Whereas associations with SNPs in the first intron have been robustly replicated, in the Sorbs, the strongest effects on BMI were found for variants in the third intron [11]. To address the specific associations of FTO variants in the Sorbs, we aimed to test the gene for signatures of selection in vertebrates and particularly in human populations. PAML analyses of the FTO coding sequence from 35 vertebrates consistently yielded ω < 1. In the NearlyNeutral (M1a) model, sites in the coding sequence were either under strong purifying selection (~80%) or evolving neutrally (~20%) and experienced a very high rate of synonymous substitutions, suggesting strong gene conservation. This underlines the biological importance of the gene, as functionally relevant genes are expected to be highly conserved and thus subject to purifying selection [47]. The finding that FTO is subject to purifying selection is consistent with the results of Ohashi et al., who studied the genetic architecture of FTO polymorphisms in Oceanic populations [48]. However, since purifying selection is an important evolutionary means of maintaining the optimized form of a gene, it cannot be excluded that FTO variants were positively selected in the past, when the ability to store energy was beneficial.
It is of note that the PAML analysis examined only the coding sequence, whereas most of the obesity-associated SNPs, such as rs9939609 and rs17818920, map to intronic regions. Therefore, test statistics such as iHS and Fst, which do not depend on coding regions, are indispensable in evolutionary analyses. It has been stated before that, at least in Oceanic populations, FTO does not seem to comply with the thrifty genotype hypothesis [48]. The analyses in these populations considered only polymorphisms in the first intron. In the context of our present data, further studies systematically targeting the FTO locus in populations of different ethnic backgrounds will be necessary. As we show in the present study, neither iHS nor Fst values indicate positive selection for any allele from the third intron in individual populations or groups of populations (see S1 Fig.). Thus, consistent with studies in Oceanic populations, our data do not support the thrifty genotype hypothesis. In contrast, other population genetic measures such as Tajima's D, Fu and Li's D*, and Fu and Li's F* suggest a significant signature of balancing selection in FTO. That balancing selection was detected by these measures but not by iHS might be explained by the fact that Tajima's D considers the allele frequencies at the sites themselves, whereas it does not take into account the regions surrounding the sites through LD, as iHS does [49,50]. This is interesting when considering that polymorphisms in the third intron showed the strongest association with BMI in the Sorbs from Germany [11]. Remarkably, in the first intron, Tajima's D is rather low in Asian populations when compared with Europeans, which might at least in part explain the ethnic specificity of the genotype-phenotype associations with SNPs in this gene region.
However, it has to be noted that Tajima's D was consistent across the studied populations in the third intron, which does not seem to support a population-specific pattern of selection for the Sorbs. Rather, the specific association of FTO variants in the third intron with BMI in the Sorbs is more likely to be explained by specific environmental factors interacting with the genetic background in the Sorbs. Given that the strongest effect on BMI in the Sorbs was found for rs17818902 in the third intron, we also investigated its potential impact on transcription factor binding sites. In silico analyses using publicly available transcription factor databases suggested that the minor allele of rs17818902 would predict a novel binding site for TCF3 and that of rs8053740 a novel binding site for SOX17. TCF3 acts as a transcriptional regulator involved in the initiation of neuronal differentiation [51,52], whereas SOX17 is an important player in the regulation of embryonic development and in the determination of cell fate [53]. However, the causal functional variant remains to be discovered. Thus, studies on pathways downstream of TCF3 may pave the way for a better understanding of the mechanism underlying the associations of FTO with obesity. Nevertheless, it has to be acknowledged that a recent study strongly suggested that noncoding regions in the first intron of FTO, which show enhancer activity, interact directly with the promoter of the homeobox gene IRX3 and thereby regulate IRX3 expression [54]. However, a clear link between functional variants in the first intron and obesity remains elusive. Moreover, the loss-of-function experiments on IRX3 were conducted in human cerebellum [55]. In contrast, there is strong evidence for the role of FTO in the complex pathophysiology of obesity (systematically reviewed in [56]).
For example, it has been shown that the highest expression of FTO is in the brain region controlling food intake [8] and that hypothalamus-specific manipulation of Fto affects food intake in rats [57].
In conclusion, population genetic analyses revealed signatures of balancing selection at the FTO locus with a prominent signal in the third intron, a genomic region with strong