Fine Mapping of a GWAS-Derived Obesity Candidate Region on Chromosome 16p11.2

Introduction: Large-scale genome-wide association studies (GWASs) have identified 97 chromosomal loci associated with increased body mass index in population-based studies on adults. One of the lead SNPs, rs7359397, tags a large region (approx. 1 Mb) with high linkage disequilibrium (r² > 0.7), which comprises five genes (SH2B1, APOBR, sulfotransferases: SULT1A1 and SULT1A2, TUFM). We had previously described a rare mutation in SH2B1 solely identified in extremely obese individuals but not in lean controls.
Methods: The coding regions of the genes APOBR, SULT1A1, SULT1A2, and TUFM were screened for mutations (dHPLC, SSCP, Sanger re-sequencing) in 95 extremely obese children and adolescents. Detected non-synonymous variants were genotyped (TaqMan SNP Genotyping, MALDI TOF, PCR-RFLP) in independent large study groups (up to 3,210 extremely obese/overweight cases, 485 lean controls and 615 obesity trios). In silico tools were used for the prediction of potential functional effects of detected variants.
Results: Except for TUFM we detected non-synonymous variants in all screened genes. Two polymorphisms, rs180743 (APOBR p.Pro428Ala) and rs3833080 (APOBR p.Gly369_Asp370del9), showed nominal association to (extreme) obesity (uncorrected p = 0.003 and p = 0.002, respectively). In silico analyses predicted a functional implication for rs180743 (APOBR p.Pro428Ala). Both APOBR variants are located in the repetitive region with unknown function.
Conclusion: Variants in APOBR contributed as strongly as variants in SH2B1 to the association with extreme obesity in the chromosomal region chr16p11.2. In silico analyses implied no functional effect of several of the detected variants. Further in vitro or in vivo analyses on the functional implications of the obesity associated variants are warranted.

The most plausible obesity gene in the region is SH2B1. SH2B1 is a mediator of energy homeostasis and increases leptin and insulin potency in downstream signaling pathways [22]. Sh2b1 knockout mice are obese, hyperphagic and exhibit traits of the metabolic syndrome like hyperlipidemia, leptin resistance, hyperglycemia, and insulin resistance [23]. In SH2B1 we detected a rare mutation solely in obese individuals; additionally we replicated the obesity association of the GWAS SNP rs7498665 (SH2B1: p.Thr484Ala; [14]). Another group also described SH2B1 mutations in extremely obese children with insulin resistance [24,25]. As the infrequent mutations cannot explain the genome-wide association signal and, so far, no functional effect of the frequent SNP has been detected [14,25], we analysed additional promising obesity candidate genes in the same chromosomal region.
Lead SNPs in GWAS can tag large regions of high linkage disequilibrium (LD) which can comprise one to several genes/variants that are relevant for the analyzed phenotype [26]. For the region on chr16p11.2, Speliotes et al. [2] listed non-synonymous SNPs in adjacent genes (SH2B1, APOBR, SULT1A2) in high LD (r² > 0.75) with the lead SNP rs7359397 ([2], see supplementary material). Additionally, for a total of four adjacent genes, involvement in weight regulation seems likely because they are either (a) biological candidates (SH2B1) or (b) differentially expressed in adipose tissue between the general population and patients who underwent bariatric surgery (SH2B1, SULT1A1, SULT1A2, TUFM; [1]). Further fine mapping of the chromosomal region for causal variants that contribute to the obesity association has not yet been undertaken.
The APOBR gene encodes a macrophage receptor that regulates fat and vitamin uptake into cells [27,28]. Non-synonymous variants (rs180743: p.Pro428Ala and rs3833080: p.Gly369_Asp370del9) in APOBR are associated with hypercholesterolemia [29]. Mice fed a high fat diet until they became obese showed increased expression of APOBR and hence increased lipid intake in macrophages of the adipose tissue [28]. This is mediated by the transcription factors PPARα, PPARβ/δ, PPARγ and the PPAR-RXR transcriptional complex [30,31]. In normal weight humans, a single meal with high fat content (72% of the total energy of the meal) increased both APOBR expression and lipid uptake in monocytes [32]. High blood lipid levels differentially regulate APOBR expression in human postprandial monocytes and macrophages and lead to foam cell formation [33].
The sulfotransferase genes SULT1A1 and SULT1A2 are located close to each other on chr16p11.2. Both proteins sulfonate hormones like estrogens, estrogenic alkylphenols, 17-β-estradiol and several androgens, so that the hormones can be excreted [34]. Obesity is associated with increased levels of 17-β-estradiol, estrone and estrone sulfate, which are substrates of SULT1A2 [35,36,37]. Childhood obesity is associated with an increased risk of adult obesity and type 2 diabetes mellitus [38]. Association of a non-synonymous SNP (rs141581853: SULT1A1 p.Arg213His) with obesity but not hypertension had been described [39]. The regulation of SULT1A1 expression in adipose tissue and liver of diet induced obesity (DIO) rats was dependent on the dietary fat content [40].
TUFM encodes the mitochondrial Tu translation elongation factor, which is required for mitochondrial gene expression [41]. (1) Exclusive maternal inheritance of mitochondria and mitochondrial DNA, (2) stronger correlation with maternal than paternal BMI [42] and (3) the relevance of mitochondria for energy metabolism indicate that genes involved in mitochondrial function are relevant candidate genes for weight regulation. TUFM expression is upregulated in DIO rats on a high fat diet [40]. In human cultured hippocampal neurons, BDNF stimulation downregulated the expression of TUFM [43].
In order to fine map the chromosomal region chr16p11.2 for further obesity associated variants, we screened the coding regions of APOBR, SULT1A1, SULT1A2, and TUFM for variants in 95 extremely obese children and adolescents. Most of these individuals were selected so that they were likely enriched for mutations in high LD with the original obesity association signal [2,14]. Our focus was the detection of common to infrequent variants (MAF > 0.01) which affect the protein sequence. Previously it was shown, for instance for MC4R, that GWAS results point to genes in which functionally relevant mutations are found more frequently in cases than in controls [44]. These infrequent mutations might also have a major gene effect. Although synthetic association does not seem to be a frequent mechanism [45], GWAS results and mutation screens frequently depict the same genes. Subsequently, we confirmed the detected non-synonymous variants in independent study groups.

Study groups
An overview of the study groups can be found in Table 1 (see also [14]); details of recruitment have been described previously [46,47]. We included children and adolescents (mean age = 13.25 ± 3.26 years) with a BMI above the 97th BMI percentile. Written

Mutation screen
The selection of extremely obese individuals for the mutation screen was based on genotypes at SNP rs2008514 (proxy of rs7359397; [2]) in the chromosomal region 16p11.2. In total, we analyzed 95 extremely obese individuals, 90 of whom were likely enriched for the presence of infrequent mutations at chr16p11.2 that contribute to the association signal of rs2008514. These extremely obese patients (offspring) from the family-based GWAS sample were homozygous for the obesity risk allele T at rs2008514 and had at least one heterozygous parent, thus substantially contributing to the observed over-transmission of the rs2008514 T-allele in our previous study [20]. The other five individuals harbor a deletion on chr16p11.2 that does not include the genes which were screened for mutations here. The 95 individuals were screened for mutations in APOBR (NCBI Gene ID: 55911, chr16:28,505,970-28,510,291), SULT1A1 (NCBI Gene ID: 6817, chr16:28,617,142-28,620,176), SULT1A2 (NCBI Gene ID: 6799, chr16:28,603,349-28,607,251) and TUFM (NCBI Gene ID: 7284, chr16:28,854,296-28,857,590; positions given for GRCh37/hg19). All primers can be found in S3 Table.
Depending on the size of the screened fragment, one of the following two methods was used for the mutation screen of the coding region of each gene, as described previously [14,48]: single-strand conformation polymorphism (SSCP) analysis for PCR amplicons up to 300 bp [49] or denaturing high-performance liquid chromatography (dHPLC) for PCR amplicons up to 600 bp [50,51]. With these fragment sizes both methods achieve a high sensitivity (error rate below 5% [49,50]), which compares well with Sanger sequencing [52,53]. All PCR amplicons with dHPLC/SSCP patterns deviant from the wild-type pattern were re-sequenced as described previously [14]. At least two experienced individuals independently assigned the deviant patterns; discrepancies were resolved either by reaching consensus or by re-screening.

Genotyping
The non-synonymous variants identified in the mutation screens in SULT1A1 and SULT1A2 were genotyped in 355 obesity families [46] by MALDI TOF, RFLP and tetra ARMS PCR (Fig 1). The missense variants in APOBR (rs180743: p.Pro428Ala; rs3833080: p.Gly369_Asp370del9; rs368546180: p.Thr321_Gly329del9) were genotyped in the following independent study groups by either gel electrophoresis of PCR products (for deletions and insertions) or TaqMan assay (detailed information can be obtained from the authors): 615 obesity trios (extremely obese child or adolescent with both biological parents; [20]), the case-control GWAS study groups (453 extremely obese cases and 435 lean controls described in [46,47]; see above), and 1,383 obese and overweight children and adolescents (Datteln Paediatric Obese Cohort [54]). At least two experienced individuals independently assigned the genotypes; discrepancies were resolved either by reaching consensus or by re-genotyping. In the case of the trios, Mendelian inheritance was checked. For the other study groups, Hardy-Weinberg equilibrium was assessed and fulfilled. All enzymes and protocols can be obtained from the authors.
Table 1 (study groups), notes. Mutation screen sample: part of the cases of the family-based and the case-control GWAS samples (90 extremely obese index patients from the 705 family-based GWAS trios; 5 extremely obese patients from the case-control GWAS). Family-based GWAS sample: 615 index patients with extreme obesity and their biological parents; independent of initial screening sample [14,46]. The 355 trios used for association analysis were part of this sample but did not deviate significantly from the description given for the overall sample. Case-control GWAS sample: GWAS of extremely obese children and adolescents in comparison to lean, adult controls; independent of initial screening sample [14,47]. DAPOC: Datteln Paediatric Obese Cohort; sample of overweight and obese children and adolescents; independent of initial screening sample [54].
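The two quality checks named in the genotyping description (Mendelian inheritance in trios, Hardy-Weinberg equilibrium in unrelated groups) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and all genotype counts are hypothetical, and the Hardy-Weinberg check uses the textbook chi-square goodness-of-fit test with one degree of freedom, which may differ from the test actually applied.

```python
from math import erfc, sqrt


def is_mendelian_consistent(child, father, mother):
    """True if the child's two alleles can be explained by one allele
    from the father and one from the mother (genotypes are 2-tuples)."""
    c0, c1 = child
    return (c0 in father and c1 in mother) or (c1 in father and c0 in mother)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Takes observed genotype counts and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical example: a trio and a set of genotype counts.
trio_ok = is_mendelian_consistent(("T", "T"), ("T", "C"), ("T", "T"))
chi2, p_value = hwe_chi_square(300, 400, 300)
```

A small p-value from `hwe_chi_square` would flag a marker for closer inspection, for instance for genotyping error; the threshold used for that decision is a study-specific choice and is not given in the text.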

Statistics
For association studies in the above-mentioned 453 cases and 435 controls, Fisher's exact test (allelic association) was calculated with PLINK [55], adjusted for sex and age of the individuals. In the 615 trios, an asymptotic, two-sided p-value for the transmission disequilibrium test (TDT [56]) was calculated with PLINK. The initial screening sample was excluded from the further analyses. All p-values are asymptotic, two-sided and not corrected for multiple testing unless stated otherwise. In addition to the univariate analysis, we conducted a joint analysis of the significant SNPs to assess whether these SNPs reflect the same signal. This was done with R 3.1.0 [57] without any further adjustment.
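For readers unfamiliar with the TDT, the statistic can be sketched as follows. The test compares how often heterozygous parents transmit a given allele to their affected offspring with how often they do not; under no association both counts are expected to be equal, which gives a McNemar-type chi-square statistic with one degree of freedom. The function name and the counts below are hypothetical, and the sketch only mirrors the asymptotic, two-sided p-value described above rather than PLINK's full implementation.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one biallelic marker.

    transmitted:   times heterozygous parents passed on the tested allele
    untransmitted: times heterozygous parents passed on the other allele
    Returns (chi2, p_value) for the asymptotic test with 1 df.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(60, 40)
```

Only heterozygous parents contribute to the counts, which is why the offspring selected for the mutation screen were required to have at least one heterozygous parent at rs2008514.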
The non-synonymous, non-conservative SNP rs180743: p.Pro428Ala is located close to the deletion rs3833080, which is in high LD with the variant (r² = 0.98). Due to the high LD with rs2008514, both variants were carried, at least heterozygously, in all individuals of the initial screen. Functional in silico prediction for the risk allele is variable, although the SNP is located in a conserved position (conservation 66% among 29 species, ENSEMBL). SIFT and PolyPhen2 rated this SNP as deleterious, while PANTHER, SNAP and PMUT rated it as neutral (Table 2). Although the overall prediction is "Polymorphism", Mutation Taster predicted the introduction of a new splice site and thereby disruption of a glutamate-rich region and potential loss of a phosphoserine domain (Table 2).
The deletion rs368546180 with a length of 27 bp (c.933_934insdel27; p.Ala328_Gly329) results in an in-frame shortened amino acid sequence of the repeat region of APOBR with no predicted function [27]. The in silico prediction of the variant is ambiguous (Mutation Taster "Polymorphism", SIFT "neutral", PROVEAN "deleterious"; Table 2). It was identified once heterozygously in an extremely obese child. The female mutation carrier (height 147 cm, weight 49 kg, BMI 22.68 kg/m², BMI SDS 1.44, age 10.5 years, 93rd age and sex specific BMI percentile) inherited the mutation from her obese mother (BMI 39.06 kg/m²) while the overweight (BMI 25.65 kg/m²) father did not harbor the deletion. The girl also homozygously carried the risk alleles (minor alleles) at rs180743: p.Pro428Ala and rs3833080: p.Gly369_Asp370del9.
Case-control association analyses based on 1,873 extremely obese cases and 435 lean controls were performed for the two frequent, coding variants rs180743: p.Pro428Ala and rs3833080: p.Gly369_Asp370del9 and the second infrequent deletion rs368546180: p.Thr321_Gly329del9. The APOBR rs180743 G-allele was nominally associated with obesity (odds ratio (OR) per allele = 1.27; 95% confidence interval (CI): 1.09-1.47, p = 0.002, see Table 3). Similarly, the deletion allele of rs3833080 was nominally associated with obesity (OR = 1.25 per allele; 95% CI: 1.08-1.45, p = 0.003). Family-based association studies (based on 615 obesity trios) for rs3833080 and rs180743 confirmed these associations (Table 3). While genotyping rs3833080 we also observed the insertion allele (p.Gly369_Asp370insGluGluAlaGlyThrAlaSerGlyGly), with a much lower frequency and exclusively in (extremely) obese cases (minor allele frequency of 0.001 in 2,540 cases) but not in 481 lean or normal weight controls. The second deletion rs368546180: p.Thr321_Gly329del9 was only observed once in all screened individuals. Consequently, association analysis could not be performed (Table 3).
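The per-allele odds ratios and 95% confidence intervals reported above are of the kind obtained from a 2x2 table of allele counts. The sketch below shows the standard calculation with a Wald interval on the log scale. It is an illustration under stated assumptions, not a reconstruction of the reported values: the function name and the allele counts are hypothetical (the underlying counts are in Table 3, which is not reproduced here), and the interval method actually used may differ.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other, z=1.96):
    """Allelic odds ratio with a Wald confidence interval.

    Arguments are allele counts (two alleles per genotyped individual).
    z = 1.96 corresponds to a 95% interval.
    Returns (odds_ratio, lower, upper).
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) from the four cell counts.
    se = sqrt(1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical allele counts, chosen only to make the arithmetic easy to follow.
odds_ratio, lower, upper = allelic_odds_ratio(20, 10, 10, 20)
```

An interval that excludes 1, as for both APOBR variants above, corresponds to a nominally significant association at the 5% level.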

SULT1A1
Of the initially detected variants (7 non-synonymous, S2 Data) in the coding region of SULT1A1 in 95 extremely obese children and adolescents, none could be confirmed with an independent method. The high sequence similarity within the SULT1A gene family allowed for unspecific amplification of several SULT1A genes for individuals in the mutation screen, resulting in artifacts. Hence, apart from SULT1A1 we also amplified one or more other SULT1A gene family members, so that a variant that seemed to be located in SULT1A1 was in fact attributable to SULT1A2, where it represents the wild type allele (see S2 Data and S2 Table for more detail). For the variant Met1Val, which could not be explained by one of the other SULT1A family members, two independent genotyping methods could not replicate our initial uni-directional Sanger re-sequencing finding. We hence deemed this variant an artifact.
For the synonymous variants in SULT1A1, in silico predictions varied. All variants were predicted to change splicing enhancers and silencers, or to directly affect splice sites or transcription factor binding sites (S1 Table). Particularly for the variant p.Pro200= (rs3176926), changes in the binding domains of both splicing regulators and the transcription factor AML-1 were predicted.

SULT1A2
For the non-conservative, non-synonymous SNP rs1136703: p.Ile7Thr (c.20T/C), in silico analyses did not predict a functional effect (Table 2). For rs10797300: p.Pro19Leu (c.56C/T), in silico programs predicted a functional modification (PolyPhen2 and PANTHER, Table 2). The conservative non-synonymous polymorphism rs145008170: p.Ser44Asn (c.131G/A) is located close to Lys48, which is relevant for binding of the xenobiotic p-nitrophenol to the binding pocket of SULT1A2. In silico analyses predicted a "non-neutral" (SNAP) or "deleterious" (PANTHER; Table 2) functional change. The infrequent missense variants rs4987024: p.Tyr62Phe (c.185A/T) and rs142241142: p.Ala164Val (c.491C/T) are non-conservative amino acid exchanges which are not located close to the binding pocket of SULT1A2. In silico, a higher probability of functional changes was predicted for p.Tyr62Phe than for p.Ala164Val (PolyPhen2 "probably damaging"), although the analyses revealed mixed results (Table 2). In silico prediction mostly interpreted the conservative SNP rs1059491: p.Asn235Thr (c.704A/C) as functionally relevant ("probably damaging" PolyPhen2, "non-neutral" SNAP and "deleterious" SIFT; Table 2). None of the analyzed non-synonymous SNPs in SULT1A2 showed association with obesity in 355 obesity trios (TDT; Table 3).
Table 3. Association analyses of polymorphisms of detected non-synonymous variants in chr16p11.2 (screened genes APOBR, SH2B1, SULT1A1, and SULT1A2) in extremely obese children and adolescents and lean controls.

TUFM
In TUFM, the previously unknown variant c.3536C>G was detected in the 3' untranslated region. In addition, non-coding intronic variants were detected (rs7187776: c.-55T>C, rs4788099: c.817+13T>C, rs8061877: c.248−18G>A, and rs61737565: c.922+29C>G). All five variants in TUFM are predicted to lead to altered splicing regulator or transcription factor binding sites (S1 Table). For rs8061877, the disruption of a transcription factor binding site was predicted (TFSearch and Consite), although for different transcription factors (SRY, HFH 2, HMG-IY; S1 Table). For the previously unknown variant g.28854194C/G, in silico prediction showed a change in splice enhancer binding sites and splicing silencer sites by three programs (S1 Table). Alterations in splice sites were also predicted for this variant (Mutation Taster), despite the variant being in the non-coding 3' UTR of TUFM.

Discussion
Previous studies on the causal variation underlying the obesity association of chr16p11.2 mainly focused on the SH2B1 gene. Mutation screens in humans [14,24,25] have revealed a number of mutations that are too infrequent to explain the genome-wide association of the lead SNP with BMI and obesity. The coding variant rs7498665 (SH2B1: p.Thr484Ala) was identified in GWAS studies as lead obesity association signal for a linkage disequilibrium block encompassing 1 Mb. However, this SNP showed no functional effect on STAT3 mediated leptin signaling [14] or the phosphorylation of JAK1 or IRS1 in insulin signaling [25]. Hence, variants underlying the genome-wide significant finding may be located outside the SH2B1 coding region, but in high LD with the original association signal as proposed by Speliotes et al. [2]. We therefore screened the coding region of APOBR, SULT1A1, SULT1A2 and TUFM for mutations in 95 extremely obese German children and adolescents.
We identified 13 variants in the APOBR coding region, three of which were non-synonymous or deletions. These (rs180743: p.Pro428Ala, rs3833080: p.Gly369_Asp370del9, and rs368546180: p.Thr321_Gly329del9) were genotyped in our trios and case-control study groups. The variants p.Pro428Ala and p.Gly369_Asp370del9 are located close to each other; their LD is high (r² = 0.98) and their minor alleles are associated with obesity in our sample (Table 3). As the LD and proximity of both polymorphisms with the initial lead SNP and the non-synonymous polymorphism rs7498665 (SH2B1: p.Thr484Ala) are very high, the obesity associations of all variants are dependent signals. Conditional analysis increased all p-values for rs180743, rs3833080, and rs7498665 above 0.7, indicating high signal dependency. Previously, an association of the minor alleles of both variants with hypercholesterolemia had been described [29]. Of note, the hypercholesterolemia and obesity risk allele C of rs180743 was not associated with weight loss during a 1 year lifestyle intervention in children and adolescents [69], although the position of the SNP is conserved (conservation 66% over 29 species, ENSEMBL). The second deletion (rs368546180: p.Thr321_Gly329del9) was only detected once in our sample of 2,179 obese cases and 435 normal weight or lean controls. Hence, obesity association assessment was not possible. Although variant rs368546180 is in-frame and located in the repeat region of APOBR which has no predicted function [27], in silico prediction of the variant implicated a potentially reduced function (Table 2).
Almost all initially screened individuals are homozygous carriers of obesity risk alleles at the rs180743 (p.Pro428Ala) and rs3833080 (p.Gly369_Asp370del9) polymorphisms. Our results suggest that an in vitro functional validation of both the deletion rs3833080 and the SNP rs180743 would be of interest. Brown et al. [27] suggested that the repeat region in which both variants are located contributes to binding of the specific ligand apoB48, with alterations possibly leading to reduced uptake of chylomicrons (CMs) or CM remnants. This hypothesis could be tested by lipoprotein uptake assays or ligand blotting, as suggested by Daniel et al. [70] and Brown et al. [27].
In SULT1A2, none of the variants was associated with obesity (Table 3), so they do not contribute to our initial TDT finding, although some of the variants were predicted to have functional effects. Of the detected missense variants, several have known in vitro functional effects on xenobiotic sulfonation, e.g. rs4149404 (p.Ile7Thr), rs10797300 (p.Pro19Leu), and rs1059491 (p.Asn235Thr [71]). Glatt et al. [39] reported an obesity association for the minor allele of SULT1A1 rs141581853 (p.Arg213His); the variant was not detected in our screen.
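The LD measure r² quoted for the two APOBR variants can be made concrete with a short sketch. For two biallelic loci, r² is the squared correlation between alleles, computed from the frequency of the haplotype carrying both minor alleles and the two minor allele frequencies. The function name and the frequencies below are hypothetical, and the sketch assumes that haplotype frequencies are already known; in practice they have to be estimated from genotype data, which is not shown here.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared allelic correlation r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele a and allele b
    p_a:  frequency of allele a at the first locus
    p_b:  frequency of allele b at the second locus
    """
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / denominator


# Hypothetical frequencies: the two alleles always occur together.
complete_ld = ld_r_squared(0.4, 0.4, 0.4)
# Hypothetical frequencies: the two alleles occur independently.
no_ld = ld_r_squared(0.2, 0.4, 0.5)
```

With r² close to 1, one variant predicts the other almost perfectly, which is why association signals at such variants cannot be separated statistically and conditioning on one removes the signal of the other.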
In contrast to SULT1A1 which shows ubiquitous expression [39], the expression of SULT1A2 is limited to liver, blood platelets, heart, brain and skin. Both sulfotransferases share their substrates [72]. Even if the detected variants entail biological functional changes at the protein level, other sulfotransferases can most likely compensate for the function of each other in vivo when only one sulfotransferase is affected [73].
In addition to these variants which directly affect the amino acid composition of the proteins, recent studies detected a cis-regulatory element (intronic SNP rs4788099 in SH2B1) which affects the expression of nearby genes (TUFM, coiled-coil domain containing 101 gene: CCDC101, Homo sapiens spinster homolog 1 gene: SPNS1, SULT1A1 and sulfotransferase family, cytosolic, 1A, phenol-preferring, member 4 gene: SULT1A4) in B cells and monocytes [74]. In rodents, differential regulation of the central nervous expression of several genes on chr16p11.2 was shown in reaction to high caloric diets [40,75]. The GWAS lead SNP rs7359397 also affects the expression of SULT1A1, SPNS1 and TUFM, but not SH2B1 in cis. Although the SNP alone only explained 0.0086% of the genetic variance of BMI, the expression changes elicited in the three genes raise this number to 0.5% [76]. These regulatory effects could also contribute to the BMI association signal at chr16p11.2 in GWAS [2,77,78].

Conclusion
In sum, of the five analyzed genes, SH2B1 and APOBR comprised non-synonymous variants associated with obesity. These variants had a medium to high minor allele frequency and were thus previously identified in larger cohorts and population based samples (i.e. 1000 Genomes and Exome Variant Server). Low frequency variants with potentially major gene effects for weight regulation besides SH2B1 p.βThr656Ile/γPro674Ser [14] were not detected.

Supporting Information
S1 Table. In silico functional prediction of all detected variants in chr16p11.2 (screened genes APOBR, SULT1A1, and SULT1A2).
[...] were aligned using the program T-Coffee (http://www.ebi.ac.uk/Tools/msa/tcoffee/). Here, an asterisk below the amino acid alignment marks complete sequence identity, while dots below the sequence mark amino acids that share similar side chains or charges. The positions of missense variants detected in the mutation screens are marked in yellow, green marks the variants that could not be verified with independent methods, red marks the variants rs35728980 and rs1059491 which encode Asn235Thr in SULT1A1 and SULT1A2, respectively. (DOCX)
S3 Fig. Mutated positions in the obesity candidate genes of chr16p11.2 and regional overview. The first picture shows the whole chromosomal region 16p11.2, with the genes with non-synonymous variants described in this manuscript marked with vertical lines. The underlying picture shows the obesity association signals of SNPs in the GIANT collective (Speliotes et al. 2010). After the whole chromosomal region for reference, the screened genes are depicted with the detected mutations and MAF in CEU according to dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/). Here, the horizontal lines symbolize the introns while exons are marked with bars. Functional domains are also included in the graphs. The positions of the