Genome-Wide Association Study Identifies Two Novel Regions at 11p15.5-p13 and 1p31 with Major Impact on Acute-Phase Serum Amyloid A

Elevated levels of acute-phase serum amyloid A (A-SAA) cause amyloidosis and are a risk factor for atherosclerosis and its clinical complications, type 2 diabetes, as well as various malignancies. To investigate the genetic basis of A-SAA levels, we conducted the first genome-wide association study on baseline A-SAA concentrations in three population-based studies (KORA, TwinsUK, Sorbs) and one prospective case cohort study (LURIC), including a total of 4,212 participants of European descent, and identified two novel genetic susceptibility regions at 11p15.5-p13 and 1p31. The region at 11p15.5-p13 (rs4150642; p = 3.20×10^−111) contains serum amyloid A1 (SAA1) and the adjacent general transcription factor 2 H1 (GTF2H1), Hermansky-Pudlak Syndrome 5 (HPS5), lactate dehydrogenase A (LDHA), and lactate dehydrogenase C (LDHC). This region explains 10.84% of the total variation of A-SAA levels in our data, which makes up 18.37% of the total estimated heritability. The second region encloses the leptin receptor (LEPR) gene at 1p31 (rs12753193; p = 1.22×10^−11) and has been found to be associated with CRP and fibrinogen in previous studies. Our findings demonstrate a key role of the 11p15.5-p13 region in the regulation of baseline A-SAA levels and provide confirmatory evidence of the importance of the 1p31 region for inflammatory processes and the close interplay between A-SAA, leptin, and other acute-phase proteins.


Introduction
Serum amyloid A (SAA) is a sensitive marker of the acute inflammatory state. Its isoforms are either expressed constitutively (C-SAA) or show a rapid (up to 1000-fold) increase in expression in response to inflammatory stimuli such as trauma, infection, injury, and stress during the acute phase (A-SAA) [1]. This high inducibility, together with the strong conservation of genes and proteins throughout the evolution of vertebrates and invertebrates, suggests that A-SAA plays a key role in pathogen defence and probably functions as an immune-effector molecule [1]. Acute inflammation has mainly beneficial effects in restoring homeostasis. However, in recent years, clinical and epidemiological studies have gathered substantial evidence that A-SAA is associated with obesity [2] and that prolonged and recurrent chronic infection as well as inflammation is causally involved in the pathogenesis of amyloidosis [1]. Furthermore, it induces, promotes, or influences susceptibility to several chronic diseases such as atherosclerosis and its clinical complications [3][4][5][6][7][8][9], type 2 diabetes [10,11], and various malignancies [12]. The SAA gene family is located within 150 kb on chromosome 11 and comprises four genes: SAA1 and SAA2, the bona fide acute-phase SAA isoforms; SAA3, a pseudogene in humans; and SAA4, a gene expressed at low levels that codes for C-SAA [13,14]. A-SAA expression is regulated by a variety of stimuli, including the pro-inflammatory cytokines TNF-α and IL-6, as well as glucocorticoids [15,16]. Like other acute-phase proteins, A-SAA is expressed primarily by the liver [17]. However, extra-hepatic expression has been reported for different cell types and tissues such as epithelial cells, monocytes and macrophages, most endothelial cells, adipocytes, atherosclerotic lesions, and smooth muscle cells [17]. Twin studies suggest a substantial genetic contribution to baseline A-SAA concentrations, with heritability estimates of 59% (95% confidence interval, 49-67%) [18]. The identification of genetic variants that predispose to elevated levels of A-SAA could provide important clues to the immune response pathways involved in the regulation of A-SAA levels, which might also be of relevance for related clinical entities. In the past, association analyses between genetic variants and A-SAA levels were limited and restricted to allelic variants of SAA genes and protein concentrations [19][20][21].
We therefore conducted the first genome-wide association study on baseline A-SAA concentrations. In a meta-analysis of four genome-wide scans (KORA S4, LURIC, TwinsUK and Sorbs) we included 4,212 participants of European descent. Additionally, in order to account for known gender-specific differences in the regulation of A-SAA [22,23], we stratified the analysis by gender.

Results
In the present meta-analysis of four genome-wide scans, 106 SNPs distributed across two regions showed genome-wide significant associations with p-values below the threshold of 5×10^−8 (Figure 1, Table S1). Table 1 shows study-specific results for the top hits within the two regions and three identified subregions (see below) of the meta-analysis, as well as an additional region for men in the gender-stratified analysis. Genotypic mean levels are provided in Table S2.
Results of the single genome-wide studies were consistent across all four studies regarding the direction and magnitude of the effects. In addition, results were consistent between different genotyping technologies (Table S3). No deviations from Hardy-Weinberg equilibrium were observed. The measure of inter-study heterogeneity (I²) indicated homogeneity at the 1p31 locus. At the 11p15.5-p13 locus we observed I² values that indicated more pronounced heterogeneity. This reflects the relatively large and varying beta values and differences in the minor allele frequency (Table S1). However, because this locus was clearly and significantly associated with A-SAA in every study included in the meta-analysis, the results are reported based on a fixed-effects model.
The first region (193.3 kb in length) resides at 11p15.5-p13 and includes SAA1, one of the structural genes of A-SAA. Within this region the strongest association was found for two highly correlated intronic polymorphisms of the general transcription factor 2 H1 (GTF2H1) gene, rs4150642 (p = 3.20×10^−111) and rs7103375 (p = 3.26×10^−111) (Figure 2A). These two top hits show modest correlation (r² ≤ 0.376) with other significantly associated SNPs within this region. When the structure of correlation and explained variances within the region was analysed, three mostly independent subregions were identified (Table S4, Figure 2B-2D). The first subregion encloses the 5′ end of SAA1 (Figure 2B) with the strongest association for rs4638289 (p = 2.77×10^−53). The other two subregions harbour the genes Hermansky-Pudlak Syndrome 5 (HPS5) and GTF2H1 (Figure 2C) and lactate dehydrogenase A and C (LDHA and LDHC) (Figure 2D), with the strongest associations for rs4353250 (p = 1.68×10^−51) and rs2896526 (p = 4.12×10^−22), two intronic polymorphisms of HPS5 and LDHA, respectively.
The second region was detected at 1p31 (Figure 2E). All 38 significantly associated variants cluster around the 3′ end of the leptin receptor gene (LEPR). The most significantly associated SNP, rs12753193 (p = 1.22×10^−11), is located downstream of LEPR.
All associations were consistent in the KORA S4 validation analyses (Table 1 for the top hits, Table S5 for all SNPs selected for validation).
The entire regression model (including the top SNPs of the two genomic regions (rs4150642 and rs12753193), age, gender and BMI) explains 19.32% of the total variation of A-SAA in our data. With an explained variance of 10.84% for the top SNP (rs4150642) of the 11p15.5-p13 locus (5.57% for rs4638289 of the SAA1 subregion, 5.34% for rs4353250 of the HPS5/GTF2H1 subregion, and 2.37% for rs2896526 of the LDHA/LDHC subregion; Table S4) and 0.93% for the top SNP (rs12753193) of the 1p31 locus, the identified genomic regions account for a major part of this variance.
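The share of heritability quoted in the abstract follows from simple arithmetic: the variance explained by a locus divided by the twin-based heritability estimate of 59% [18]. The short Python sketch below reproduces the 18.37% figure for the 11p15.5-p13 top SNP; the function name is hypothetical and chosen only for illustration.

```python
def share_of_heritability(explained_pct, heritability_pct):
    """Percentage of the estimated heritability accounted for by a locus.

    Both arguments are percentages of the total phenotypic variance.
    """
    return explained_pct / heritability_pct * 100.0


# Values taken from the text: 10.84% explained variance for rs4150642,
# 59% heritability from twin studies.
share = share_of_heritability(10.84, 59.0)
print(f"{share:.2f}% of the estimated heritability")
```

Because the published percentages are rounded, recomputing other shares from the rounded inputs may differ slightly in the last digits.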
When the analysis was stratified by gender, one additional SNP (rs549485), located about 350 kb away from the SAA1 subregion at 11p14 in the secretion regulating guanine nucleotide exchange factor (SERGEF) gene, showed a borderline significant association with A-SAA levels in men (p = 2.76×10^−8) in the meta-analysis. In the validation analysis the association between two highly correlated SNPs within this region (rs493767 and rs550659, r² = 0.961) and A-SAA levels was also borderline significant (p = 8.50×10^−3 and p = 1.65×10^−2, respectively). No significant differences between men and women were found within the regions identified in the overall meta-analysis (data not shown).

Author Summary
An elevated level of acute-phase serum amyloid A (A-SAA), a sensitive marker of the acute inflammatory state with high heritability estimates, causes amyloidosis and is a risk factor for atherosclerosis and its clinical complications, type 2 diabetes, as well as various malignancies. This study describes the first genome-wide association study on baseline A-SAA concentrations. In a meta-analysis of four genome-wide scans totalling 4,212 participants of European descent, we identified two novel genetic susceptibility regions on chromosomes 11 and 1 to be associated with baseline A-SAA concentrations. The chromosome 11 region contains the serum amyloid A1 gene and the adjacent genes and explains a high percentage of the total estimated heritability. The chromosome 1 region is a known genetic susceptibility region for inflammation. Taken together, we identified one region which seems to be of key importance in the regulation of A-SAA levels and represents a novel potential target for the investigation of related clinical entities. In addition, our findings indicate a close interplay between A-SAA and other inflammatory proteins, as well as a larger role of a known genetic susceptibility region for inflammatory processes than has been assumed in the past.

Discussion
Based on a meta-analysis of four genome-wide association studies including 4,212 participants of European descent, two novel genetic susceptibility regions were identified to be associated with baseline A-SAA concentrations. With 11.68% explained variance in our data, which makes up 19.76% of the total estimated heritability of 59%, these two regions seem to have a major impact on baseline A-SAA concentrations. The region at 11p15.5-p13 accounts for most of the explained variance. Its SAA1 subregion contains part of a highly conserved region between the two bona fide acute-phase structural genes SAA1 and SAA2, which consist of at least 5 and 2 allelic variants, respectively [1,24]. These two genes are concurrently induced during the acute phase [1] and cluster within 18 kb of each other in a head-to-head arrangement [25]. This study is the first to present the complex genetic architecture of A-SAA levels at this locus. In the identified region, there is evidence of binding sites for regulatory factors like C/EBPalpha and C/EBPbeta (http://genome.ucsc.edu), which are necessary for the full responsiveness to IL-1β and IL-6, either alone or in combination [1]. Our finding underlines the high functional potential of this region.
The adjacent GTF2H1 is a basal transcription factor involved in nucleotide excision repair of DNA and RNA transcription by RNA polymerase II [26]. HPS5 encodes a protein that is probably involved in the biogenesis of organelles such as melanosomes and platelet dense granules; mutations in this gene lead to the clinical entity of the same name [27]. LDHA and LDHC, which are expressed in muscle tissue and in testes, respectively, encode lactate dehydrogenase, an enzyme that catalyzes the interconversion of lactate and pyruvate [28].
Variants of the GTF2H1 gene have recently been found to be associated with lung cancer in a Chinese population [29]. Furthermore, it was demonstrated that LDHA is involved in tumourigenicity and that its reduction causes bioenergetic and oxidative stress leading to cell death [30][31][32]. Finally, Kosolowski et al. [33] found LDHC to be expressed in several types of tumour cell lines. It is thought that recurrent or persistent chronic inflammation may play a role in carcinogenesis by causing DNA damage, inciting tissue reparative proliferation and/or by creating an environment that is enriched with tumour-promoting cytokines and growth factors [12]. Furthermore, SAA synthesis has been found in human carcinoma metastases and cancer cell lines [17].
As the approach taken in this study is observational in nature, it is not possible to draw causal inferences. It is therefore possible that small regulatory elements, rather than the genes themselves, are responsible for the findings. This is most likely the case, as the identified region contains one structural gene and its adjacent region. In any case, the major impact on baseline A-SAA concentrations demonstrates a key role of the 11p15.5-p13 region in the regulation of inflammation. Therefore, the identification of causal variants and their impact on diseases related to elevated baseline A-SAA concentrations represent promising targets for future functional and epidemiological studies.
The second region was found on chromosome 1p31, harbouring the LEPR gene locus. Leptin, an important circulating signal for the regulation of body weight, was found to be correlated with SAA concentrations independently of BMI, and both were expressed in adipose tissue [34]. In the KORA F3 study (Text S1) a moderate but significant correlation was found between circulating A-SAA and leptin concentrations in blood in 181 participants with measurements of both proteins (Spearman correlation = 0.25, p = 7×10^−4). So far it is unclear whether leptin influences SAA expression directly or via the leptin-stimulated cytokines IL-6 and TNF-α [34]. LEPR is a single transmembrane receptor of the cytokine receptor family most related to the gp130 signal-transducing component of the IL-6 receptor, the granulocyte colony-stimulating factor (GCSF) receptor, and the Leukaemia Inhibitory Factor (LIF) receptor, all of which are thought to play an essential role in the inflammatory process [35,36]. Previous studies have provided evidence of an association of the LEPR gene locus with CRP and fibrinogen [37][38][39][40], which were both correlated with A-SAA in the KORA S4 study (Text S1) (CRP: Spearman correlation = 0.58, p = 3.22×10^−155, and fibrinogen: Spearman correlation = 0.31, p = 3.89×10^−41; N = 1734). This finding gives confirmatory evidence of the importance of the LEPR gene locus for inflammatory processes and the close relationship between leptin, A-SAA, CRP and fibrinogen.
Furthermore, in the gender-stratified analysis one region containing SERGEF was identified as presumably associated with A-SAA in men. The adjacency of this region to the SAA gene family suggests that regulatory elements may be responsible for this signal. However, the association with A-SAA levels was only borderline significant in our study and therefore awaits replication.
Two limitations of our study have to be mentioned. Firstly, due to restrictions in laboratory methods, our analyses were confined to the A-SAA isoforms and did not capture the constitutively expressed C-SAA isoform, which might also be of interest, especially when analyzing baseline SAA levels. Secondly, the number of studies with genome-wide data and measured A-SAA levels was limited compared to other genome-wide association studies. Nevertheless, the study had enough power to detect two novel genetic susceptibility regions for A-SAA which already explain 19.76% of the total estimated heritability. Furthermore, results were consistent across all four studies and across different genotyping platforms, the regions are biologically highly plausible, and the results may contribute to future research on the regulation of the inflammatory response and its role in related clinical entities.
Taken together, the present meta-analysis is the first whole-genome approach to identify genetic variants that are associated with baseline A-SAA concentrations. Two novel genetic susceptibility regions were identified to be associated with baseline A-SAA concentrations. The findings demonstrate a major impact of the 11p15.5-p13 gene region on the regulation of inflammation and suggest a close interplay between leptin, A-SAA, and other acute-phase proteins, as well as a larger role of the LEPR gene locus in inflammatory processes than has been assumed in the past.

Participating studies
The present meta-analysis combined data from four genome-wide scans: one survey of the Cooperative Health Research in the Region of Augsburg (KORA S4), the Ludwigshafen Risk and Cardiovascular Health study (LURIC), the UK Adult Twin Register (TwinsUK) and a self-contained population from Eastern Germany (Sorbs) (Text S1). Approval was obtained from each of the local Ethics Committees for all studies, and written informed consent was given by all study participants. In total, the meta-analysis included 4,212 individuals (1,928 males, 2,284 females) of European ancestry with measured baseline A-SAA concentrations.
For validation analyses we used data from 2,136 participants of the KORA S4 sample who were not included in the meta-analysis (Text S1). Sample sizes and characteristics of the study participants of the four genome-wide scans and the validation sample are displayed in Table S6.

Measurement of A-SAA concentrations
In all four studies, study participants were fasting, and EDTA plasma samples were analyzed by immunonephelometry on a BNAII device from Siemens, Germany, and by well-validated automated microparticle capture enzyme immunoassays [10,41]. The inter-assay coefficients of variation were below 7% in all four studies.

Genome-wide genotyping and imputation
Different platforms were used for genotyping: the Affymetrix 500K GeneChip array (Sorbs), the Affymetrix 6.0 GeneChip array (KORA S4, LURIC, Sorbs), the Illumina HumanHap300 BeadChip (317K) (TwinsUK) and the Illumina Human 610K BeadChip (TwinsUK). Quality control before imputation was undertaken in each study separately. Detailed information on genotyping and imputation is reported in Table S7. Imputation based on the HapMap Phase 2 CEU population was performed using IMPUTE [42] in all studies. After imputation all genotype data had to meet the following quality criteria: a minor allele frequency ≥ 0.01, a call rate per SNP ≥ 0.9, and an r2.hat metric ≥ 0.40 for imputed SNPs. In total, 2,593,456 genotyped or imputed autosomal SNPs were analyzed in the meta-analysis.
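The three post-imputation quality criteria can be expressed as a simple filter. The following Python sketch is only an illustration under stated assumptions: the record layout (SnpQuality), the function name and the example SNP identifiers are hypothetical and do not correspond to any file format used by the studies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpQuality:
    """Quality summary for one SNP (hypothetical record layout)."""
    rsid: str
    maf: float               # minor allele frequency
    call_rate: float         # fraction of samples with a genotype call
    r2_hat: Optional[float]  # imputation quality; None if directly genotyped


def passes_qc(snp, min_maf=0.01, min_call_rate=0.9, min_r2_hat=0.40):
    """Apply the thresholds quoted in the text (all are 'greater or equal')."""
    if snp.maf < min_maf or snp.call_rate < min_call_rate:
        return False
    # The r2.hat criterion is stated for imputed SNPs only.
    if snp.r2_hat is not None and snp.r2_hat < min_r2_hat:
        return False
    return True


# Made-up example records, not real study data.
snps = [
    SnpQuality("rs_example_1", maf=0.25, call_rate=0.99, r2_hat=None),
    SnpQuality("rs_example_2", maf=0.005, call_rate=0.99, r2_hat=None),
    SnpQuality("rs_example_3", maf=0.20, call_rate=0.95, r2_hat=0.35),
]
kept = [s.rsid for s in snps if passes_qc(s)]
print(kept)
```

In this toy input only the first record is kept: the second fails the allele frequency threshold and the third fails the imputation quality threshold.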
For validation and comparison of genotyping platforms, we selected 27 of the most significantly associated SNPs. Genotyping of these SNPs was performed with the MassARRAY system using the iPLEX technology (Sequenom, San Diego, CA) in the KORA S4 study. The allele-dependent primer extension products were loaded onto one 384-element chip using a nanoliter pipetting system (SpectroCHIP, SpectroPOINT Spotter; Sequenom), and the samples were analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (Bruker Daltonik, Leipzig, Germany). The resulting mass spectra were analyzed for peak identification via the SpectroTYPER RT 3.4 software (Sequenom). To control for reproducibility, 9.8% of samples were genotyped in duplicate, with a discordance rate of less than 0.5%.

Genome-wide association analyses and meta-analysis
In each study, linear regression models for all available SNPs were calculated on ln-transformed A-SAA levels in mg/l. The genetic effect was assumed to be additive. Adjustment was made for age, gender, BMI, and study-specific covariates, i.e. the Friesinger Score in the LURIC population [43] and a genotyping batch variable in the TwinsUK population. Additionally, this analysis was undertaken stratified by gender. The genome-wide scans were calculated with the analysis software SNPTEST (http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html) (KORA S4, LURIC), QUICKTEST (http://toby.freeshell.org/software/quicktest.shtml) (Sorbs) and Merlin (http://www.sph.umich.edu/csg/abecasis/Merlin/) (TwinsUK).
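To illustrate what an additive model on ln-transformed levels estimates, the following Python sketch regresses ln(A-SAA) on the allele dosage (0 to 2 copies of the coded allele) by ordinary least squares. It is a deliberately reduced example: the covariates used in the studies (age, gender, BMI, study-specific variables) are omitted, the function name is hypothetical, and the data are made up. It does not reproduce the SNPTEST, QUICKTEST or Merlin implementations.

```python
import math


def additive_effect(dosages, saa_mg_per_l):
    """Unadjusted per-allele effect on ln(A-SAA) and its standard error."""
    y = [math.log(v) for v in saa_mg_per_l]  # ln-transform the phenotype
    n = len(y)
    mean_x = sum(dosages) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(dosages, y))
    beta = sxy / sxx                          # change in ln(A-SAA) per allele
    intercept = mean_y - beta * mean_x
    rss = sum((yi - intercept - beta * x) ** 2 for x, yi in zip(dosages, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se


# Made-up data: each allele multiplies the level by exp(0.5).
dosages = [0, 1, 2, 0, 1, 2]
levels = [math.exp(1.0 + 0.5 * d) for d in dosages]
beta, se = additive_effect(dosages, levels)
print(f"beta = {beta:.3f}, fold change per allele = {math.exp(beta):.3f}")
```

Because the outcome is ln-transformed, exp(beta) can be read as the multiplicative change in A-SAA concentration per additional copy of the coded allele.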
The results of all four genome-wide scans were meta-analysed using a fixed-effects model applying inverse variance weighting with the METAL software (www.sph.umich.edu/csg/abecasis/metal). Study-specific results were corrected for population stratification using the genomic control method. For the overall meta-analysis, the inflation factor was 1.009. No further correction was applied.
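The principle of a fixed-effects, inverse-variance-weighted meta-analysis with a genomic control adjustment can be sketched in a few lines of Python. This is a minimal illustration, not the METAL implementation: the function names and input values are hypothetical, the p-value uses a normal approximation, and leaving standard errors unchanged when the inflation factor is at most 1 is a choice made for this sketch. Cochran's Q and the I² heterogeneity measure are included because they derive from the same weights.

```python
import math


def genomic_control_se(se, lambda_gc):
    """Inflate a standard error by sqrt(lambda); no change if lambda <= 1."""
    return se * math.sqrt(max(lambda_gc, 1.0))


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted pooled effect with Q and I² heterogeneity."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}


# Made-up per-study estimates for a single SNP (not results from the paper).
study_betas = [0.30, 0.25, 0.35, 0.28]
study_ses = [0.05, 0.04, 0.08, 0.06]
study_lambdas = [1.02, 1.00, 1.01, 0.99]

corrected = [genomic_control_se(s, l) for s, l in zip(study_ses, study_lambdas)]
result = fixed_effects_meta(study_betas, corrected)
print({k: round(v, 4) if k != "p" else v for k, v in result.items()})
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is pulled towards the most precise studies.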
P-values below the threshold of p = 5×10^−8, which corresponds to a Bonferroni correction for the estimated number of one million tests for independent common variants in the human genome of European individuals [44], were considered to be significant.
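The threshold can be checked with a one-line calculation. The sketch below assumes a family-wise error rate of 0.05, which is the value implied by the stated threshold and the one million tests but is not given explicitly in the text; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumption: alpha = 0.05; one million independent tests as stated in the text.
GENOME_WIDE_THRESHOLD = bonferroni_threshold(0.05, 1_000_000)


def is_genome_wide_significant(p_value):
    """True if the p-value lies below the genome-wide threshold."""
    return p_value < GENOME_WIDE_THRESHOLD


print(GENOME_WIDE_THRESHOLD)
print(is_genome_wide_significant(1.22e-11))  # p-value reported for rs12753193
```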
As a measure of between-study heterogeneity, I² was calculated [45]. Deviations from Hardy-Weinberg equilibrium were tested for all identified SNPs by means of the exact Hardy-Weinberg test. For the calculation of explained variances, we subtracted the multiple R² value of the covariate model from that of the full model including covariates and top hits of the loci in every single study and assessed the weighted mean (KORA S4, LURIC, and the Sorbs). We tested adjacent regions for independence by analyzing the significance of their top SNPs in a joint model.
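The explained-variance calculation described above amounts to a difference of two R² values per study followed by a weighted mean. The Python sketch below illustrates this with made-up numbers. The text does not state which weights were used, so weighting by sample size is an assumption of this example, and all names and values are hypothetical.

```python
def incremental_r2(r2_full, r2_covariates):
    """Variance explained by the SNPs beyond the covariate-only model."""
    return r2_full - r2_covariates


def weighted_mean(values, weights):
    """Weighted mean of per-study values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up per-study inputs (not values from the paper).
studies = [
    {"name": "study_a", "n": 1000, "r2_full": 0.20, "r2_cov": 0.08},
    {"name": "study_b", "n": 3000, "r2_full": 0.18, "r2_cov": 0.08},
]

gains = [incremental_r2(s["r2_full"], s["r2_cov"]) for s in studies]
# Assumption: weights are the study sample sizes.
pooled = weighted_mean(gains, [s["n"] for s in studies])
print(f"pooled explained variance: {pooled:.3f}")
```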

Accession numbers
The OMIM (http://www.ncbi.nlm.nih.gov/omim) accession numbers for genes mentioned in this article are 104750 for SAA1, 607521 for HPS5, 189972 for GTF2H1, 150000 for LDHA, 150150 for LDHC, 601007 for LEPR, and 606051 for SERGEF.

Figure 1. Manhattan plot and quantile-quantile plot of the results of the meta-analysis on baseline A-SAA levels. The Manhattan plot on the left-hand side displays all analyzed SNPs with their calculated p-values (p-values below the threshold of genome-wide significance are coloured red). The quantile-quantile plot on the right-hand side points out the observed significant associations beyond those expected by chance. doi:10.1371/journal.pgen.1001213.g001

Figure 2. Regional plots of the genetic susceptibility regions/subregions. The regional plots present gene regions and block structures of the region at 11p15.5-p13 (A), the SAA1 subregion (B), the HPS5/GTF2H1 subregion (C), the LDHA/LDHC subregion (D), and the region at 1p31 (E) and picture the probability values of the significantly associated SNPs, the colour representing the degree of correlation with the top hit of the respective region/subregion. doi:10.1371/journal.pgen.1001213.g002

Table 1. Study-specific results for the hits within the regions/subregions.