Genome-Wide Association Analysis of Soluble ICAM-1 Concentration Reveals Novel Associations at the NFKBIK, PNPLA3, RELA, and SH2B3 Loci

Soluble ICAM-1 (sICAM-1) is an endothelium-derived inflammatory marker that has been associated with diverse conditions such as myocardial infarction, diabetes, stroke, and malaria. Despite evidence for a heritable component to sICAM-1 levels, few genetic loci have been identified so far. To comprehensively address this issue, we performed a genome-wide association analysis of sICAM-1 concentration in 22,435 apparently healthy women from the Women's Genome Health Study. While our results confirm the previously reported associations at the ABO and ICAM1 loci, four novel associations were identified in the vicinity of NFKBIK (rs3136642, P = 5.4×10−9), PNPLA3 (rs738409, P = 5.8×10−9), RELA (rs1049728, P = 2.7×10−16), and SH2B3 (rs3184504, P = 2.9×10−17). Two loci, NFKBIB and RELA, are involved in NFKB signaling pathway; PNPLA3 is known for its association with fatty liver disease; and SH3B2 has been associated with a multitude of traits and disease including myocardial infarction. These associations provide insights into the genetic regulation of sICAM-1 levels and implicate these loci in the regulation of endothelial function.


Introduction
A member of the immunoglobulin superfamily of adhesion receptors, ICAM-1 is expressed on endothelial cells where it serves as a receptor for the leukocyte integrins LFA-1 and Mac-1 [1]. A soluble form of ICAM-1 (sICAM-1) is present in plasma and is thought to arise from proteolytic cleavage of the extra-cellular domains of ICAM-1. Although the physiologic function of soluble ICAM-1 remains to be fully defined, plasma concentration of sICAM-1 have a predictive value for the risk of myocardial infarction, ischemic stroke, peripheral arterial disease and noninsulin-dependent diabetes mellitus in epidemiological studies [2][3][4].
We recently described a genome-wide association study of sICAM-1 in 6,578 apparently healthy women from the Women's Genome Health Study (WGHS), which confirmed a known association at the ICAM1 locus and identified a novel association at the ABO locus [5]. These results were subsequently replicated in large-scale genomics studies from Barbalic [6] et al. and Qi [7] et al. Nevertheless, the total variance explained by these associations remained low (8.4%) as compared to the relatively high heritability estimates (from 0.34 to 0.59) [8,9] for sICAM-1. We therefore hypothesized that other, weaker, common genetic determinants of sICAM-1 remained to be discovered. To explore this issue, we performed a larger genome-wide association study (GWAS), evaluating 334,295 SNPs in 22,435 apparently healthy women of European ancestry from the WGHS.
In order to determine whether more than one non-redundant association signal could be detected at each of these five loci, we applied a model selection algorithm. The SNP with the lowest Pvalue for association was the only one retained at every locus with the exception of the ICAM1 locus, where 5 SNPs were selected by the model (Table 1). Interestingly, model selected SNPs at the ICAM1 locus showed lower P-value when they were all included in a single multivariate model than when considered separately. Three of the model selected SNPs at the ICAM1 locus (rs281437, rs1801714 and rs11575074) were not significant at a genome-wide level of significance in a univariate analysis. We performed two analyses to determine if the multiple SNPs selected at the ICAM1 locus were the result of an underlying association with a known but untyped variant. First, we tested all imputed SNPs (using MACH) within 1.5 Mb of rs1799969 (the lead SNP at that locus) for association with adjusted sICAM-1 levels. No imputed SNP was more significant than the directly genotyped rs1799969. Second, we tested the same set of imputed SNPs after additional adjustment of sICAM-1 levels for the effect of model selected SNPs. No additional SNP was associated at genome-wide significance. The 5 SNPs at the ICAM1 locus selected by our algorithm were also used in haplotype analysis using WHAP [12], as implemented in PLINK [13] ( Table 2). The estimate of the proportion of variance attributable to haplotypes, as well as their regression coefficients, is consistent with the linear model of these same SNPs, reinforcing the adequacy of an additive model to explain the association.
Next we tested whether any additional SNPs are associated with sICAM-1 levels after adjustment for the model selected SNPs (see Figure 1B). A single SNP was associated with sICAM-1 at genomewide significance (P = 5.4610 29 ; 24.1 ng/mL per minor allele) in the vicinity of the NFKBIB locus at 19q13.2 ( Figure 2D). This SNP, rs3136642, is intronic to NFKBIB and had a minor allele frequency of 0.38. The model selection algorithm retained no other SNP at the NFKBIB locus. Further adjustment of sICAM-1 values for rs3136642 did not identify any additional SNP with genome-wide significant association with sICAM-1. We also performed GWAS analysis using imputed genotypes (using MACH). Because no new locus reached genome-wide significance after adjustment for model selected SNPs, only results of directly genotyped SNPs are presented. These results were essentially unchanged when the first 10 components of a principal component analysis were included as covariates to account for sub-Caucasian stratification. All 4 novel loci identified in WGHS were replicated (one-sided P,0.05) in 9,813 individuals from the CHARGE consortium [14] (Table 3).
Collectively, the 5 SNPs at the ICAM1 gene locus explained 6.5% of sICAM1 total variance, whereas the other loci explained from 0.1 to 1.4% of the variance. In comparison, clinical covariates explained 19.5% of the variance (Table 4). For 4 of the loci, there was no strong evidence for non-additive effects of the minor allele as judged by lack of significance for a likelihood ratio test comparing the additive regression model to an alternative genotype model with an additional degree of freedom. However, the non-additive component was significant for rs507666 (P = 9.3610 26 ) at the ABO locus with a tendency toward a dominant effect (mean sICAM-1 of 362.1, 342.4 and 335.4 ng/mL for 0, 1 and 2 minor alleles, respectively). The PNPLA3 SNP rs738409 also showed evidence of non-additive association (P = 4.6610 25 ) with a tendency toward a recessive model (mean sICAM-1 of 352.8, 356.0 and 367.7 ng/mL for 0, 1 and 2 minor alleles, respectively). In spite of these non-additive

Author Summary
Soluble Intercellular Adhesion Molecule 1 (sICAM-1) is an inflammatory marker that has been associated with several common diseases such as diabetes, heart disease, stroke, and malaria. While it is known that blood concentrations of sICAM-1 are at least partially genetically determined, our current knowledge of which genes mediate this effect is limited. Taking advantage of technologies allowing us to interrogate genetic variation on a whole-genome basis, we found that variation in the NFKBIK, PNPLA3, RELA, and SH2B3 genes are important determinant of sICAM-1 blood concentrations. The NFKBIB and RELA genes are involved in regulation of inflammation. These observations are significant because this is the first report of genetic association within these extensively studied inflammation genes. The PNPLA3 gene has previously been associated with liver disease, and the SH2B3 gene has been associated with a multitude of traits including cardiovascular disease. Extension of these associations to sICAM-1 adds to the intriguing diversity of effects of these genes.
trends, no additional locus reached genome-wide significance when a genotypic test, which does not assume an additive model of association, was conducted.
Model selected SNPs were tested for association with other available inflammation markers (C-reactive protein and fibrinogen). No significant association was noted (P.0.01) after adjusting  Given the known association of sICAM-1 with cardiovascular risk and the association of selected SNPs with sICAM-1, we estimated the power to detect an association between the SH2B3 SNP rs3184504 and myocardial infarction to be 6%, for alpha = 0.05. In comparison, power varied from 5% (rs281437) to 11% (rs5498) for other SNPs. The PNPLA3 SNP rs738409 was tested for association with triglyceride, LDL cholesterol, HDL cholesterol and BMI as this gene is known to be involved in lipid metabolism and association with BMI has been previously suggested [15]. No significant association was observed.
Since smoking accounts for a large fraction of the variation in sICAM-1 levels, we tested associated SNPs for interaction with smoking. A significant interaction was observed for the ICAM1 SNP rs1799969 (interaction P = 1.6610 29 ) whereby current smokers had a stronger genetic association, as we previously reported [16]. A novel interaction was also observed with the ABO SNP rs507666, again with a stronger genetic association in current smokers (P = 0.0003). When restricting the GWAS analysis to current smokers, an additional association was observed with rs8034191 (P = 3.5610 28 ). This latter SNP is located on chromosome 15 near the nicotinic acetylcholine receptor subunit genes CHRNA3 and CHRNA5. This locus is known to be associated with smoking behavior [17,18] and rs8034191 has recently been associated with smoking quantity [19]. No novel association was observed when restricting the GWAS analysis to non-smokers after adjustment for the previously described loci.
We also tested whether multiple variants of individually weak effect could contribute to sICAM-1 levels. In cross-validation procedures, no increase in variance explained was observed when using P-value cut-offs less significant than 10 28 for inclusion of SNPs in gene scores (see Figure 3). In other words, selection of SNPs on the basis of P-value alone was not able to identify more of the genetic variance than could be explained by the SNPs with association P-value ,10 28 .

Discussion
Six loci -ABO, ICAM1, NFKBIK, PNPLA3, RELA and SH2B3have been identified in this report for association with sICAM-1. While the ABO [5] and ICAM1 [10,11] loci had been previously reported, we extended the number of non-redundantly associated variants at the ICAM1 locus by demonstrating association of rs11575074 and rs1801714 in multivariate analysis along with the known rs1799969, rs5498 and rs281437 SNPs [5]. Neither rs1801714 nor rs11575074 are predicted eQTL (http://eqtl. uchicago.edu/Home.html), but rs1801714 is a missense variant (P352L) and rs11575074 is located in a predicted binding site for several transcription factors including PPARG [20]. The NFKBIK, PNPLA3, RELA and SH2B3 associations are novel. No strong contribution of weakly associated variants was observed in the polygene analysis whereby SNPs of varying statistical significance were included in gene scores.
Nuclear factor kB (NF-kB) proteins are a family of transcription factors involved in a number of physiological processes that include cell survival, proliferation, and activation. The NF-kB proteins (NFKB1 or NFKB2) are bound to REL, RELA, or RELB to form the NF-kB complex. These complexes are typically localized in the cytoplasm, where they are trapped by binding to  IkB inhibitory proteins NFKBIA or NFKBIB. Upon inflammatory simulation, IkB kinase A and B phosphorylate IkB inhibitory proteins and mark them for degradation via the ubiquitination pathway, thereby allowing activation of the NF-kappa-B complex. Activated NF-kB complexes translocate into the nucleus and bind to NF-kB DNA binding motifs. NF-kB triggers transcription of various genes critical to inflammation, such as cytokines, chemokines and cell adhesion molecules including ICAM1 [21,22]. Remarkably, two of the novel associations involve genes physically interacting with NF-kB. No genetic interaction, however, was noted between these two SNPs (data not shown). Taken together, these results emphasize the importance of the NFKB pathway in the regulation of sICAM-1 levels.
PNPLA3 encodes a protein of unknown function that belongs to the patatin-like phospholipase family. Members of that family are believed to complement hormone sensitive lipase for adipocyte triacylglycerol lipase activity. The methionine allele of the missense PNPLA3 SNP rs738409 (Ile148Met) has recently been associated with increased hepatic fat levels, hepatic inflammation and plasma levels of liver enzymes (traits linked to insulin resistance and obesity) [23,24]. Nevertheless, rs738409 has been shown not to be associated with insulin resistance [25] although a previous study demonstrated an association with insulin secretion in response to oral glucose tolerance test [15]. Levels of the inflammatory marker sICAM-1 are known to be correlated with insulin resistance and obesity [4]. Consistent with rs738409 modulating the response to insulin resistance and associated phenotypes, the risk allele for fatty liver disease was associated with increased sICAM-1 levels.
SH2B3 encodes Lnk, an adaptor protein that mediates the interaction between extra-cellular receptors, such as the T-cell receptor and the thrombopoietin receptor MPL, and intracellular signaling pathways. Cells from Lnk-deficient mice show an increased sensitivity to several cytokines and altered activation of the RAS/MAPK pathway in response to IL3 and stem cell factor [26]. The same SH2B3 SNP rs3184504 identified in our study has previously been associated with multiple other traits, including blood pressure [27,28], blood eosinophil number [29], myocardial infarction [29], celiac disease [30], type I diabetes [31], LDLcholesterol [32], asthma [29], blood platelet number [33], hemoglobin concentration [34] and hematocrit [34]. Furthermore, rs3184504 is a non-synonymous SNP (Arg262Trp) whose derived allele (Trp) is part of a haplotype that has been suggested to have been introduced 3,400 years ago and selectively swept in European populations [33]. The derived allele is the risk allele for coronary artery disease and was the allele associated with higher sICAM-1 concentration. Association of rs3184504 with sICAM-1 further demonstrates the remarkable pleiotropy of that genetic variant by extending its effect to endothelial cell adhesion molecules. An interesting hypothesis is whether changes in sICAM-1 are mediated through increased sub-clinical atherosclerosis, but further studies will be needed to address this question.
In this report, we demonstrate genetic association of sICAM-1 with the ABO, ICAM1, NFKBIK, PNPLA3, RELA and SH2B3 loci. These findings broaden our current knowledge of the genetic architecture of sICAM-1 with identification of four novel loci. The novel association at PNPLA3 reinforces the importance of insulin resistance-related processes in the regulation of sICAM-1 levels. The observed associations also provide evidence of functional genetic variation at two genes -NFKBIK and RELA -well known for their implication in the NF-kB pathway, therefore providing a basis for the study of these polymorphisms in other conditions where this same pathway is involved. The results also extend the effect of the SH2B3 SNP rs3184504 to endothelial function.

Ethics Statement
All analyses were performed with approval of the institutional review board of the Brigham and Women's Hospital. All members of the WGHS cohort were participants in the WHS who provided an  adequate baseline blood sample for plasma and DNA analysis and who gave consent for blood-based analyses and long-term follow-up.

Study Sample and sICAM-1 Measurements
All participants in this study were part of the Women's Genome Health Study (WGHS) [35]. Briefly, participants in the WGHS include North American women from the Women's Health Study (WHS) with no prior history of cardiovascular disease, diabetes, cancer, or other major chronic illness who also provided a baseline blood sample at the time of study enrollment. For all WGHS participants, EDTA anticoagulated plasma samples were collected at baseline and stored in vapor phase liquid nitrogen (2170uC). Circulating plasma sICAM-1 concentrations were determined using a commercial ELISA assay (R&D Systems, Minneapolis, Minn.); the assay used is known not to recognize the K56M (rs5491) variant of ICAM-1 [36] and the 82 Caucasian carriers of this mutation were therefore excluded from further analysis. The intra-assay coefficient of variation was 6.7% and the reported intra-individual coefficient of variation 7.6% [37]. This study has been approved by the institutional review board of the Brigham and Women's Hospital. Additional clinical characteristics of this sample are provided in Table S2.

Genotyping
Samples were genotyped with the Infinium II technology from Illumina. Either the HumanHap300 Duo-Plus chip or the combination of the HumanHap300 Duo and I-Select chips was used. In either case, the custom content was identical and consisted of candidate SNPs chosen without regard to allele frequency to increase coverage of genetic variation with impact on biological function including metabolism, inflammation or cardiovascular diseases. Genotyping at 318,237 HumanHap300 Duo SNPs and 45,571 custom content SNPs was attempted, for a total of 363,808 SNPs. Genetic context for all annotations are derived from human genome build 36.1 and dbSNP build 126.
SNPs with call rates ,90% were excluded from further analysis. Likewise, all samples with percentage of missing genotypes higher than 2% were removed. Among retained samples, SNPs were further evaluated for deviation from Hardy-Weinberg equilibrium using an exact method [38] and were excluded when the P-value was lower than 10 26 . Samples were further validated by comparison of genotypes at 44 SNPs that had been previously ascertained using alternative technologies. SNPs with minor allele frequency .1% in Caucasians were used for analysis. After quality control, 334,295 SNPs were left for analysis.

Population Stratification
Because population stratification can result in inflated type I error in a GWAS, a principal component analysis using 1443 ancestry informative SNPs was performed using PLINK [13] to confirm self-reported ancestry. Briefly, these SNPs were chosen based on Fst .0.4 in HapMap populations (YRB, CEU, CHB+JPT) and inter-SNP distance at least 500 kb in order to minimize linkage disequilibrium. Different ethnic groups were clearly distinguished with the two first components. 31 selfidentified Caucasian women were removed from analysis because they did not cluster with other Caucasians, leaving 22,435 nondiabetic participants with non-missing sICAM-1 information for analysis. To rule out the possibility that residual stratification within Caucasians was responsible for the associations observed, a principal component analysis [39] was performed in Caucasians (only) using 64,205 SNPs chosen to have pair-wise linkage disequilibrium lower than r 2 = 0.2. The first ten components were then used as covariates in the association analysis. As adjustment by these covariates did not change the conclusions, we present analysis among Caucasian participants without further correction for sub-Caucasian ancestry unless stated otherwise.

Association Analysis
Plasma concentrations of sICAM-1 were adjusted for age, smoking, menopause and body mass index using a linear regression model in R to reduce the impact of clinical covariates on sICAM-1 variance. The adjusted sICAM-1 values were then tested for association with SNP genotypes by linear regression in PLINK [13], assuming an additive contribution of each minor allele. A conservative P-value cut-off of 5610 28 was used to correct for the roughly 1,000,000 independent statistical tests thought to correspond to all the common genetic variation of the human genome [40,41].

Model Selection Algorithm
To investigate whether more than one SNP in each locus is independently associated with sICAM-1, a forward selection multiple linear regression model was used. For each locus with at least one genome-wide significant SNP (i.e. P,5610 28 ), all genotyped SNPs within 1.5 Mb of the most significantly associated SNP and passing quality control requirements were selected for potential inclusion in our model. The forward selection algorithm then proceeded in two steps. In the first step, all SNPs not yet included in the multiple regression model were tested for association with sICAM-1. In step two, the SNP with the smallest P-value was included in the model if its multiple regression P-value was less than 5610 28 . We then repeated steps one and two, such that a single SNP was added to the multiple regression model at each iteration. The algorithm was stopped when no more SNP passed the P,5610 28 requirement.

Polygene Analysis
To test whether multiple genetic variants of individually weak effect could explain a substantial fraction of sICAM-1 variance, we performed a ''polygene'' experiment as previously described [42]. Briefly, we randomly divided our dataset in 5 equal parts. We then tested SNPs for association with sICAM-1 using 4 out the 5 parts and performed linkage disequilibrium pruning as implemented in PLINK (r 2 .0.05 and distance ,1 Mb). We then derived a gene score with non-redundant associated SNPs using varying P-value thresholds and weighting each SNP for its beta coefficient. Finally, we tested the gene score for association with sICAM-1 in the remaining one fifth of the total sample and calculated the adjusted R 2 . This experiment was repeated 5 times using each one of the five parts as the gene score validation group alternatively.

Replication of Novel Associations in CHARGE
We sought to replicate the 4 novel loci identified in 9,813 individuals from the Cohorts for Heart and Aging Research in Genome Epidemiology (CHARGE) consortium [14] for whom plasma sICAM-1 concentration and genotypes were available. The CHARGE sample consists of 4 meta-analyzed cohorts: the Framingham Heart Study, the Cardiovascular Health Study, the Atherosclerosis Risk in Communities study, and the Rotterdam Study. Complete information on each study is available as Text S1. Association analyses were performed on imputed genotypes using an additive genetic model on age and sex adjusted logtransformed sICAM-1 values.