Computational Characterization of Osteoporosis Associated SNPs and Genes Identified by Genome-Wide Association Studies

Objectives Genome-wide association studies (GWASs) have revealed many SNPs and genes associated with osteoporosis. However, influence of these SNPs and genes on the predisposition to osteoporosis is not fully understood. We aimed to identify osteoporosis GWASs-associated SNPs potentially influencing the binding affinity of transcription factors and miRNAs, and reveal enrichment signaling pathway and “hub” genes of osteoporosis GWAS-associated genes. Methods We conducted multiple computational analyses to explore function and mechanisms of osteoporosis GWAS-associated SNPs and genes, including SNP conservation analysis and functional annotation (influence of SNPs on transcription factors and miRNA binding), gene ontology analysis, pathway analysis and protein-protein interaction analysis. Results Our results suggested that a number of SNPs potentially influence the binding affinity of transcription factors (NFATC2, MEF2C, SOX9, RUNX2, ESR2, FOXA1 and STAT3) and miRNAs. Osteoporosis GWASs-associated genes showed enrichment of Wnt signaling pathway, basal cell carcinoma and Hedgehog signaling pathway. Highly interconnected “hub” genes revealed by interaction network analysis are RUNX2, SP7, TNFRSF11B, LRP5, DKK1, ESR1 and SOST. Conclusions Our results provided the targets for further experimental assessment and further insight on osteoporosis pathophysiology.


Introduction
Osteoporosis is a complex, polygenic disease that is characterized by reduced bone mass and microarchitecture deterioration of bone tissues, thereby leading to loss of strength and increased risk of fractures [1]. Osteoporosis and its main complication and fragility fractures incur substantial global morbidity and mortality [2]. The causes of osteoporosis and osteoporotic fractures are genetic background, environmental factors and their interactions [3]. Osteoporosis is defined clinically through the measurement of bone mineral density (BMD). BMD is highly heritable with the heritability estimates of 0.5-0.9 [3]. By contrast, heritability estimates of fractures are relatively low [3].
With the completion of the International HapMap Project [4] and 1000 Genomes Project [5], genome-wide association study (GWAS), a test for association between hundreds of thousands of single-nucleotide polymorphisms (SNPs) and a specific phenotype, becomes a major approach to discover genes and SNPs that contribute to complex diseases, and has been a resounding success [6]. By the end of 2014, nine GWAS [7][8] and nine meta-analyses [9][10] reported 107 genes and 129 SNPs (Lead SNP) that were associated with BMD, osteoporosis or fractures with a significance threshold of 5×10 −8 . Functional characterization and mechanistic elucidation of these SNPs and genes are next major challenge [11].
Tools of computational biology are powerful for post-GWAS studies, and could identify the potential and promising causal SNPs that deserve experimental test for follow-up functional studies. Extensive work has been done in this area in recent years [12,13]. Various computational methods and tools have been successfully developed and are available through the World Wide Web. Their performances were well validated through identifying numerous disease-associated SNPs for further study and revealing previously unknown mechanisms for complex diseases. For example, Zhang et al. [14] reported an immune-and microglia-specific module that is dominated by genes involved in pathogen phagocytosis, contains TYROBP as a key regulator, and is upregulated in late-onset Alzheimer's disease through an integrative network-based approach. Integrative computational analysis of cross-species conservation with an assessment of co-occurring transcription factor (TF) binding sites suggested that the T allele of rs4684847 can reduce the binding ability of the transcriptional regulator PRRX1 and maintains a higher level of PPARG2 expression, while the risk C allele enhances binding of the PRRX1, and thus inhibits PPARG2 mRNA expression, thereby provoking dysregulation of free fatty acids turnover and glucose homeostasis [15]. Computational analyses currently serve as an useful starting point to guide the design of functional assays [12].
Recent studies demonstrated that SNPs could contribute to osteoporosis susceptibility by influencing TF or miRNA binding. The rs4317882 of MPP7 is associated with BMD with the genome-wide significance. EMSA results showed the binding of TF GATA2 to the risk allele 'A' but not the 'G' allele of rs4317882 [16]. Several SNPs of POSTN were associated with BMD or vertebral fractures. A specific binding of CDX1 to the sequence with the major allele A but not the variant G allele of the POSTN rs9547970 was confirmed also by EMSA [17]. Three SNPs (rs6854081, rs1048201 and rs7683093) in miRNA target sites of the FGF2 gene were significantly associated with femoral neck BMD [18]. The rs17737058 significantly associated with the decreased BMD is disruptive to the miR-488-5p:NCOA1 interaction [19]. In this study, we applied multiple computational approaches to comprehensively and systematically analyze osteoporosis-associated SNPs and genes identified by GWAS, including gene ontology (GO) and pathway analysis, protein-protein interaction analysis, SNP conservation analysis and functional annotation (effects of SNPs on TF binding and miRNA binding). We aimed to identify potential SNPs for follow-up functional analyses, and to offer guidance for future research with regard to the etiology and pathogenesis of osteoporosis.

Search strategy and data collection
We searched osteoporosis GWAS articles using the following keywords: "Osteoporosis"/ "BMD"/"fracture" and "GWAS", and extracted osteoporosis GWAS associated SNPs with P<5×10 −8 and genes (updated to May 1, 2015). We also searched the linked SNPs which were in linkage disequilibrium (LD) (D'>0.8) with osteoporosis GWAS associated lead SNPs using LD information in the Caucasians population via HapMap website. A detailed workflow chart of our methodology is illustrated in Fig 1.

Conservation analysis
Sequence conservation is a signal of functional constraint, and can be used for function prediction. Comparative genomics is an evolutionary approach to detect functional elements [20]. The VISTA (http://pipeline.lbl.gov/cgi-bin/gateway2) and UCSC Genome Browser (http:// genome.ucsc.edu) are two web-based tools that can be used to align multiple sequences from several vertebrate species, including humans (hg19: GRCh37). The position of SNP was submitted to the VISTA detabase [21], and Chimp (panTro4: CSAC 2.1.4) and Mouse (mm10: GRCm38) genomes were selected. The p-values were obtained using the rankVISTA (http:// genome.lbl.gov/vista/rankVISTA.shtml) tool. The UCSC Genome Browser [22] was used to compute conservative frequency for individual base of SNPs. Table 1 showed the number of species with the conserved major human allele out of nine default organisms (human, rhesus, mouse, dog, elephant, chicken, x. tropicalis, zebrafish and lamprey) from the vertebrate Multiz alignment of 100 species [23].

Influence of SNPs on TF binding
Interactions between TFs and their DNA binding sites (TFBS) play a key role in gene transcription. SNPs that may locate within the promoter or distant enhancer region of gene can alter (modify, destroy or create) the binding of TFs with DNA, and subsequently regulate gene expression. The JASPAR database (http://jaspar.cgb.ki.se) [24] was utilized to predict the candidate SNPs which may change the binding affinity. The JASPAR CORE database contains a curated and non-redundant set of profiles, derived from published collections of experimentally defined TFBSs for eukaryotes. The 51 bp DNA sequence surrounding the SNP was submitted to the JASPAR CORE (2014) database to analyze the functionally potential SNPs that may disrupt the binding of TFs or change the binding affinity (only human matrix models were selected). The score of a sequence suggest its potential as putative TFBS. A string threshold of 5:0 was used to define loss or gain for a SNP to filter false positive results.  miRNASNP2/online.php) provides a resource of the miRNA-related SNPs, which included SNPs in human pre-miRNAs and miRNA flanks, SNPs in other species' miRNAs, and target gain or loss by SNPs in miRNA seed regions or 3'UTR of target mRNAs. We used miRNASNP to predict the influence of SNPs on the miRNA binding.

GO and pathway analysis
Gene Ontology (GO, http://www.geneontology.org) is a widely adopted source of gene functional annotation including biological process, molecular function and cellular component. Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/pathway. html) is a database resource for understanding high-level functions and utilities of the biological system. We used Search Tool for the Retrieval of Interacting Genes (STRING, version 9.1, http://string-db.org/) to investigate if any common functional trends are associated with the osteoporosis GWAS-associated genes (P<5x10 -8 ). The hypergeometric statistical test was used to determine if a pathway from GO or KEGG is significantly enriched. P value for each pathway was adjusted with Benjamini & Hochberg method to correct for multiple testing [25]. GO and KEGG pathways with an adjusted P<0.01 were considered to be significant [26]. In addition, PANTHER (Protein Analysis Through Evolutionary Relationships) classification system, which can provide intuitive visualizations of images of GO analysis, was used to classify proteins (and their genes) in order to facilitate high-throughput analysis.

Protein-protein interaction analysis
STRING is a database of known and predicted protein interactions, including direct (physical) and indirect (functional) associations. STRING quantitatively integrates interaction data derived from four sources: Genomic Context, High-Throughput Experiments, Co-expression (Conserved) and Previous Knowledge [27]. The database currently covers 5,214,234 proteins from 1133 organisms. STRING was used to generate risk gene (protein) network.

Functional annotation of SNPs using GWAS3D
Interpreting disease associated non-coding SNPs is an indispensable step to understand molecular mechanism of complex diseases. GWAS3D (http://jjwanglab.org/gwas3d) systematically computes the probability of GWAS associated SNPs affecting regulatory pathways and underlying disease associations by integrating chromatin state, functional genomics, sequence motif and conservation information [13]. It also provides comprehensive annotations and visualizations to help users interpret the results. In this study, we used GWAS3D to identify the most significant SNPs which has a long-range interaction signal with other locus with the following parameters: r 2 = 0.8, p-value cutoff 1×10 −5 , HapMap I+II+III, CEU population, binding site pvalue 0.01, promoter range from 100 to 500bp, 30 variant size and 3 interaction size.

Osteoporosis GWAS-associated gene/SNPs
According to NCBI association results, a total of 129 SNPs and 107 genes were associated with BMD, osteoporosis or fractures with a significance threshold of 5×10 −8 . Of these SNPs, 72 mapped to introns, 35 to intergenic regions, 6 to exons, and 3 in 3' UTR. We identified 222 SNPs linked with osteoporosis GWAS-associated lead SNPs using LD information in the Caucasians population via HapMap website. The detail of osteoporosis GWAS-associated lead SNPs and genes was shown in S1 Table.

Conservation analysis of osteoporosis-associated SNPs
To investigate the evolutionary conservation of osteoporosis GWAS-associated SNPs, we performed a comparative genome analysis using VISTA. Fourteen SNPs are within conserved sequence in the pairwise alignments of Mouse (P<0.05), whereas the number of conservative SNPs increases to 20 with Chimp. We estimated the evolutionary conservation rate of SNPs using UCSC Genome Brower. Twenty-six SNPs might locate in evolutionally conservative regions with the conservative frequency of single base >4/9. Nine SNPs are significantly conservative in both methods ( Table 1).

Influence of SNPs on TF binding
Effects of osteoporosis GWAS-associated lead SNPs and their linked SNPs on the TF binding affinity were analyzed by JASPAR. Top potentially affected TFs were SRY, SPIB, JUN::FOS, THAP1 and NFATC2. Table 2 showed the results (scores >5) of TFs NFATC2 (31), MEF2C

Influence of osteoporosis GWAS-associated SNPs on miRNA binding
Using miRNASNP v2.0, we have predicted 2 GWAS-revealed lead SNPs (rs884205 and rs1026364) that may influence the recognition and targeting of miRNA. The rs884205 C and rs1026364 T alleles create the binding site for miR-3658 and miR-345-5p, respectively. The number of candidate SNPs increased to 11 when predicted linked SNP (222) ( Table 3).
Long-range interaction of osteoporosis-associated SNPs using GWAS3D SNPs rs1463104, rs4729260 and rs2887571 were detected as significant SNPs by GWAS3D based on the CEU population and all cell types listed in GWAS3D (Fig 4). SNP rs1463104, located on the downstream of MEPE intronic region of DECR1 in chromosome 4, has two long-range interaction signal with locus 13q31.1 and locus near HSP90AB3P. SNP rs4729260, located on the intronic region of C7orf76 in chromosome 7, has two long-range interaction signals with locus Xq12 and 1q43. SNP rs2887571, located on the intergenic region of ERC1 and WNT5B in chromosome 12p13.33, showed three long-range interaction signals with locus 12p13.13, locus near IQSEC3 and MON2, respectively.

Discussion
To explore function and mechanisms of osteoporosis GWAS-associated SNPs and genes, we characterized SNPs conservation and their influence on TFs and miRNA binding, and conducted GO and pathway analyses, protein-protein interaction analysis of osteoporosis GWASassociated genes. Our analyses revealed nine significantly conservative SNPs. However, a number of SNPs potentially altering the binding affinity of TFs and miRNAs were not located within conservative regions, mirroring the controversy about relationship between conservation and functionality [17].
Our results indicated that SNPs may change the binding affinity of TFs NFATC2, MEF2C, SOX9, RUNX2, ESR2, FOXA1 and STAT3. RUNX2 and MEF2C are important TFs in bone metabolism. Multiple genes regulated by RUNX2 and MEF2C are involved in osteoblast differentiation and bone metabolism. MEF2C can bind to the ECR5 enhancer upstream of SOST, and regulate SOST expression [28]. It is well-established that Nfat plays a role in osteoclastogenesis. NFAT and Osterix cooperatively control osteoblastic bone formation [29]. Overexpression of a constitutively active form of Nfatc2 suppresses osteoblastic gene expression in vitro [30]. Nfatc2 activation in osteoblasts inhibits bone formation and causes cancellous bone osteopenia [31]. SOX9 is a master regulator TF in cartilage, and is essential for cartilage development in mice [32]. STAT is an important member component of Jak-STAT signaling pathway which plays an important role in bone development and metabolism. Among all the STATs, STAT3 is probably the most important TF mediating intracellular signaling in osteoblasts and osteoclasts. Selective inactivation of STAT3 in osteoblasts causes smaller body mass and lower bone mass due to inhibition of bone formation [33]. Clinical studies indicate that STAT3 mutation increases osteoclast number and bone resorption [34,35]. Estrogen is a key hormonal regulator of bone metabolism [36]. The biological actions of estrogen are primarily mediated through estrogen receptor. Recent chromatin immunoprecipitation coupled with microarray studies postulate that the binding of ESR at nearly 50% of target sites on chromosomes is modulated by FOXA1 [37]. Interestingly, we found that SNPs influence both FOXA1 and ESR binding in multiple genes, providing a mechanistic link between risk SNPs and osteoporosis.
There are several successful examples of identification of the SNPs that alter miRNA binding and drive complex diseases [11]. Studies have indicated that miRNAs play a critical role in bone formation [38]. Our results indicated that several SNPs potentially influenced miRNA binding to target mRNA, providing important clues and targets for further experimental test.
Notably, our pathway enrichment analyses of osteoporosis GWAS-associated genes only confirmed well-known Wnt and Hedgehog signaling pathways, and failed to provide support for many other mechanisms previously promoted on the basis of physiological evidence as likely contributors to osteoporosis pathogenesis, presumably due to poor representation of the processes critical to osteoporosis development in existing pathway databases. Using STRING, we identified highly-interconnected network "hub genes (proteins)", RUNX2, SP7, TNFRSF11B, LRP5, ESR1, DKK1, and SOST. LRP5, SOST and DKK1 are important members in Wnt signaling pathway. SP7 (Osterix) and RUNX2 are two specific TFs of osteoblast, and the central regulatory factors in osteoblast differentiation and function. RUNX2 binds to the promoters of a variety of genes (including osteocalcin, osteopontin, bone sialoprotein and type I collagen) related to osteoblast differentiation or function [39]. The expression of many of the bone related genes still require SP7, regardless of the presence of RUNX2 [40]. TNFRSF11B (OPG) is a member of the TNF-receptor superfamily and an osteoblast-secreted decoy receptor that functions as a negative regulator of bone resorption. OPG expression promotes osteoclast development and enhances its function under the condition of estrogen deficiency [41]. ESR1 are hormone regulators of bone metabolism. More importantly, network analyses also highlighted notable biological connections between osteoporosis GWAS-associated gene sets. For example, SOST is a negative regulator of bone formation and regulated by PTH and estrogen. Our network analyses showed that SOST is a direct target of RUNX2 and SP7.
In conclusion, computational characterization of osteoporosis GWAS-associated genes highlighted genes (proteins) and pathways that are crucial for osteoporosis pathophysiology, thereby improving our understanding of the condition and providing potential treatment targets. Our results also unraveled the potential functional mechanisms of osteoporosis GWASassociated SNPs through influencing TF and miRNA binding that warrant further experimental test in the future.
Supporting Information S1 Table. Information of osteoporosis GWAS-associated SNPs and genes. (XLS)