Common Variants in a Novel Gene, FONG on Chromosome 2q33.1 Confer Risk of Osteoporosis in Japanese

Osteoporosis is a common disease characterized by low bone mass, decreased bone quality and increased predisposition to fracture. Genetic factors have been implicated in its etiology; however, the specific genes related to susceptibility to osteoporosis are not entirely known. To detect susceptibility genes for osteoporosis, we conducted a genome-wide association study in Japanese using ∼270,000 SNPs in 1,747 subjects (190 cases and 1,557 controls) followed by multiple levels of replication of the association using a total of ∼5,000 subjects (2,092 cases and 3,114 controls). Through these staged association studies followed by resequencing and linkage disequilibrium mapping, we identified a single nucleotide polymorphism (SNP), rs7605378 associated with osteoporosis. (combined P = 1.51×10−8, odds ratio = 1.25). This SNP is in a previously unknown gene on chromosome 2q33.1, FONG. FONG is predicted to encode a 147 amino-acid protein with a formiminotransferase domain in its N-terminal (FTCD_N domain) and is ubiquitously expressed in various tissues including bone. Our findings would give a new insight into osteoporosis etiology and pathogenesis.


Introduction
Osteoporosis (MIM166710) is one of the most common skeletal diseases affecting more than 200 million individuals in the world. Its prevalence is estimated to be increasing dramatically as population ages [1]. Osteoporosis is characterized clinically by reduced bone mass and compromised bone strength, leading to an increased risk of fracture.
Osteoporosis is a polygenic disease; Both environmental and genetic factors contribute to its etiology and pathogenesis [2]. To understand its genetic factor, identification of its susceptibility gene(s) is important. There are several experimental approaches to identify susceptibility genes for osteoporosis. One is a candidate gene approach. Genes relevant to bone metabolism and disease genes of rare monogenic bone diseases are widely studied by this approach and the association with osteoporosis has been reported in many genes; however, only a few genes like those for estrogen receptor 1 (ESR1), a1 chain of type I collagen (COL1A1) and low-density lipoprotein 5 (LRP5) are replicated for their association [3][4][5], including large-scale meta-analyses using different ethnic populations [6,7].
Another approach is a genome-wide association study (GWAS). GWAS has a great power to detect genetic variants with less than moderate effects [8,9]. Its notable advantage is a potential for finding previously unknown susceptibility genes [10]. Recently, several groups conducted GWAS and identified many loci associated with susceptibility to osteoporosis mainly in Caucasian [11][12][13][14][15][16]; however, the genetic contribution to osteoporosis is not entirely known.
To uncover additional susceptibility gene(s) for osteoporosis, we conducted a GWAS in Japanese followed by staged replication studies. We found a SNP (rs7605378) on chromosome 2q33.1 that showed significant association (P = 1.51610 28 ) with susceptibility to osteoporosis. The SNP is in a previously unknown gene, which we named FONG.

GWAS
To identify the causal SNPs associated with osteoporosis, we used staged association method [17,18] (Fig. S1). As the first stage of discovery (Discovery 1), we performed GWAS and genotyped 268,064 SNPs that covered 56% of common SNPs in Japanese, in 190 cases and in 1,557 controls registered in the BioBank Japan (BBJ) [19]. After passing through the quality control (QC) filter described in the Material and Method, we successfully obtained genotyping data for 224,507 SNPs. The x 2 distributions for the association tests across the tested SNPs showed a low possibility of overall systematic bias (genomic inflation factor: l GC = 1.02). We further performed a principal component analysis (PCA) [20] for the samples and found no evidence for population stratification (Fig. 1A).
Step-wise screening As the second stage of discovery (Discovery 2), we selected 3,000 SNPs showing the smallest P values in Discovery 1 and genotyped these SNPs in an independent set of subjects composed by 526 cases and 1,537 controls. We successfully obtained genotyping data for 1,654 SNPs. Quantile-quantile plots revealed the presence of a number of SNPs associated with osteoporosis (Fig. 1B). The x 2 distributions for the association tests across the tested SNPs showed a low possibility of false positive association due to population stratification (l GC = 1.04).
After the Discovery stages, no SNP exceeded the genome-wide significance threshold. We therefore selected three SNPs that showed the smallest P values (P,1.0610 23 in Discovery 2) for the replication. In the discovery stages, there were age and sex differences between the cases and controls. Therefore, to exclude the false positive due to the differences, we used age-and sexadjusted cases and controls in the replication stages (Table S2). As the first stage of replication (Replication 1), we genotyped the SNPs in an independent set of female subjects composed by 1,326 cases and 1,292 controls. We set significance threshold in this stage after the Bonferroni correction for multiple testing to P,1.67610 22 ( = 0.05/3). Only one SNP, rs7605378 on chromosome 2q33.1 showed significance (P = 2.99610 23 ) ( Table 1). To validate the association of rs7605378, we further genotyped it in an independent female population of 240 cases and 285 controls as the second stage of replication (Replication 2), and found further replication of the significant association (P = 3.97610 22 ) (Table 1).
Thus, through the staged association study using independent populations, we identified and validated the association of rs7605378, a new susceptibility loci for osteoporosis. The combined P value was 1.51610 28

LD mapping
To define the linkage disequilibrium (LD) block containing rs7605378, we examine SNPs around rs7605378 (Fig. 2). We referenced the International HapMap Project database (release 23a) and selected SNPs that had D' value of .0.8 to rs7605378 and a minor allele frequency of .0.1. The LD block around rs7605378 contained 51 HapMap SNPs and one hypothetical gene, LOC348751. Next, we selected tag SNPs including rs7605378 that covered all 51 SNPs with an r 2 value of .0.8. After genotyping the 14 tag SNPs for 2,042 cases (Discovery 1, 2 and Replication 1) and 1,292 controls (Replication 1), we found no more significantly associated SNP than rs7605378 (Table 2). Then, we analyzed haplotype association using the 14 tag SNPs for the LD block. We did not find any haplotypes that had more significant association than rs7605378 ( Table 3).

Identification of FONG
In the NCBI genome database (build 36.3), rs7605378 lay within a hypothetical gene, LOC348751. Because the LOC348751 transcript was based on in silico predictions and expressed sequence  [20], and plotted for the first (X axis) and the second (Y axis) principal components (PCs), respectively. Our case and control samples are plotted in a single cluster of Japanese. (B) Quantile-quantile (Q-Q) plots of allelic association using Fisher's exact (allelic) test in Discovery 2. Under the null hypothesis of no association at any locus, the points would be expected to follow the slope line (light green). Deviations of the points (red dots) from the line correspond to loci that deviate from the null hypothesis. The genetic inflation factor lambda is 1.04. doi:10.1371/journal.pone.0019641.g001  tags (ESTs) only, we tried to clone a full sequence of the actually expressed transcript from bone by RACE and RT-PCR. We identified a new transcript that overlapped with, but was different from LOC348751. The longest transcript was 1,997 bp in length with a predicted protein of 147 amino acids (Fig. 3). A protein motif analysis program (http://www.ebi.ac.uk/Tools/InterProScan/) predicted that this protein contained a signal peptide and a formiminotransferase domain in its N-terminal (FTCD_N domain). We named this newly identified gene FONG (for formiminotransferase N-terminal sub-domain containing gene: AB568489). FONG had many alternative splicing variants and multiple transcription start sites (TSSs). We performed luciferase assays in two human osteoblastic cell lines, MG-63 and SaOS-2, and confirmed that the region containing 500-bp upstream of the major TSS and 59UTR possessed promoter activity (data not shown).

Expression of FONG
To confirm the expression and size of the FONG transcript, we carried out Northern analysis and identified two transcripts, approximately 2.2 kb and 2.0 kb in length. The 2.0-kb transcript was common to all tissues examined (Fig. 4A). We also examined FONG expression in various human tissues using real-time PCR. FONG was ubiquitously expressed in various tissues including bone (Fig. 4B).

Determination of the most associated SNP
To locate the functional, osteoporosis-associated SNP, we searched for SNPs in ,25 kb region that had r 2 .0.8 with rs7605378 by direct sequencing of genomic DNA from 24 case subjects. We found 20 previously unknown SNPs in addition to 39 known SNPs in the HapMap database (Table S3). After calculating pairwise r 2 values for all 59 SNPs in this region, we selected 22 tag SNPs with r 2 .0.95. In addition to the six SNPs already genotyped, we selected 16 additional SNPs and genotyped them for 697 cases (a part of Replication 1) and 567 controls (a part of Replication 1). Two SNPs (SNP11 and rs58319901) showed more significant association than rs7605378 (Table S4). We genotyped them in additional samples consisting of 2,042 cases (Discovery 1, 2 and Replication 1) and 1,292 controls (Replication 1); however, both SNP did not show more significant association than rs7605378 (Table S5). In the ,25 kb region that had r 2 .0.8 with rs7605378, we found 12 SNPs were in perfect LD (r 2 = 1) with rs7605378 (Table S3). Because they were all in perfect LD with each other, we could not pinpoint a candidate causal SNP among these 13 SNPs.

Discussion
We have used a staged association design that provides multiple levels of replication followed by resequencing of the LD block, and identified a SNP, rs7605378 that is associated with susceptibility to osteoporosis. Only a few osteoporosis GWAS have been reported in Asian, which have identified a few specific genes like JAG1 and ALDH7A1 [16,24]. This study represents the first GWAS of osteoporosis in Japanese. In the discovery stages (Discovery 1 and 2), there were age and sex differences between cases and controls. However, the allele frequency of rs7605378 in the stages was not significantly different between males and females (P = 0.33). We re-evaluated the association of rs7605378 in the stages by adjusted sex (female-only analysis). Because of the decreased number of samples, the combined P value of rs7605378 in the discovery stages became a little high, but their ORs are similar after adjustment (with males: 1.32; female only: 1.34). In addition, we adjusted the age of the female samples by a logistic regression and found no significant change in the ORs. Therefore, we considered age and sex difference between cases and controls did not confound the association. rs7605378 exceeds definite genome-wide significance level even after Bonferroni correction, which is known to be very conservative.  The SNP is in a previous unknown gene, FONG. To our knowledge, this is the first report of a novel gene as a susceptibility gene of osteoporosis. Comparison of our GWAS data with SNPs identified in previous GWASs on osteoporosis [6,12,[14][15][16][21][22][23][24] showed five SNPs in three genes (PLCL1, DOK6, and MEF2C) with P values below 0.05 (Table S1). These results in Japanese, a different ethnic population from the previous studies would support their association. We could not deny the association of other promising SNPs identified in previous GWASs, because our GWAS has a limited power to detect association due to the relatively small sample size and moderate coverage of the genome. Ethnic difference in genetic background may also preclude replication of these SNPs; some SNPs identified in previous GWAS in Caucasians are found monomorphic in Japanese in the present study. Three SNPs in PLCL1 showed significant association even after correction of the multiple testing in our study (Table S1). However, whether these results really support the association of PLCL1 is not clear, because their allele frequencies had not been disclosed in the previous study [12] and hence the direction of the association of these SNPs (i.e., which alleles were the susceptibility alleles) remains unknown. In the current public databases, rs7605378 was located in LOC348751, a hypothetical gene based on in silico prediction. In the prediction, LOC348751 consists of 5 exons; however, our RT-PCR experiments could only prove a part of exon 2 and exons 3-5. The predicted exon 1 was not present. Therefore, we performed the 59and 39-RACE using bone cDNA and determined the actual mRNA sequence and the gene structure. We found that FONG consisted of 4 exons; the exon 1 and a part of exon 2 of LOC348751 were not present (Fig. S2). In addition, FONG had many splicing variants. Most of the variants contained exons 2 and 3 of FONG in common, but the first and last exons had variants. The new transcript that we have found (Fig. 3) existed in several tissues like kidney, skeletal muscle, liver and bone (Fig. 4A) and its predicted protein sequence was conserved among several species. Therefore, we think the transcript that we have found (Fig. 3) is a major splicing variant of FONG.
The N-terminal amino acid sequences of FONG corresponding to the FTCD-N domain are highly conserved from Xenopus to human, suggesting its important biological role. FTCD is a mammalian metabolic enzyme which involves in conversion of histidine to glutamic acid, and the FTCD-N domain has a transferase activity that transfers a formimino group from Nformimino-L-glutamic acid to tetrahydrofolate to generate glutamic acid and 5-formiminotetrahydrofolate [25]. The glutamate signaling is considered to play an important role in bone homeostasis. For instance, L-glutamic acid is known to be secreted by osteoclasts and knockout mice of the glutamate transporter 1 develop osteoporosis [26]. These lines of evidence suggest that FONG have a potential to regulate bone metabolism.
While preparing this paper, annotation of LOC348751 in public database is updated and LOC348751 has come to be described as a miscRNA. However, the length of the LOC348751 mRNA in the database (NR_034096.1) is shorter than that of the FONG mRNA that we experimentally determined. The new LOC348751 mRNA consisted of 966 bp and its open reading frame (ORF) encoded only 77 amino acid residues, while the FONG mRNA consisted of 1,997 bp and its ORF encoded 147 amino acids. Besides, the protein sequence of FONG is well conserved between different species. We suspect that LOC348751 is one of the FONG variant transcripts. However, because we have not yet succeeded in proving the existence of the FONG protein experimentally, we cannot deny the possibility that FONG functions as a miscRNA. Recently, many miscRNAs are found and their important roles in pathogenesis of diseases have been known [27][28][29].
The most associated SNP, rs7605378, is in perfect LD with 12 SNPs (Table S3). All of them are in the FONG region, but they do not cause amino acid substitutions. These SNPs are located in intron 3 or 39 flanking region. Therefore, they may have affect FONG expression. Two ESTs containing rs7605378 are reported. In our experiments, we could not find any FONG splicing variant(s) containing these ESTs. However, some splicing variants seems tissue specific and FONG may have other splicing valiant(s). Further analysis of these transcripts may provide a new insight into FONG function.
In conclusion, this study identified a previous unknown gene, FONG as a novel susceptibility gene for osteoporosis. Although FONG function and its osteoporosis-causing mechanism are largely unknown, our findings would provide a new insight into the complex genetic architecture of osteoporosis. The identified variants are warranted by further biological and clinical investigation.

Subjects
We carried out a stepwise case-control association method as previously described [10,[30][31][32], using several independent populations (Table S2). Case and control subjects used in discovery stages were obtained from the BBJ [19]. Osteoporosis was diagnosed according to the criteria of Japanese Osteoporosis Society as bone mineral density (BMD) being ,70% of young adult mean (YAM) at either the lumbar spine or femoral neck [33]. This criteria is equivalent to that of the World Health Organization (WHO) of T-score,22.5. BMD at the lumbar spine (L2-4 or L1-4) and/or femoral neck was measured by dual energy radiograph absorptiometry with standard protocols. All individuals in the osteoporosis populations were postmenopausal and/or over 60 years female. The controls were the subjects with various diseases other than osteoporosis as previously described [19]. For the replication study, the criteria of cases are same as the discovery stage and that of controls are postmenopausal females and/or females over 60 years. The cases in Replication 1 were also obtained from BBJ. The cases in Replication 2 and the controls in Replication 1 and 2 were obtained from unrelated ambulatory volunteers. All the participants provided written informed consent. This research project was approved by the ethical committees at Institute of Medical Science, the University of Tokyo and Center for Genomic Medicine, RIKEN.

SNP genotyping
Using standard protocols, genomic DNA was extracted from peripheral blood leukocytes. In Discovery 1, 268,064 SNPs from autosomal chromosomes were genotyped by using high-density oligonucleotide arrays (Perlegen Sciences). These SNPs were selected from JSNP [34] or HapMap database [35] as tagging SNPs for Japanese. SNPs having call rate .90% and no significant deviation from Hardy-Weinberg equilibrium (HWE; P$1.0610 26 ) were used for the analysis of association. A total of 224,507 SNPs were passed QC filters and were further analyzed for their association. Among the SNPs analyzed in Discovery 1, top 3,000 SNPs showing the smallest P values were selected for Discovery 2. Genotyping of Discovery 2 was conducted using the multiplex-PCR invader assay [36] or highdensity oligonucleotide arrays (Perlegen Sciences). In this stage, 1,654 SNPs passed QC filters (call rate of $0.9, P value of HWE$0.01 in controls, and concordance rates of .90% between Perlegen and Invader assays using randomly selected 94 case samples and 752 control samples). Among the SNPs analyzed in the discovery stages, top three SNPs showing the smallest P values were selected for the replication study in the replication stage. Genotyping in the stage was conducted using the multiplex-PCR invader assay or the TaqMan assay (Applied Biosystems). All cluster plots were checked by visual inspection and SNPs with ambiguous calls were excluded.

Statistical analysis
In the discovery stage, Fisher's exact test was applied to two-bytwo contingency table in three genetic models: an allele frequency model, a dominant-effect model, and a recessive-effect model. At the replication stage, the association was assessed using x 2 test that was applied to two-by-two contingency table in the three genetic models. Odds ratios and confidence intervals were calculated using the minor allele as a reference. The haplotype association was analyzed using Haploview software [37]. A PCA was conducted to detect population stratification [20]. A combined P value and heterogeneity were calculated using the Mantel-Haenszel method.

RACE, RT-PCR and real-time PCR
59-and 39-RACE were performed using Marathon-Ready cDNAs for human kidney, skeletal muscle and liver (Clontech). A human bone cDNA library was constructed using FastTrack 2.0 mRNA Isolation kit (Invitrogen) and SMART RACE cDNA amplification kit (Clontech) according to the manufacture's protocol. A bone cDNA was synthesized using Multiscribe reverse transcriptase and a random hexamer primer (Applied Biosystems). The bone cDNA and multiple tissue cDNA panels (Clontech) were used for PCR experiments to examine tissue-specific expression of FONG. Quantitative real-time PCR was carried out using an ABI PRISM 7700 sequence detector with Quantitect SYBR Green PCR Kit (Qiagen) in accordance with the manufacturers' instructions.

Northern blotting
The cDNA fragment corresponding to nucleotides 413-731 of FONG was cloned into the pCR2.1 TOPO vector (Invitrogen).
The DIG-labeled probe was synthesized from the constructed vector using DIG RNA Labeling Kit (Roche). Total RNAs of kidney, skeletal muscle and liver were purchased from Clontech (The skeletal muscle: seven male/female Caucasians, kidney: 14 male/female Caucasians, liver: a male Caucasian). Total RNAs of bone was extracted from bone tissues of nine male/female Japanese. mRNAs were synthesized using FastTrack 2.0 mRNA Isolation kit (Invitrogen) and 2 mg of mRNAs were used for gel electrophoresis. Transfer, hybridization and detection were done using DIG Easy Hyb and DIG Wash and Block Buffer set (Roche) according to the manufacturer's instructions. Figure S1 Design of our staged association study. We performed a genome-wide screening as the first stage of discovery (Discovery 1), followed by further examination of the top findings (Discovery 2). We then performed two replications (Replication 1 and 2) and resequencing of the LD block. In each stage, we consider the minimum P value in three genetic models. (TIF)