Bivariate Genome-Wide Association Analyses Identified Genes with Pleiotropic Effects for Femoral Neck Bone Geometry and Age at Menarche

Femoral neck geometric parameters (FNGPs), which include cortical thickness (CT), periosteal diameter (W), buckling ratio (BR), cross-sectional area (CSA), and section modulus (Z), contribute to bone strength and may predict hip fracture risk. Age at menarche (AAM) is an important risk factor for osteoporosis and bone fractures in women. Some FNGPs are genetically correlated with AAM. In this study, we performed a bivariate genome-wide association study (GWAS) to identify new candidate genes responsible for both FNGPs and AAM. In the discovery stage, we tested 760,794 SNPs in 1,728 unrelated Caucasian subject, followed by replication analyses in independent samples of US Caucasians (with 501 subjects) and Chinese (with 826 subjects). We found six SNPs that were associated with FNGPs and AAM. These SNPs are located in three genes (i.e. NRCAM, IDS and LOC148145), suggesting these three genes may co-regulate FNGPs and AAM. Our findings may help improve the understanding of genetic architecture and pathophysiological mechanisms underlying both osteoporosis and AAM.


Introduction
Bone strength at the hip is directly related to the risk of hip fracture, the most serious and disabling type of osteoporotic fractures [1]. Femoral neck geometry is a major determinant of the mechanical resistance of the hip and plays an important role, independent of bone mineral density (BMD), in determining bone strength and osteoporotic fractures [2]. Femoral neck geometric parameters (FNGPs) measure bone structural properties (such as shape, size, and microarchitecture) and are believed to be as good as BMD in predicting hip fracture risk [3]. FNGPs, including cortical thickness (CT), periosteal diameter (W), buckling ratio (BR), cross-sectional area (CSA), and section modulus (Z), can be conveniently and accurately inferred from dual energy X-ray absorptiometry (DXA) measurements.
Menarche is the first menstrual cycle in female human beings, which occurs when thickened endometrial tissue undergoes a sudden death because of fluctuations of estrogen levels. Age at menarche (AAM) is an important factor that affects women's health. Late AAM is related with higher risk of osteoporosis in women, which may partially due to the less exposure to estrogen [13,14]. It has been shown that AAM is under strong genetic control and heritability of AAM is as high as 50-70% [15,16].
Since FNGPs and AAM are heritable traits highly related to risk of osteoporosis and women's health, it would be interesting to investigate whether there are pleiotropic genes that influence variation in both FNGPs and AAM. Previous studies were largely conducted on FNGPs or AAM separately using univariate analysis method which does not consider potential correlations between them [7,17]. This problem could be addressed by performing multivariate analysis which analyzes correlated traits simultaneously. Compared to univariate analysis, multivariate analysis has an advantage in detecting pleiotropic genes by considering the correlations between traits in the model. In addition, the multiple testing problems caused by testing different traits separately in univariate analysis can be alleviated in multivariate analysis.
Here we report the first bivariate genome-wide association study (GWAS) for FNGPs and AAM in a sample of 1,728 unrelated US Caucasian female subjects followed by replication analyses in independent Caucasians and Chinese samples. We identified three genes (i.e. NRCAM, IDS and LOC148145) that were associated with both FNGPs and AAM, suggesting their roles in co-regulating FNGPs and AAM. Our findings may help improve the understanding of genetic architecture and pathophysiological mechanisms underlying both osteoporosis and AAM.

Subjects
The discovery sample contained 1,728 unrelated Caucasian female subjects recruited in Midwestern US (Kansas City, MO, and Omaha, NE) for studies of osteoporosis and related health problems. All the identified subjects were US Caucasians of European origin. The replication samples included Caucasians and Chinese. The Caucasian replication sample included 501 unrelated female subjects living in Omaha, NE, USA, and its surrounding areas. There was no overlap between the discovery and replication Caucasian samples. The Chinese replication sample included 826 unrelated female subjects living in Changsha or Xi'an, China.
The study was approved by Institutional Review Boards of Creighton University, University of Missouri-Kansas City, Hunan Normal University of China and Xi'an Jiaotong University of China. All the subjects signed informed-consent documents and completed structured questionnaires. The exclusion criteria were detailed in our previous publications [18]. Briefly, subjects with chronic diseases and conditions that might affect bone metabolism and AAM were excluded from this study.
Areal BMD (g/cm 2 ) and region area (cm 2 ) of FN were measured using dual-energy X-ray absorptiometry scanners Hologic QDR 4500 W (Hologic Inc., Bedford, MA, USA). The machines were calibrated daily. The coefficient of variation (CV) values obtained from the DXA measurements for FN bone size and FN BMD were 1.94% and 1.87%. Bone geometric parameters were calculated using the DXA-derived FN BMD and bone size. The methods for calculating these variables were detailed elsewhere [7,19,20]. The five estimated FNGPs are CT, W, BR, CSA, and Z. CT is an estimate of mean cortical thickness, W is the outer diameter of the bone, BR is an index of cortical instability indicating the risk of fracture by buckling, CSA is an indicator of bone axial compression strength, and Z is an index of bone bending strength indicating the bending resistance of a tube.

Genotyping
For each study subject, we extracted genomic DNA from peripheral blood leukocytes using standard protocols, and genotyping experiments were performed strictly following the standard protocol recommended by related manufacturer. The subjects of the discovery sample and the Chinese replication sample were genotyped using Affymetrix Genome-wide Human SNP 6.0 genotyping arrays (Affymetrix, Santa Clara, CA, USA) which include 909,622 SNPs [21]. The Caucasian replication sample was genotyped using Affymetrix Human Mapping 500 K arrays which included 500,567 SNPs.

Quality Control
In order to obtain high quality genotyping data, we followed strict quality control procedures. Samples with a minimum call rate of 95% were included in the analyses. For the Caucasian sample, the final mean call rate reached a level of 98.93%. We discarded SNPs that deviated from Hardy-Weinberg equilibrium (p,0.0001) and those with a minor allele frequency (MAF) ,0.01. After quality control, the numbers of SNPs available for association analysis were 760,794 in the discovery sample, 702,413 in the Chinese replication sample, and 407,192 in the Caucasian replication sample.

Statistical Analyses
The five FNGPs were adjusted by age, height and weight [22], while AAM was adjusted by height and weight [23,24] using Minitab (Minitab Inc., State College, PA, USA). The covariatesadjusted phenotypic values were used in subsequent association analyses. We tested the phenotypic correlation between FNGPs and AAM using bivariate correlation analysis in Minitab.
The association analyses between genotype and the covariatesadjusted traits were performed using a bivariate linear regression model. The model is represented by y i~m zbx i ze i , where y i is the vector of the two traits for individual i; m is the vector of grand means; b is the vector of its effects, and x i is the genotype score for individual i. e i is the vector of residues following a multivariate normal distribution with mean zero. The genotype x i was encoded with an additive mode of inheritance. The association was examined by testing the significance of b, and the test was performed with R package lm. Individual p values achieved in the three studied samples were then combined by fisher's method [25]. Genomic control approach was used to evaluate population effect and control potential population stratification that may lead to spurious association results. The p value of 5610 28 was used as the threshold to claim significant associations at the genome-wide level, after accounting for multiple-testing by applying Bonferroni correction. Since Bonferroni correction is quite conservative, we reported SNPs that achieved a p value of 10 25 or less in the discovery stage. For replication, due to the prior evidence of association, we used a threshold of the nominal p level of 0.05. We evaluated the proportion of phenotype correlations between each trait pairs explained by the reported SNPs. We define corr1 as the original phenotype correlation coefficients and corr2 as the phenotype correlation coefficients after adjusted by the SNPs. The proportion of correlations between each trait pairs explained by the reported SNPs (corrp) was calculated by the formula: corrp~c orr1{corr2 ð Þ = corr1 : For the SNPs showing potential pleiotropic effects, we further investigated the effect direction and causal relationship. The effect direction of the SNPs was evaluated using a linear regression model implemented in PLINK [26]. The causal relationship of the SNPs was examined by comparing adjusted/conditional models in bivariate linear regression analyses, in which the genotype of each of the reported SNPs was adjusted as a covariate in turn.
For comparison purpose, we also performed univariate association analyses for each trait using a univariate linear regression model with the R package lm. To compare statistical power between univariate and bivariate association analyses, we performed power analyses using the GEE (Generalized Estimation  Equation) implemented in R. The power analyses were based on the sample sizes of 1,728, 826, and 501 unrelated subjects as used in the present study for discovery and replication analyses. Simulation analysis was performed to calculate the power.
The coefficient of linkage disequilibrium (LD) between specific SNPs was obtained from the Haploview system [27]. We used the FASTSNP program (http://fastsnp.ibms.sinica.edu.tw) to explore potential functions of the reported SNPs [28].

Results
The basic characteristics of the discovery and replication samples are summarized in Table 1. Correlation analysis of the study traits showed that AAM was significantly correlated with FNGPs. In the discovery sample, significant correlations were observed between AAM and three FNGPs (CT, W, and BR), and the correlation coefficients were 20.054 (p = 0.028) for AAM and CT, 0.043 (p = 0.082) for AAM and W, 0.077 (p = 0.002) for AAM and BR, respectively. The significant correlations observed here are consistent with previous findings of others [29,30]. We subsequently focused biviarate analyses on AAM and these three FNGPs.
We identified six SNPs that were associated with both FNGPs and AAM in the discovery sample (p,10 25 ). These SNPs were replicated in independent Caucasian and/or Chinese replication samples ( Table 2). Among them, rs4141232 is located in the upstream of the LOC148145 gene (p = 4.12610 27 for AAM-CT), rs6975557 and rs13230316 are located in the intron of the NRCAM (neuronal cell adhesion molecule isoform A) gene (p = 2.82610 25 and 5.58610 25 for AAM-W, respectively), and the other three SNPs, rs5980450, rs4844014, and rs7064959, are located in the downstream of the IDS (Iduronate-2-sulfatase) gene (p = 6.76610 25 , 7.31610 25 and 8.64610 25 for AAM-W, respectively). Three SNPs (rs4141232, rs6975557 and rs13230316) were replicated in the Caucasian sample (p = 0.03, 8.33610 25 and 1.10610 24 , respectively). The other three SNPs (rs5980450, rs4844014, and rs7064959) near the IDS gene were replicated in the Chinese sample (p = 0.05) ( Table 2). Combined p values of meta-analyses are also shown in Table 2.
The characteristics of the six SNPs bivariately associated with FNGPs and AAM are shown in Table 3. The proportions of phenotype correlation explained by the six SNPs were 18.52% for AAM-CT, 25.26% for AAM-W, and 12.99% for AAM-BR, respectively (Table 4).
Based on SNPs genotyped in GWAS, we estimated inflation factor (l) which is a measure of population stratification. Generally, for a homogenous population with no stratification the value of l should be equal or close to 1. In our GWAS cohorts, the estimated l values for AAM, CT, W, and BR were 0.938, 0.982, 0.934, and 0.936, respectively, suggesting no or very modest population stratification, if any.
We performed univariate association analyses for the six identified SNPs in the three studied samples. As presented in Table 5, p values of univariate association analyses were less  significant than those of bivariate analyses. Power calculation showed that bivariate analysis exhibited consistently higher statistical power than univariate analysis did for any of the three samples (Fig. 1). The effect directions of these SNPs are presented in Table 6. A positive beta value means that the minor allele is associated with a higher trait value. Since these SNPs were not included in the SNP arrays scanned in the Caucasian replication sample, their effect directions are not available for this sample. The effect direction for AAM was contrary between discovery and Chinese replication samples. For CT, W and BR, two SNPs of the NRCAM gene had the same effect direction in both samples, while three SNPs near the IDS gene had contrary effect directions between the two samples.
We examined the causal relationships of the six SNPs (Table 7). It can be seen that for the SNPs located in the same gene, using the genotype of one SNP as a covariate, the association signals disappeared, suggesting these SNPs are in linkage disequilibrium.  When the genotypes of the SNPs located in different genes were used as a covariate, the association signals remained, suggesting they are independent. From biology point of view, the concept of causality is more complex than comparison of adjusted/conditional models as used here. The causal effects of the variants need to be further explored and validated via deep re-sequencing of the gene locus and subsequent molecular functional studies. Analysis using the software Haploview in the Caucasian sample showed that SNPs rs6975557 and rs13230316 located in the NRCAM gene are in the same LD block (r 2 $ 0.97) (Fig. 2). The SNPs rs8113142, rs4141232, and rs4805257 near the LOC148145 gene are in two LD blocks (r 2 = 0.93, and 0.88, respectively) (Fig. 3). The SNPs rs5980450, rs4844014, and rs7064959 near the IDS gene are not available in the Haploview.
Using the FASTSNP program, we investigated the potential functions of these six SNPs. The SNPs rs6975557 and rs13230316 are located at potential transcription factor-binding sites and thus may have a role in transcription regulation. A GRA change at rs6975557 may result in the elimination of the binding sites for transcription factor GATA-1, while a GRC change at rs13230316 may produce a change in the binding sites of S8 and OCT-1. The other four SNPs (i.e., rs4141232, rs5980450, rs4844014, and rs7064959) did not show known functions according to the FASTSNP program.

Discussion
To the best of our knowledge, this is the first bivariate GWAS for FNGPs and AAM, which identified three novel genes (i.e., NRCAM, IDS and LOC148145) which may contribute to covariation of FNGPs and AAM. GWAS were largely performed by analyzing individual traits separately in a univariate framework. Although univariate analysis is effective for discovering novel genes responsible for a specific disease or trait, the approach generally ignores the potential genetic co-predisposition to human diseases. Bivariate analysis considers the correlation between traits and has an advantage in identifying genes with pleiotropic effects. The approach may help reveal the interconnected pathophysiological networks for a spectrum of common human diseases [31,32].
The gene NRCAM encodes a neuronal cell adhesion molecule [33]. Early studies showed that NRCAM gene expression increased during osteogenic and chondrogenic differentiation [34], suggesting it may function in osteoblasts and chondrocytes and probably be master control gene. The NRCAM gene is located on the chromosome 7q31, a region showed significant association with BMD in a published GWAS [35]. NRCAM is also a gene involved in regulation of estrogen. As one of the oocytespecific genes, the NRCAM possess embryogenesis cellular growth and differentiation identified from the human primordial follicles cDNA library [36]. The IDS gene encodes Iduronate-2-sulfatase which is required for the lysosomal degradation of heparan sulfate and dermatan sulfate [37]. Mutations in the gene that result in enzymatic deficiency may lead to the sex-linked Mucopolysaccharidosis Type II [38]. Since glycosaminoglycans are fundamental in connective tissue structure and function, mucopolysaccharidosis disorders are characterized by severe skeletal abnormality including abnormal bone structure, growth failure and severe articular cartilage and joint problems [39]. The IDS gene also has relationship to AAM. IDS plays an important role in metabolism of steroid hormones such as estrone and estradiol [40]. Significant declines in activity of IDS was observed in the mammary cell lines MCF7 and T47D with estrogen exposure, with higher doses of estradiol associated with more significant declines [41]. The collective evidence suggests that IDS is important for both bone metabolism and AAM.
The LOC148145 gene locates on the chromosome 19q12. It is the non-coding RNA gene. The biological function of this gene is currently unclear and thus the exact mechanisms by which LOC148145 is involved in co-regulation of bone and AAM await discovery.
For the six SNPs associated with FNGPs and AAM, the effect directions were not completely consistent in the discovery and replication studies. The inconsistency could be explained by several reasons. First, it may be caused by genetic heterogeneity. For instance, allele frequencies of genetic variants could be different among diverse populations due to different evolution histories, which results in different genotype phenotype associations [42]. Recent studies showed that replicable findings in specific populations might be more generalizable in other populations, and such variants are more likely to be causal in nature [43]. Second, GWAS is an indirect association method based on linkage disequilibrium between SNP markers. Significant associations may be found at genetic markers that are in linkage disequilibrium with causal variants, rather than the causal variants per se. Therefore, inconsistency of the effect directions could be a result of different patterns of linkage disequilibrium in different populations.
Interestingly, there was an overlap between the results of this study and those of some published studies. Elks et al. reported the largest meta-analysis of GWAS for AAM in 87,802 women of European ancestry [44]. In that study, the SNP rs6589964 was strongly associated with AAM (p = 1.9610 212 ). In our study, this . LD structure of the upstream of LOC148145 gene in the discovery Caucasian sample. The region demarcated in red indicates that r 2 .0.9. The region includes two LD blocks (Block1 and Block2) marked by triangles with black lines. SNPs rs8113142 and rs4141232 are located in Block1, and the SNP rs4805257 is located in Block 2. doi:10.1371/journal.pone.0060362.g003 SNP achieved p values of 6.64610 24 to 8.79610 24 for three trait pairs in the Caucasian discovery sample and p values of 2.86610 23 to 6.27610 23 in the Chinese replication sample.
In the current study, FNGPs were calculated based on DXAderived FN BMD and bone size. This is a convenient method to obtain bone geometric indices using the areal BMD with the assumption that the mineral in the cross section is confined to an annular cortical region. However, the DXA measured BMD is restricted to two dimensions, and the resolution and accuracy of the structural parameters are affected. Despite this, studies showed that the geometry of femoral neck cross sections was reasonably well characterized by DXA compares to a more rigorous 3D finite element technique [45]. In particular, due to the wide availability of DXA scanners and the low radiation exposure of scanning, DXA is still the most popular method in clinical settings and bone research.
In summary, by performing a bivariate GWAS, we identified three novel genes (NRCAM, IDS and LOC148145) that may coregulate FNGPs and AAM. Our findings need to be validated in different populations and molecular functional studies. Once confirmed, our findings may help improve our understanding of genetic architecture and pathophysiological mechanisms underlying osteoporosis and fracture risk. Our findings also furnish a foundation for further molecular and functional analyses of the genes in regulating timing of menarche and women's health in general.