Multivariate Analysis of Anthropometric Traits Using Summary Statistics of Genome-Wide Association Studies from GIANT Consortium

Meta-analysis of single trait for multiple cohorts has been used for increasing statistical power in genome-wide association studies (GWASs). Although hundreds of variants have been identified by GWAS, these variants only explain a small fraction of phenotypic variation. Cross-phenotype association analysis (CPASSOC) can further improve statistical power by searching for variants that contribute to multiple traits, which is often relevant to pleiotropy. In this study, we performed CPASSOC analysis on the summary statistics from the Genetic Investigation of ANthropometric Traits (GIANT) consortium using a novel method recently developed by our group. Sex-specific meta-analysis data for height, body mass index (BMI), and waist-to-hip ratio adjusted for BMI (WHRadjBMI) from discovery phase of the GIANT consortium study were combined using CPASSOC for each trait as well as 3 traits together. The conventional meta-analysis results from the discovery phase data of GIANT consortium studies were used to compare with that from CPASSOC analysis. The CPASSOC analysis was able to identify 17 loci associated with anthropometric traits that were missed by conventional meta-analysis. Among these loci, 16 have been reported in literature by including additional samples and 1 is novel. We also demonstrated that CPASSOC is able to detect pleiotropic effects when analyzing multiple traits.


Introduction
For over a decade, genome-wide association studies (GWASs) have been a major tool for detecting genetic variants underlying complex traits [1], including various anthropometric traits. Anthropometric traits such as height and body mass index (BMI) are highly heritable [2,3], but the genetic loci that have been reported by large GWASs so far only account for a small portion of heritability, suggesting additional variants still need to be uncovered. Up until now, most GWASs were performed by examining a single trait association, although it has been frequently observed that the same variant or gene can be associated with multiple traits [4]. During the past decade, the number of meta-analysis of GWASs using multiple cohorts has increased in order to maximize the statistical power in detecting significant genetic loci associated with a trait [5]. Several large-scale meta-analyses have been conducted for anthropometric traits, such as height, BMI, and waist-to-hip ratio (WHR) [6][7][8][9][10][11]. Among these three anthropometric traits, height is a classic polygenic trait that has been heavily studied in order to understand the genetic architecture of complex traits or diseases [9]. BMI is a convenient measure of overall adiposity as well as obesity, which is a risk factor for many diseases. WHR is a measure of body fat distribution and WHR adjusted for BMI (WHRadjBMI) is positively associated with mortality. Up until recently, 423, 97, and 49 loci have been identified to be associated with height, BMI, and WHRadjBMI in European populations, respectively [9][10][11].
Several multivariate analysis approaches for analyzing multiple phenotypes have been developed to further improve statistical power and provide biological interpretation of results, and their pros and cons were well reviewed [4]. Statistical methods of integrating evidence from summary statistics of multiple traits have recently been developed [12][13][14]. In traditional metaanalysis studies, the samples from different cohorts are assumed to be independent. Overlapped or related samples between different cohorts and correlated traits within the same cohorts would result in correlation among effect sizes, and both would lead to an inflated type I error rate. However, this issue can be resolved by estimating the correlation of effect sizes using summary statistics from genetic markers across the genome [12,13,15]. Solovieff et al. [4] coined the association of a genetic variant with multiple traits as cross-phenotype (CP) association. Zhu et al. demonstrated that power could be improved by testing multiple traits using both simulated and real data [12,16]. CP association implies potential pleiotropy where a variant is associated with multiple traits regardless of underlying causes. Thus, CP association is more general than pleiotropy. Statistical software package for CP association analysis (CPAS-SOC) was developed to meta-analyze association evidence of genetic variant with correlated traits [12].
In this paper, we hypothesized that genetic variants commonly underlying the three anthropometric traits can be identified by combining all three traits when single trait analyses may not have enough power to detect them. Since the three traits are correlated, conventional metaanalysis assuming independence will result in an inflated type I error. In addition, it is unclear whether there are any overlapped or related samples among GIANT consortium studies, which would lead to correlated summary statistics among cohorts. Findings from meta-analysis accounting for correlation among cohorts will result in a more accurate estimation of genetic effects on traits. In this study, we compared the results of CPASSOC with that from conventional meta-analysis of GIANT consortium studies when using the same discovery phase data set. We performed meta-analysis with summary statistics of height, BMI, and WHRadjBMI obtained from the GIANT consortium [17] using the CPASSOC software.

Results
GIANT consortium provided sex-specific summary statistics for the three traits: height, BMI and WHRadjBMI [17], which were downloaded from the GIANT consortium website (https:// www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files). The correlations among the six cohorts were calculated using the summary statistics of common SNPs after linkage disequilibrium pruning (Materials and Methods, S1 Table). The correlation between male and female summary statistics is 0.03, 0.096 and 0.0126 for WHRadjBMI, BMI and height, respectively (S2 Table). The correlations for BMI and height are still high despite of excluding the SNPs with absolute Z score larger than 1.96. The large correlations between male and female summary statistics for height and BMI suggest that many genetic variants with small effect sizes contribute both trait variations besides the possible overlapped or related samples. The correlation was much smaller between traits than within traits (S2 Table). We next calculated statistics S Hom and S Het in CPASSOC (Materials and Methods). S Hom makes an assumption that genetic effect is homogenous across traits and cohorts while S Het assumes genetic heterogeneity. Fig 1 presents the Q-Q plots for height, BMI, and WHRadjBMI individually as well as combined together, using S Hom and S Het . The genomic control values (λ) for statistics S Hom , are 1.078, 1.079, 1.025 and 1.092 for height, BMI, WHRadjBMI and combining the three traits, respectively. For S Het , the λ values are 0.998, 1.006, 0.994 and 0.992 for height, BMI, WHRadjBMI and combining three traits, respectively. We did not observe any inflation for S Het but slight inflation was observed for height and BMI for S Hom . Both height and BMI have many causal variants, which can inflate the corresponding λ values. In the original and the latest GIANT reports [18,19], the λ values were 1.42 for height and 1.526 for BMI. Our observed λ values for both S Hom and S Het are much smaller than that in GIANT reports. Thus, our observed association evidence is unlikely due to the effects of population stratification. We observed that females have substantially more genetic contribution than males for WHRadjBMI ( Fig 1C). Table 1 compares the number of significant loci (P < 5 × 10 −8 ) detected by CPASSOC and conventional meta-analysis by GIANT. For height, both conventional meta-analysis and S Hom identified 116 loci and 3 of them were missed by either method, while S Het identified 89 loci. For BMI, S Hom and S Het identified 20 and 17 loci, respectively, while the conventional meta-analysis detected 18 loci. For WHRadjBMI, S Hom and S Het identified 11 and 14 loci, respectively, while meta-analysis detected 11 loci. The detailed P-values and effect sizes for these variants in CPASSOC and conventional meta-analysis are presented in S3 Table. P-values for combining the three traits, S Hom and S Het identified 55 and 129 loci, respectively, while conventional meta-analysis identified 122 loci for three traits together. The detailed P-values and effect sizes for these variants in CPASSOC and conventional meta-analysis are presented in S4 Table. The reason for substantially fewer variants detected by S Hom is due to different effect directions in the three traits.
The Manhattan plots for the trait specific analysis are presented in Fig 2. The three loci identified by S Hom but not by conventional meta-analysis for height are rs4676386, rs2597513, and rs8181166 ( Table 2 and Fig 2A). Variants rs2597513 and rs8181166 were reported to be associated with height in meta-analyses combining discovery and follow-up phases [6], which had a substantially increased sample size. The region of rs4676386 has been reported in [9], and the reported variant rs4344931 was located 43,541 bp away from rs4676386. Those two variants were in weak linkage disequilibrium (r 2 = 0.18).
For BMI, S Hom identified rs13078807, rs13107325, and rs3810291 and S Het identified rs17806313, all of them were missed by conventional meta-analysis (Table 2 and Fig 2B). Among these four variants, rs13078807, rs13107325, and rs3810291 were validated in the follow-up stage [7] and rs17806313 was only significant in females [17] (Table 2).
For WHRadjBMI, both S Hom and S Het identified rs9864077, which was missed by conventional meta-analysis. S Het identified two additional loci (rs4684854, rs2301573, Table 2 and Fig  2C). Both rs4684854 and rs2301573 were not genome-wide significant in the combined analysis of discovery and replication phases [8], but they were genome-wide significant in females only (P = 2.36 × 10 −08 and P = 9.93 × 10 −11 , respectively) ( Table 2) [17].
When combining three gender specific traits, 7 independent variants were identified by S Hom and S Het but were missed by the conventional meta-analysis (Table 3 and Fig 3). Variants rs9324162 and rs17391694 were detected by both S Hom and S Het and were located more than 500 kb apart. These two variants were in weak linkage disequilibrium (r 2 = 0.4). Thus, we consider them as two separate signals. Variant rs17391694 was reported to be associated with height by the GIANT study using conventional meta-analysis when combining discovery and follow-up phase data [6] [17]. The remaining five variants in Table 3 were only identified by S Het . SNP rs6441170 as well as rs13107325 were reported in later GIANT studies for height [9] and BMI [10], respectively. Variant rs17806313 was significantly associated with BMI in females only (Table 3). SNP rs4488509 is in high linkage disequilibrium (r 2 = 0.84) with reported SNP rs4640244 in later GIANT studies for height [6], and they are located 450 bp away. Variant rs7842858, has not been reported previously. SNP rs7842858 is located on TOX (Fig 4), which has been reported to be associated with obesity and diabetes [20].
To compare the variants identified by S Hom and S Het , we plotted the forest plots of the effect sizes for the SNPs in Tables 2 and 3, which were missed by the conventional meta-analysis in GIANT. We observed that the effect sizes were similar for variants only detected by S Hom (Fig  5A, rs4676386, rs2597513, rs8181166, rs13078807, rs13107325, and rs3810291). All the variants detected by both S Hom and S Het showed some degree of heterogeneity (Fig 5B, rs9864077, rs9324162, and rs17391694). The variants detected only by S het showed large amount of heterogeneity ( Fig 5C, right panel of SNPs). In particularly, variants rs17806313, rs4684854 and rs2301573 only have genetic effects in females for BMI and WHRadjBMI. Variants rs644170 and rs13107325 have positive effects on BMI but negative effects on height and WHRadjBMI. Variant rs7842858 has a negative effect on height and WHRadjBMI in both genders, negative effect on BMI in males, and positive effect on BMI in females (Fig 5). The effect of rs4488509 is negative for height and positive for BMI and WHRadjBMI. The effect of rs17806313 shows substantial heterogeneity between males and females in each of the three traits.

Discussion
In this study, we performed CPASSOC analysis using the summary statistics available from a GIANT consortium study [17]. Our results showed that CPASSOC was able to identify most of Note: CPASSOC (cross-phenotype association), GIANT (genetic investigation of anthropometric traits), BMI (body mass index), WHRadjBMI (waist-to-hip ratio adjusted for body mass index) a CPASSOC was applied to meta-analyze male and female data for each of the three traits. b The result of conventional meta-analyses of discovery phase data for each of the three traits. the loci detected by conventional meta-analysis of GIANT consortium. CPASSOC was also able to identify 17 loci reaching genome-wide significance (P < 5 × 10 −8 ) that were missed by conventional meta-analysis for a single trait analysis when using the same discovery phase data. All 10 loci (Table 2) detected by CPASSOC that combine the effects of males and females for height and BMI have been replicated by GIANT using either additional replication samples or are significant in either males or female [6,7,11]. Our results showed that CPASSOC analysis can improve statistical power over conventional meta-analysis by combining males and females for individual traits. When combining male and female summary statistics for height, the statistic S Het identified fewer loci than either S Hom or the conventional meta-analysis method, suggesting that there is rare heterogeneity between males and females for height. However, the current study only analyzed autosomal variants rather than variants on the sex chromosome.
When combining the three traits, S Hom identified 55 loci while S Het identified 129 loci. The substantial fewer number of variants detected by S Hom suggests that many variants have different effect directions in the three traits; therefore, the association evidence by S Hom is diluted. This result further indicates that the statistic S Hom would be less powerful when combining multiple different traits, where substantial heterogeneity exists (Fig 5). In our analysis of combining height, BMI and WHRadjBMI, CPASSOC was able to identify additional 7 loci ( Table 3) that were missed by conventional meta-analysis (Table 3 and Fig 3). Five variants (rs9324162, rs17391694, rs6441170, 13107325, and rs17806313) have been reported to be associated with height, BMI or in gender-specific analysis by the GIANT consortium using additional data [6,9,10,21]. CPASSOC also identified one novel locus (rs7842858) that has not been reported to be associated with any of the three traits previously. This variant shows substantial heterogeneity among the three traits. In particular, variant rs7842858 has a positive effect on the three traits in males but negative effect in females (Fig 5), which leads to the failure of detection by either conventional single trait meta-analysis or combining the three traits by S Hom . SNP rs7842858 (8:58987111) resides within TOX (Fig 4), which has been reported to be associated with Type 2 Diabetes in Chinese Han population [20]. It is possible that this variant affects obesity variation and therefore indirectly contributes to Type 2 Diabetes. However, this hypothesis requires further studies. The CPASSOC analysis of combining height, BMI and WHRadjBMI also demonstrated pleotropic effects for all 7 loci (Fig 5). This is because the association evidence of these loci could not be detected by single trait analysis using the GIANT discovery data. The association evidence could only be detected when combining the three traits. We also observed substantial genetic heterogeneity among these three traits. When pleotropic effects exist, CPASSOC analyses of multiple traits clearly improve statistical power to detect these genetic effects. In particular, statistic S Het is able to detect many variants with pleotropic effects; however, it may lose power when there is no heterogeneity, as observed in single trait analysis. Our analysis applied a significance level 5 × 10 −8 , which is widely used in GWAS including the original GIANT publications. This criterion makes the results identified by CPASSOC comparable with that from GIANT results. Alternatively, the methods by [22], which requires genotype data, may result better genome-wide significance level. Since we only analyzed the summary statistics obtained from GIANT website, we did not applied the method by [22].    When we used a genome-wide significance level 1.0 × 10 −7 , we were able to identify additional 134 loci (S5 Table). These variants should be further examined in independent studies. The CPASSOC [12] was designed to combine the association evidence across multiple, sometimes seemingly distinct phenotypes. Cross phenotype association will increase statistical power when analyzed traits share common variants or common genetic pathways, which may reflect the relevance of pleiotropy [4]. Thus, cross phenotype association analysis is preferred when a genetic variant affects more than one trait either through biological mediated pleiotropy. For example, in genetic study of hypertension, different measurements of blood pressure such as systolic, diastolic blood pressure, and hypertension status can be analyzed using CPAS-SOC [12].The hypertension related traits can be analyzed together with cardiovascular traits because of sharing potential common pathways. In the current study, we analyzed the anthropometric traits by cross phenotype analysis because these traits are likely share common biological pathways. Many traits have demonstrates different biology between males and females [17]. Sex-specific analysis is able to detect specific variants to a sex group and therefore such analysis should always be performed. On the other hand, males and female share common biology and combined analysis will increase a study sample size and consequently increase statistical power for identifying variants shared by both males and females. Our results show that CPASSOC analysis can detect sex-specific variants as well as the variants shared by both sexes.

P-value
We also observed much larger correlations between male and female summary statistics for height and BMI compared to that of WHRadjBMI (S2 Table), although we pruned SNPs with linkage disequilibrium and excluded SNPs with large summary statistics. These high correlations for height and BMI may not be surprising because of the contributions of many genetic variants with small effect sizes for both traits. In comparison, the lower correlation between male and female summary statistics for WHRadjBMI may suggest that most of genetic contribution is shared between BMI and WHR.
There are several recently developed statistical approaches in analyzing multiple traits at summary statistics level, including the ASSET [23] and the Cross Phenotype Meta-Analysis (CPMA) [24]. The ASSET is suitable for identifying a subset of associated traits while CPAS-SOC directly evaluates the aggregated association evidence between a SNP to multiple phenotypes. CPASSOC is much faster than the ASSET computationally because ASSET has to search all possible subsets. When the number of traits and studies increased, the number of possible subsets can grow exponentially. The Cross Phenotype Meta-Analysis (CPMA) was developed to test whether there is association of a SNP to multiple phenotypes, or a true pleiotropy effect. When one of traits is not associated with a SNP, CPMA will not have statistical power. It will be interesting to perform these two tests to search for the subsets of traits associated with a SNP or to test for pleiotropy effect in the GIANT consortium.
In conclusion, CPASSOC identified 17 loci associated with anthropometric traits that were missed by conventional meta-analysis for a single anthropometric trait in GIANT consortium studies that used the same discovery phase data as we did. CPASSOC is also able to detect pleiotropic effects when analyzing multiple traits.

Materials and Methods Datasets
The summary statistics of height, BMI and WHRadjBMI were downloaded from the GIANT consortium website (https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_ consortium_data_files). The downloaded data include sex-specific [17] and sex-combined [6][7][8] GWASs meta-analysis summary statistics from the discovery phase. For discovery stage in sex-specific studies, 46 studies (up to 60,586 males, 73,137 females) on height and BMI and 32 studies (up to 34,629 males, 42,969 females) on WHR were included [17]. In the discovery phase of sex-combined studies, summary statistics were collected from 46 GWASs in a metaanalysis of 133,653 individuals (60,587 males and 73,066 females) for height [6], 46 studies with up to 123,865 individuals for BMI [7], and 32 studies with up to 77,167 individuals (34,601 males and 42,735 females) for WHRadjBMI [8].

Cross-phenotype association analysis
We applied the CPASSOC package developed by Zhu et al. [12] to combine association evidence of both sexes with height, BMI and WHRadjBMI. CPASSOC can integrate association evidence from summary statistics of multiple traits. It uses summary-level data from single SNP-trait association of GWASs to detect which variant is associated with at least one trait. This method improves statistical power by analyzing multiple phenotypes and it can be executed with the summary statistics from GWASs. CPASSOC provides two statistics, S Hom and S Het . S Hom is similar to the fixed effect meta-analysis method [25] but accounting for the correlation of summary statistics among cohorts induced by potential overlapped or related samples. In brief, assuming we have summary statistical results of GWAS from J cohorts with K phenotypic traits. In each cohort, single SNP-trait association was analyzed for each trait separately. Let T jk be a summary statistic for a SNP, j th cohort and k th trait. Let T = (T 11 ,Á Á Á,T J1 , Á Á Á,T 1K ,Á Á Á, T JK ) T represents a vector of test statistics for testing the association of a SNP with K traits. We used a Wald test statistic T jk ¼b jk s jk , whereb jk andŝ jk are the estimated coefficient and corresponding standard error for the k th trait in the j th cohort. S Hom is then defined as which follows a χ 2 distribution with one degree of freedom, where e T = (1,. . .,1) has length J × K and W is a diagonal matrix of weights for the individual test statistics. We used the sample sizes for the weights, i.e., w jk ¼ ffiffiffiffi n j p for the sample size n j of the j th cohort.
To define S Het, we first let where T(τ) is the sub-vector of T satisfying |T jk | > τ for a given τ>0, and R(τ) is a sub-matrix of R representing the correlation matrix, and W(τ) be the diagonal submatrix of W, corresponding to T(τ). Here we let w jk ¼ ffiffiffiffi n j p Â signðT jk Þ. Then the test statistic is S Het = max τ>0 S (τ). The asymptotic distribution of S Het does not follow a standard distribution but can be evaluated using simulation. S Het is an extension of S Hom but power can be improved when the genetic effect sizes vary for different traits. The distribution of S Het under the null hypothesis can be obtained through simulations or approximated by an estimated beta distribution. We first applied both S Hom and S Het to combine sex-specific summary statistics for each of the three traits and compared the results with those from conventional meta-analysis of the same discovery phase data in GIANT consortium studies [6][7][8]. We next applied both S Hom and S Het for combining all the sex-specific summary statistics of the three traits: height, BMI and WHRadjBMI. We hypothesized that meta-analyzing multiple traits would allow us to identify additional variants that are likely to be missed by the conventional meta-analyses for a single trait.
To perform CPASSOC analysis, a correlation matrix is required to account for the correlation among phenotypes or induced by overlapped or related samples from different cohorts. Zhu et al. [12] suggested using a set of SNPs in linkage equilibrium to estimate the correlation coefficients. We selected the SNP set based on linkage disequilibrium (LD) pattern in the ARIC European American (EA) data (downloaded from dbGaP http://www.ncbi.nlm.nih.gov/gap). In brief, the ARIC EA cohort includes 9,707 individuals with approximately 840,000 SNPs genotyped on the Affymetrix Array 6.0 [26,27]. We first applied pairwise LD pruning with r 2 threshold of 0.2 using the software PLINK (http://pngu.mgh.harvard.edu/purcell/plink/). SNPs with large effect sizes may represent true association, and consequently may inflate correlation among summary statistics. Therefore, we removed SNPs whose summary statistics Z scores were greater than 1.96 or less than -1.96. The final SNP sets for correlation estimation include 81,322 SNPs for height, 82,012 SNPs for BMI, and 81,130 SNPs for WHRadjBMI. We chose the common sets of SNPs for both sexes and three traits that can be mapped to dbSNP human Build 142 to perform the CPASSOC analyses. The numbers of SNPs used in this study are presented in S1 Table. We reported loci that reached genome-wide significance (P < 5 × 10 −8 ) by CPASSOC from sex-specific data [17], but not by sex-combined conventional meta-analysis [6][7][8] when using the same samples from the discovery phase. Here we applied the same significant level P = 5 × 10 −8 as in GWAS because CPASSOC performs the same number of tests although multiple traits are analyzed. To do this, for a SNP reaching P < 5 × 10 −8 by either S Hom or S Het , we examined the region within 500 kb of each side of the SNP. The SNP was considered to be identified only by CPASSOC if no SNPs that are genome-wide significant with conventional meta-analysis from the discovery phase data were found in the 1.0 Mb region, and it is not in LD with the index SNPs of the GIANT studies. We performed meta-analysis by combining male and female data for each trait separately, as well as by combining all the three traits and both sexes.
Supporting Information S1 Table. The number of SNPs mappable to dbSNP human Build 142, which were used in meta-analysis with CPASSOC. a SNPs used for conventional meta-analyses of sex-combined discovery phase data in the GIANT consortium studies. b Intersection set of SNPs used for conventional meta-analyses of sex-specific discovery phase data in the GIANT consortium studies. c Intersection set of SNPs between sex-combined and sex-specific study. d Intersection set of SNPs among height, BMI, and WHRadjBMI. (DOCX) S2 Table. Correlations between male and female cohort for each trait and those between combinations of sex and trait. (DOCX) S3 Table. SNPs representing identified through either conventional meta-analysis (sex combined and sex-specific)or CPASSOC for height, BMI, and WHRadjBMI. (XLSX) S4 Table. SNPs representing identified through either CPASSOC for combining 3 sex-specific traits or conventional meta-analysis of GIANT consortium (sex-combined and sexspecific) in discovery phase data.