Identification, Replication, and Fine-Mapping of Loci Associated with Adult Height in Individuals of African Ancestry

Adult height is a classic polygenic trait of high heritability (h² ∼0.8). More than 180 single nucleotide polymorphisms (SNPs), identified mostly in populations of European descent, are associated with height. These variants convey modest effects and explain ∼10% of the variance in height. Discovery efforts in other populations, while limited, have revealed loci for height not previously implicated in individuals of European ancestry. Here, we performed a meta-analysis of genome-wide association (GWA) results for adult height in 20,427 individuals of African ancestry with replication in up to 16,436 African Americans. We found two novel height loci (Xp22-rs12393627, P = 3.4×10⁻¹² and 2p14-rs4315565, P = 1.2×10⁻⁸). As a group, height associations discovered in European-ancestry samples replicate in individuals of African ancestry (P = 1.7×10⁻⁴ for overall replication). Fine-mapping of the European height loci in African-ancestry individuals showed an enrichment of SNPs that are associated with expression of nearby genes when compared to the index European height SNPs (P<0.01). Our results highlight the utility of genetic studies in non-European populations to understand the etiology of complex human diseases and traits.


Author Summary
Adult height is an ideal phenotype to improve our understanding of the genetic architecture of complex diseases and traits: it is easily measured and usually available in large cohorts, relatively stable, and mostly influenced by genetics (narrow-sense heritability of height h² ∼0.8). Genome-wide association (GWA) studies in individuals of European ancestry have identified >180 single nucleotide polymorphisms (SNPs) associated with height. In the current study, we continued to use height as a model polygenic trait and explored its genetic basis in populations of African ancestry through a meta-analysis of GWA height results from 20,809 individuals of African descent. We identified two novel height loci not previously found in Europeans. We also replicated the European height signals, suggesting that many of the genetic variants associated with height are shared between individuals of European and African descent. Finally, in fine-mapping the European height loci in African-ancestry individuals, we found SNPs more likely to be associated with the expression of nearby genes than the SNPs originally found in Europeans. Thus, our results support the utility of performing genetic studies in non-European populations to gain insights into complex human diseases and traits.

Introduction
Adult height is a classic polygenic trait of high heritability (h² ∼0.8) [1,2]. A recent large meta-analysis of genome-wide association (GWA) results for height, which included data from >180,000 individuals of European descent, identified 180 loci that associate with variation in height [3]. The most significantly associated variants at these loci explain approximately 10% of the variance, consistent with the hypothesis of "cumulative Mendelian factors" put forward by Fisher in 1918, which proposed that the segregation of a large number of genetic variants, each of small effect, is sufficient to explain the variation in height observed in humans [4].
In parallel to the work in European-ancestry populations, GWA studies for adult height have also been performed in other ethnic groups, including Koreans, Japanese, Africans, and African Americans [5–10]. The GWA scans in East Asians replicated several of the height loci already identified in individuals of European descent and also found evidence for new height loci not previously implicated in individuals of European ancestry [6,7]. The studies in Africans and African Americans were modest in size and, although they nominally replicated some of the associations previously found in European populations, were not well powered to find new population-specific height loci [8,9].
To search for novel loci for height in populations of African ancestry, and to explore systematically the replication of previously validated height loci, we combined GWA results for height from nine studies totaling 20,427 individuals of African descent. We identified two novel height loci and observed significant evidence for the replication of European height signals in African-derived populations. In fine-mapping the European height loci, we also identified variants that better define the association in individuals of African ancestry and that control local gene expression in cis (cis-eQTLs), suggesting that they are likely to be better surrogates of the biologically functional alleles.

Results/Discussion
The meta-analysis included results from nine studies: four population-based African-American studies (ARIC (N = 2,740), CARDIA (N = 699), JHS (N = 2,119), and MESA (N = 1,646)), one family-based African-American study (CFS (N = 386)), two African-American GWA study consortia of breast cancer (AABC (N = 5,380)) and prostate cancer (AAPC (N = 5,526)), and two case-control studies of obesity (Maywood (N = 743)) and hypertension (Nigeria (N = 1,188)) (Materials and Methods, Text S1 and Table S1). We tested associations between 3,310,998 genotyped or imputed SNPs and sex-, age-, and disease status-adjusted height Z-scores under an additive genetic model, correcting for global admixture using principal components (PCs) as covariates and modeling family structure when appropriate (Text S1). Height results for each study were scaled using genomic control and then combined using the inverse-variance meta-analytic method (Text S1).
The quantile-quantile (QQ) plot showed little departure from the null expectation except at the right tail of the distribution (Figure 1). The associations that deviate most strongly from the null correspond to loci previously associated with height in European populations, providing strong validation of our approach (Table 1). The overall inflation factor in the meta-analysis was λGC = 1.064, and results were again scaled using genomic control, a slightly conservative approach [11].
Two genomic loci previously implicated in height in European populations [3], LCORL on chromosome 4 and PPARD on chromosome 6, reached genome-wide significance in the discovery meta-analysis (P<5×10⁻⁸; Table 1, Figure S1 and Table S2). We prioritized 153 SNPs with P<1×10⁻⁵ from our meta-analysis for in silico replication in up to 16,436 African Americans from five additional studies (Text S1). After combining the data in a joint analysis, 40 SNPs from 11 different chromosomal regions reached genome-wide significance (Table 1 and Table S2), including two SNPs not previously implicated in the regulation of height: rs12393627 on the X chromosome and rs4315565 on chromosome 2 (Table 1). rs12393627 is located 3.2 kb upstream of the arylsulfatase E (ARSE) gene on chromosome Xp22 (Figure 2a). Mutations in the ARSE gene cause X-linked brachytelephalangic chondrodysplasia punctata (CDPX1; OMIM #302950), a congenital disorder of bone and cartilage development that is also characterized by short stature [12]. The co-localization of genes underlying human growth syndromes with SNPs associated with adult height has been reported in European-ancestry samples [3,13,14]. rs12393627 reached P = 1.4×10⁻⁶ in the initial meta-analysis (N = 8,333; the SNP was not on the genotyping arrays and/or could not be imputed for AABC, AAPC, Maywood, and Nigeria) and was strongly replicated for association with height in 13,153 African Americans (replication P = 2.6×10⁻⁷; combined P = 5.7×10⁻¹²) (Table 1). After conditioning on genotype at rs12393627 and accounting for the number of independent markers in a 1 Mb window, we found no secondary independent signals in the region. We also found no significant evidence of heterogeneity at rs12393627 between men and women (P = 0.26).
The derived A-allele (i.e., the non-ancestral allele based on the chimpanzee genome) at rs12393627 is monomorphic in the HapMap CEU individuals and has a frequency of 54% in the HapMap YRI participants. We also investigated the association of rs12393627 with height in 3,487 Japanese Americans and 2,979 Latinos from the Multiethnic Cohort (MEC) (Text S1). Whereas the marker was monomorphic in Japanese Americans, the association between height and rs12393627 was replicated in Latinos with a comparable effect size (A-allele frequency = 97%, standardized effect size = −0.177±0.088, P = 0.044). The frequency of this allele is consistent with previous estimates of ∼5–10% African ancestry among Latinos in the MEC [15]. Measures of local ancestry (the number of European-derived chromosomes (0, 1, or 2) in each individual) were not available for the X chromosome, but since the marker is polymorphic only in African-derived populations (according to HapMap phase 3 data [16]), the height association signal defined by rs12393627 on Xp22 is likely to be specific to these populations.
SNP rs4315565 on 2p14 (discovery P = 1.5×10⁻⁷; combined P = 1.2×10⁻⁸) is located in intron 3 of the anthrax toxin receptor 1 (ANTXR1) gene and 189 kb upstream of the bone morphogenetic protein 10 (BMP10) gene (Figure 2b), a member of the TGF-β signaling pathway. This pathway is important for normal skeletal growth [17] and has been implicated in previous GWA studies of height [3]. We observed no evidence of heterogeneity by sex (P = 0.34) and no independent signals when conditioning on rs4315565 within a 1 Mb window.
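Returning to rs12393627, the statement that its allele frequency in MEC Latinos is consistent with ∼5–10% African ancestry can be checked with a short calculation. The sketch below rests on an assumption that is our reading rather than an explicit statement in the text: the derived A-allele is fixed in non-African source populations, so the ancestral allele enters Latinos only on African chromosomes, at roughly its YRI frequency (1 − 0.54 = 0.46). The function name is hypothetical.

```python
def expected_admixed_freq(ancestry_props, source_freqs):
    """Expected allele frequency in an admixed population, computed as the
    ancestry-weighted average of the source-population allele frequencies."""
    if abs(sum(ancestry_props) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(a * f for a, f in zip(ancestry_props, source_freqs))


# Assumed ancestral (non-A) allele frequencies in the two source groups:
# African chromosomes ~ YRI (1 - 0.54 = 0.46); non-African chromosomes ~ 0,
# i.e. we assume the derived A-allele is fixed outside Africa.
SOURCE_FREQS = (0.46, 0.0)

for african in (0.05, 0.10):
    freq = expected_admixed_freq((african, 1 - african), SOURCE_FREQS)
    print(f"{african:.0%} African ancestry -> expected ancestral allele frequency {freq:.3f}")

# Ancestral-allele frequency implied by the reported 97% A-allele frequency.
observed = 1 - 0.97
print(f"observed in MEC Latinos: {observed:.3f}")
```

Under these assumptions the expected ancestral-allele frequency ranges from about 0.023 to 0.046, and the observed ∼0.03 falls inside that range. If the ancestral allele were present at appreciable frequency in non-African source populations, the expectation would shift upward.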
The allele frequency of rs4315565 differs strongly between the HapMap CEU and YRI samples: the derived A-allele, which is associated with decreased height, has a frequency of 85% in CEU and 2% in YRI (FST = 0.701). This allele frequency difference is consistent with recent weak positive selection acting in individuals of European ancestry (iHS = −1.668) [18] and could indicate an association with local ancestry. In a conditional analysis controlling for global ancestry using PCs as covariates, we observed a significant association between height and local ancestry at the ANTXR1 locus, with an increase in the number of European chromosomes associated with a decrease in height (P = 1.6×10⁻⁶; N = 18,495 samples available for this analysis) [19]. Still controlling for global ancestry with PCs, genotypes at rs4315565 could account for the association between local ancestry and height (P = 0.22 for local ancestry conditional on rs4315565), whereas the association of rs4315565 with height diminished but remained significant in the same model (P = 4.6×10⁻⁸ and P = 0.0044 before and after conditioning on local ancestry, respectively; N = 18,495).
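The reported FST of 0.701 can be approximately recovered from the two HapMap frequencies alone. The text does not state which FST estimator was used; the sketch below assumes the simple two-population heterozygosity form FST = (HT − HS)/HT (Nei's GST), and the function name is hypothetical.

```python
def nei_fst(p1, p2):
    """Two-population F_ST from allele frequencies, as (H_T - H_S) / H_T
    (Nei's G_ST form; an assumed estimator, not necessarily the paper's)."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity of the pooled population
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    if h_t == 0:
        return 0.0  # both populations fixed for the same allele
    return (h_t - h_s) / h_t


# Derived A-allele of rs4315565: 85% in HapMap CEU, 2% in HapMap YRI.
print(round(nei_fst(0.85, 0.02), 3))  # 0.701
```

With these rounded frequencies the formula gives about 0.7007, which matches the reported value to three decimals. Other estimators (for example, sample-size-corrected ones) would give somewhat different numbers.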
To investigate the relationship between rs4315565 and local ancestry further, we considered the ancestral background on which the rs4315565 alleles were present in different individuals. In analyses stratified by the number of African/European chromosomes in the region, rs4315565 was nominally associated with height in African Americans who are homozygous (P = 0.038) or heterozygous (P = 0.043) for African chromosomes, with a stronger effect size in African-chromosome homozygotes (Table 2). In the 1,188 Nigerians from the discovery phase, a similar trend between height and rs4315565 was observed (P = 0.075). rs4315565 was not significantly associated with height in African Americans who are homozygous for European chromosomes at the locus (P = 0.91), although the sample size of this subgroup is small (N = 943) (Table 2). More strikingly, this variant is not associated with height in populations of European ancestry in the GIANT Consortium (N = 133,653, P = 0.66) [3]. Together, these results suggest that 2p14 harbors at least one novel height-associated variant that is strongly associated with African ancestry and is correlated with rs4315565 on African- but not European-derived chromosomes. Our results also indicate that rs4315565 is a better marker of the functional variant(s) than local ancestry or any other SNP represented in HapMap.
We then considered the previously known height loci. Of the 180 SNPs previously reported by the GIANT Consortium to be associated with height in populations of European ancestry, 38 SNPs had effect estimates in the same direction as in the initial report and were nominally associated (P<0.05) with height in the African-derived height meta-analysis. This number is, however, a lower-bound estimate of the number of known European height loci that replicate in individuals of African ancestry, because it does not take into account differences in LD between European and African chromosomes: since any of the SNPs in LD with the GIANT height SNPs in European-ancestry individuals could be causal, this entire set of SNPs needs to be evaluated, both in terms of statistical significance and direction of effect, for replication in the African height meta-analysis. To address this issue, we used a rigorous framework, described in the Materials and Methods section and summarized graphically in Figure S2, to test systematically for replication at the previously known European height loci in the African meta-analysis. We started with 161 of the 180 height SNPs identified by the GIANT Consortium (19 SNPs could not be tested because linkage disequilibrium (LD) information in HapMap was not available) [3] and generated 5,819 sets of 161 SNPs matched on minor allele frequency using the HapMap2+3 CEU dataset. We then counted the number of SNPs (also considering LD proxies) in the African height meta-analysis with directionally consistent (one-tailed) P≤0.05, both for the set of 161 height-associated SNPs and for the simulated sets. We found one simulation with a count of nominal associations equal to or higher than what we observed for the 161 height-associated SNPs (P = 1.7×10⁻⁴; 171 nominal associations for the GIANT height SNPs and their proxies; median number of nominal associations with height in the matched sets of SNPs = 28 (range = 8–172)). Therefore, we found strong overall evidence of replication in our large meta-analysis of 20,427 individuals of African ancestry for SNPs previously associated with adult height in individuals of European ancestry, indicating a substantial shared genetic basis for height in populations separated since the out-of-Africa event.
The replication procedure described above also allowed us to identify, for each of the 161 European height loci that we assessed using data from our African meta-analysis, the best candidate height index SNP (Table 3 and Table S3). For instance, at the LCORL locus on chromosome 4, the GIANT height SNP (rs6449353) and the SNP identified by fine-mapping in the African height meta-analysis (rs7663818) are both strongly associated with height in populations of European ancestry (P<1×10⁻²⁵) and are in strong LD (r²>0.8) with each other (Figure 3a). However, in African-derived populations, LD between the two SNPs is weaker (r²<0.6), and the association with height is stronger for rs7663818 (P = 2.9×10⁻⁷) than for rs6449353 (P = 0.0025) (Figure 3b). SNPs in strong LD (r²>0.8) with rs7663818 in the HapMap CEU and YRI populations define genomic intervals of 250 kb and 80 kb, respectively (light blue boxes in Figure 3). Finally, in lymphoblastoid cell lines derived from YRI individuals (Materials and Methods), rs7663818, but not rs6449353, is associated with LCORL gene expression levels (LCORL eQTL P = 0.0026 and P = 0.13 for rs7663818 and rs6449353, respectively). Thus, the LCORL locus is a clear example of the utility of fine-mapping association signals in other ethnic groups, both for narrowing the genomic interval of interest and for highlighting potential functional variants (cis-eQTLs).
For 40 loci, the index SNP from our fine-mapping list was nominally associated with height (P<0.05) in the African height meta-analysis, whereas the corresponding index European height SNP was not. To test whether this result reflects an enrichment of surrogates for functional variants identified by fine-mapping, we designed an experiment using allelic gene expression phenotypes in the HapMap YRI cell lines as functional readouts. We hypothesized that if our trans-ethnic fine-mapping strategy was successful, a larger fraction of variants in the list of fine-mapped height SNPs than in the list of European index height SNPs should be associated with phenotypes (in this case, gene expression). In other words, in cell lines derived from Africans, the list of SNPs from our fine-mapping experiment should contain more cis-eQTLs than the GIANT list of height SNPs. We retrieved allelic expression mapping datasets from the HapMap YRI cell lines (Materials and Methods) and observed that 4.7% of the GIANT index height SNPs and 8.6% of the best candidate height SNPs obtained by trans-ethnic fine-mapping were both nominally associated with height (P<0.05) in our meta-analysis and associated with allelic expression phenotypes (P<0.01). When we used simulations to assess the significance of these results, we found no simulated set with a cis-eQTL enrichment equal to or above that observed in the data (P<0.01, obtained from 100 simulations (Text S1)). Therefore, fine-mapping European height loci in African-ancestry individuals generated a list of markers more likely to control gene expression, potentially improving mechanistic insights into the biology of height. We also found that 17 missense SNPs are in strong LD (r²≥0.8 based on HapMap phase II YRI) with the fine-mapped height SNPs (Table S4), although this did not represent an enrichment when compared with the list of GIANT index height SNPs.
In conclusion, our study shows the benefit of performing large-scale genetic studies in non-European populations to discover new biology (we identified two novel height loci) and to gain functional insights at loci previously found in European-derived individuals (in this case, through enrichment of cis-eQTL signals). The strong replication of most of the European height loci in African-ancestry populations suggests that many of the published association signals with common variants from GWA studies for height (and perhaps for other complex diseases and traits) are relevant across different populations and are caused by shared genetic factors that predate the out-of-Africa event.
Table 1. For loci with more than one SNP with P<5×10⁻⁸, we list the SNP with the smallest combined P-value. Results for the 153 SNPs that were followed up by in silico replication are available in Table S2. We highlight in bold the two loci not previously implicated in the regulation of height in European-ancestry populations. Effect size (beta) and standard error (SE) are in Z-score units. The effect allele frequency is the average frequency across all African-American discovery studies. The heterogeneity P-value is based on Cochran's Q heterogeneity test. I² is a measure of heterogeneity and represents the proportion of inconsistency in individual studies that cannot be explained by chance. doi:10.1371/journal.pgen.1002298.t001
Figure 2. SNPs are plotted using LocusZoom [22] by position on the chromosome against association with adult height (−log10 P). The SNP name shown on the plot was the most significant SNP after the discovery meta-analysis. Estimated recombination rates (from HapMap) are plotted in cyan to reflect the local LD structure. The SNPs surrounding the most significant SNP are color-coded to reflect their LD with this SNP (taken from pairwise r² values from the ARIC African Americans Affymetrix 6.0 dataset for rs12393627 (A) and from the HapMap YRI data for rs4315565 (B)). The size of the points on the plots is proportional to the number of individuals with available genotype for any given SNP. Genes, the position of exons and the direction of transcription from the UCSC genome browser are noted. Hashmarks represent SNP positions available in the meta-analysis. doi:10.1371/journal.pgen.1002298.g002
Table 3. Fine-mapping results for SNPs associated with height in individuals of European descent [3]. In the left-hand side of the table, we present the association results in the African-American height meta-analysis for SNPs associated with height in European-ancestry individuals. In the right-hand side of the table, we present results from our fine-mapping experiment using data from our African-American height meta-analysis. Only SNPs with P<3×10⁻⁴ (Bonferroni correction threshold; α = 0.05/161 SNPs) are presented here; the complete list of 161 SNPs (19 of the 180 height SNPs from the GIANT Consortium were not available for fine-mapping) is in Table S3. For intergenic SNPs, we provide the closest gene and the physical distance between them. doi:10.1371/journal.pgen.1002298.t003

Ethics statement
All participants gave informed written consent. The project was approved by the local ethics committees and/or institutional review boards.

Studies
Five discovery studies/consortia (AABC, AAPC, CARe, Maywood, and Nigeria) and five replication studies (GeneSTAR, HANDLS, Health ABC, WHI, and MEC) contributed height association results to this project. There were eight population- […] (N = 1,139), WHI (N = 8,149) and MEC (N = 11,569)), two family-based cohorts (CFS (N = 386) and GeneSTAR (N = 1,148)), two case-control studies (Maywood (obesity, N = 743) and Nigeria (hypertension, N = 1,188)), and two cancer consortia comprising case-control studies that were population-based or nested within prospective cohorts (AABC (breast cancer, N = 5,380) and AAPC (prostate cancer, N = 5,526)).
All cohorts with genome-wide genotyping data available were genotyped on the Affymetrix 6.0 array, except AABC, AAPC, HANDLS, Health ABC, and GeneSTAR, which were genotyped on the Illumina 1M-duo or 1Mv1_c chip. The studies, including genotyping and quality control steps, are described in detail in Text S1. Summary statistics for height and age are given in Table S1. Genotype imputation was performed as previously described [20] and is summarized in Text S1.

Statistical analysis
Height measures were corrected for sex, age, disease status, and other appropriate covariates (e.g., recruitment centers) and were normalized into Z-scores (Text S1). Association analysis was performed using linear regression for studies of unrelated individuals and a linear mixed-effects model for family-based studies, testing an additive model and including the first 4–10 principal components as covariates. Results were combined using the inverse-variance meta-analysis method. Local ancestry was estimated using the HAPMIX software with default parameters [19]. Conditional analyses were performed by including SNP genotypes or local ancestry estimates as covariates in the linear models.
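The genomic-control scaling and inverse-variance combination used here (and described in Results) can be sketched as a minimal fixed-effect example. Assumptions, not stated in the text: λ is estimated as the median 1-df χ² statistic divided by its null median, standard errors are inflated by √λ, and no correction is applied when λ<1. The study-specific pipelines are described in Text S1 and may differ in detail; all function names are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (~0.4549), i.e. Phi^{-1}(0.75)^2.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def gc_lambda(z_scores):
    """Genomic-control inflation factor: median observed chi-square / null median."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def gc_correct_se(ses, lam):
    """Inflate standard errors by sqrt(lambda); deflation (lambda < 1) is skipped."""
    factor = math.sqrt(max(lam, 1.0))
    return [se * factor for se in ses]


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of one SNP across studies."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return beta, se, p


# Hypothetical per-study results for one SNP (Z-score units), not study data.
betas, ses = [0.10, 0.20], [0.05, 0.05]
beta, se, p = inverse_variance_meta(betas, ses)
# Second genomic-control step on the combined result, using the
# lambda_GC = 1.064 reported for the meta-analysis.
se_gc = gc_correct_se([se], 1.064)[0]
p_gc = math.erfc(abs(beta / se_gc) / math.sqrt(2))
print(f"beta={beta:.3f} se={se:.4f} P={p:.2e} P_GC={p_gc:.2e}")
```

Because every SNP's standard error is inflated by the same factor, genomic control only makes P-values larger. This is why applying it a second time to the combined results is described in Results as slightly conservative.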

Replication of European height loci in African Americans
The list of European height loci from the largest study to date [3] was used as the source of known European loci for fine-mapping. The procedure is summarized graphically in Figure S2. Of the 180 SNPs in this list, 19 were filtered out for lack of available LD data (we combined data from HapMap2 haplotype release 22 (Aug 2007), HapMap3 haplotype release 2 (Jul 2009), and HapMap2+3 LD data release 27 (Apr 2009); SNPs with conflicting data, as was the case for these 19 SNPs, were excluded). LD estimates (r²) from CEU HapMap 2+3 were used to generate the set of common SNPs (proxies) tagging the remaining putative loci (r²≥0.8). These sets were then binned using YRI HapMap 2+3 LD as follows: the whole list of proxies was randomized to remove any bias towards significance in the representative P-values; the first SNP was removed and set as an "index" SNP; then all SNPs not yet binned that were in LD (r²≥0.3) with the index SNP were assigned to its bin. This procedure was repeated until all SNPs were binned. The metric for replication of a European signal was the number of nominally significant SNP bins (P≤0.05), and the metric for replication of the entire list of known SNPs was the number of significant bins across all loci. Each SNP bin was represented by the index SNP used to generate it. Because the SNPs are in LD with known European signals, there is a strong prediction as to which index SNP allele should increase height: it should be the allele in LD with the height-increasing allele in Europeans. Therefore, all index SNP P-values were made one-tailed (set to P/2 or 1−P/2) based on the hypothesis that the height-increasing allele should be the one predicted by the European SNP, using the phased HapMap CEU data.
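The randomized LD binning and one-tailed conversion described above can be sketched as follows. This is a minimal sketch, not the authors' code. Assumptions: YRI r² values are supplied as a dictionary keyed by SNP pairs (a hypothetical input format), pairs missing from the dictionary are treated as r² = 0, and all helper names and toy values are illustrative.

```python
import random


def one_tailed_p(p_two_sided, direction_matches):
    """Two-sided P -> one-tailed P, given whether the observed effect has the
    direction predicted from the European height-increasing allele."""
    return p_two_sided / 2 if direction_matches else 1 - p_two_sided / 2


def pair_r2(r2, a, b):
    """Symmetric r^2 lookup in a dict keyed by frozenset pairs (missing -> 0)."""
    return r2.get(frozenset((a, b)), 0.0)


def bin_proxies(proxies, r2, threshold=0.3, rng=None):
    """Randomized greedy LD binning. Returns a list of (index_snp, members)."""
    if rng is None:
        rng = random.Random()
    remaining = list(proxies)
    rng.shuffle(remaining)  # random order: no bias toward low P-values
    bins = []
    while remaining:
        index = remaining.pop(0)  # first remaining SNP becomes the index
        members = [index] + [s for s in remaining if pair_r2(r2, index, s) >= threshold]
        remaining = [s for s in remaining if s not in members]
        bins.append((index, members))
    return bins


def count_significant_bins(bins, one_tailed, alpha=0.05):
    """Each bin is represented by its index SNP's one-tailed P-value."""
    return sum(1 for index, _ in bins if one_tailed[index] <= alpha)


# Toy example: four proxies of one European locus with made-up YRI r^2 values.
r2_yri = {frozenset(("a", "b")): 0.5, frozenset(("c", "d")): 0.9}
bins = bin_proxies(["a", "b", "c", "d"], r2_yri, rng=random.Random(1))
pvals = {"a": one_tailed_p(0.02, True), "b": one_tailed_p(0.02, True),
         "c": one_tailed_p(0.40, False), "d": one_tailed_p(0.40, False)}
print(bins, count_significant_bins(bins, pvals))
```

Choosing the index SNP at random means a bin counts as replicated only if an arbitrary member, rather than the best member, reaches P≤0.05. This keeps the replication count from being inflated by cherry-picking within bins.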
The LD thresholds used for proxy determination in European-ancestry data and for binning in African-ancestry data were arbitrary and likely do not fully capture the LD structure of the populations in this meta-analysis. To control for artifacts introduced by these thresholds and by the HapMap data, we generated 5,819 sets of 161 SNPs matched to the known European loci on HapMap 2+3 CEU minor allele frequency. Since the European SNP list contains independent loci, each simulated list was designed to contain relatively independent SNPs (CEU r² threshold of 0.2); changing this threshold did not alter the results. The same procedure of proxy generation and SNP binning (see Figure S2 for a graphical description of the binning strategy) was performed on each of the 5,819 sets to generate a null distribution of significant bins.
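The matched-null comparison reduces to an empirical P-value: the fraction of null sets whose count of significant results is at least the observed count. With 5,819 null sets and one set at or above the observed count, this gives 1/5,819 ≈ 1.7×10⁻⁴, the value reported in Results. The null vector below is a toy (not the study's data), constructed only to reproduce that one-in-5,819 situation. The function name is hypothetical.

```python
def empirical_p(observed, null_counts):
    """Fraction of matched null sets with at least as many significant results."""
    exceed = sum(1 for c in null_counts if c >= observed)
    return exceed / len(null_counts)


# Toy null distribution: one set at 172 (the reported maximum), the rest at 28
# (the reported median). Only the count of sets >= 171 matters here.
null_counts = [172] + [28] * 5818
print(f"{empirical_p(171, null_counts):.2e}")
```

A common alternative convention, (exceed + 1)/(n + 1), would give 2/5,820 ≈ 3.4×10⁻⁴. The reported P = 1.7×10⁻⁴ corresponds to the simple ratio.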
To generate the list of "best" SNPs for each locus (the fine-mapped list), the binning procedure was repeated for the known SNPs, except that each iteration selected as the index SNP the remaining SNP with the smallest P-value (i.e., the list was sorted on P-value rather than randomized). Note that the best SNPs at each locus are not perfectly concordant between Table 1 and Table 3 because our fine-mapping approach did not consider the in silico replication data and required that the SNPs be available in the HapMap phased haplotypes. We also note that our fine-mapping approach focuses on SNPs with low P-values and is thus more likely to identify markers with fewer missing genotypes, that is, markers for which we have more statistical power.
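The P-sorted variant used to build the fine-mapped list can be sketched as a greedy procedure that resembles the LD-clumping step found in common GWAS toolkits. Assumptions: the r² input format (dictionary keyed by SNP pairs), the default threshold of 0.3 carried over from the binning step, and the function name are illustrative, not taken from the authors' code.

```python
def fine_map(proxies, pvals, r2, threshold=0.3):
    """Greedy P-sorted binning: each bin's index is the remaining SNP with the
    smallest P-value; SNPs in LD (r^2 >= threshold) with it are removed.
    Returns the index ("best") SNPs in the order they were chosen."""
    remaining = sorted(proxies, key=lambda s: pvals[s])
    best = []
    while remaining:
        index = remaining.pop(0)
        best.append(index)
        remaining = [s for s in remaining
                     if r2.get(frozenset((index, s)), 0.0) < threshold]
    return best


# Toy example with made-up one-tailed P-values and YRI r^2 values.
r2_yri = {frozenset(("a", "b")): 0.5, frozenset(("c", "d")): 0.9}
pvals = {"a": 0.01, "b": 0.001, "c": 0.2, "d": 0.03}
print(fine_map(["a", "b", "c", "d"], pvals, r2_yri))
```

The only difference from the randomized binning is the ordering. Sorting on P-value deliberately favors the most strongly associated marker in each LD bin, which is what makes the resulting list a fine-mapping candidate list rather than an unbiased replication test.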

Analysis of cis-acting eQTLs
To assess whether European SNPs replicated for height (at nominal P<0.05) in African-ancestry populations would also be more likely to show links to functional variation in samples of African ancestry, we applied a sensitive technique for mapping cis-regulatory allelic expression SNPs [21] in lymphoblastoid cell lines (LCLs) derived from 56 unrelated Yoruba HapMap participants. A detailed description of the protocols and statistical methods used is available in Text S1.

Supporting Information
Figure S1 Manhattan plot of the height meta-analysis (3,310,998 SNPs in up to 20,809 participants from 9 studies). The dashed line highlights the genome-wide significance threshold used in this study (P<5×10⁻⁸). In the discovery phase of the project, SNPs at 4 loci reached genome-wide significance: LCORL on chromosome 4, PPARD on chromosome 6, SULF1 on chromosome 8, and ACAN on chromosome 15. The association between height and SNPs near SULF1 did not replicate. The 3 remaining loci (LCORL, PPARD, and ACAN) are loci previously associated with height in Europeans. Genomic-control P-values are displayed. (TIF)
Figure S2 On the left, an example analysis for the European SNP rs12470505 (CCDC108). Top: rs12470505 (square) and proxies (circles; r²≥0.8 in HapMap2+3 CEU), plotted with their P-values in the GIANT European analysis. Bottom: the same SNPs, plotted with African-American meta-analysis P-values converted to one-tailed P-values based on the predicted direction of effect from the European result and phased HapMap2 CEU data. Colors segregate SNPs into 6 randomly seeded "independent" clusters (r²≥0.3) using HapMap2+3 YRI linkage disequilibrium estimates. Right: simulation results for the fine-mapping analysis. Simulations were matched to the European SNP list by minor allele frequency; SNPs in each simulation were independent of each other at r²≥0.2 in HapMap2+3 CEU. The result for each simulation is significant bins/total bins. The red line indicates the observed proportion of significant bins for true European SNP replication (P = 8.6×10⁻⁶). (TIF)
Table S3 … We could not fine-map 19 of the 180 SNPs reported by the GIANT Consortium because they were not available in the HapMap phased datasets. In the left-hand side of the table, we present the association results in the African-American height meta-analysis for SNPs associated with height in Caucasians. In the right-hand side of the table, we present results from our fine-mapping experiment using data from our African-American height meta-analysis. For intergenic SNPs, we provide the closest gene and the physical distance between them. (DOC)
Text S1 Supplementary information. (DOC)