Genome-Wide Association Study Reveals Multiple Loci Influencing Normal Human Facial Morphology

Numerous lines of evidence point to a genetic basis for facial morphology in humans, yet little is known about how specific genetic variants relate to the phenotypic expression of many common facial features. We conducted genome-wide association meta-analyses of 20 quantitative facial measurements derived from the 3D surface images of 3118 healthy individuals of European ancestry belonging to two US cohorts. Analyses were performed on just under one million genotyped SNPs (Illumina OmniExpress+Exome v1.2 array) imputed to the 1000 Genomes reference panel (Phase 3). We observed genome-wide significant associations (p < 5 x 10−8) for cranial base width at 14q21.1 and 20q12, intercanthal width at 1p13.3 and Xq13.2, nasal width at 20p11.22, nasal ala length at 14q11.2, and upper facial depth at 11q22.1. Several genes in the associated regions are known to play roles in craniofacial development or in syndromes affecting the face: MAFB, PAX9, MIPOL1, ALX3, HDAC8, and PAX1. We also tested genotype-phenotype associations reported in two previous genome-wide studies and found evidence of replication for nasal ala length and SNPs in CACNA2D3 and PRDM16. These results provide further evidence that common variants in regions harboring genes of known craniofacial function contribute to normal variation in human facial features. Improved understanding of the genes associated with facial morphology in healthy individuals can provide insights into the pathways and mechanisms controlling normal and abnormal facial morphogenesis.


Introduction
Numerous lines of converging evidence indicate that variation in facial morphology has a strong genetic basis.These include the results of human heritability studies using twin and parent-offspring designs [1][2][3][4][5], Mendelian craniofacial syndromes [6], transgenic animal models with distinctive craniofacial phenotypes [7][8][9], and studies mapping QTLs for craniofacial shape in several mammalian models [10][11][12][13][14].However, we still have little understanding of how genetic variation relates to the diversity of normal facial traits commonly observed in humans.Understanding the genetic basis for normal facial variation has important implications for human health.Many genetic syndromes with dysmorphic facies are characterized by relatively subtle morphological changes, often involving quantitative traits with continuous distributions [6].The range of variation for any given facial trait often displays substantial overlap between affected and healthy individuals.Thus, understanding the genetic factors that contribute to normal facial trait variation may provide valuable insights into the causes of craniofacial dysmorphology, including common craniofacial birth defects such as orofacial clefts [15,16].
To date, only a few studies have explicitly tested for associations between aspects of normal human facial morphology and common genetic variants.Among these, two genome-wide association (GWA) studies have been carried out on healthy individuals of European ancestry using 3D facial imaging and a combination of traditional and more advanced morphometric methods to derive phenotypes [17,18].Between these two studies, a handful of intriguing genome-wide significant signals were reported, although they were largely non-overlapping.Notably, both studies reported an association between PAX3 variants and anatomical changes in interorbital region, an intriguing finding given that mutations in PAX3 cause Waardenburg Syndrome type 1 which is characterized by hypertelorism among other morphological abnormalities.Both studies also reported significant associations with measures of nasal projection in their discovery cohorts, although different genomic regions were implicated.In addition, several more focused candidate gene studies of loci implicated in craniofacial syndromes or in developmental pathways involved in craniofacial development have connected one or more craniofacial dimensions or aspects of shape with a small number of common genetic variants [19][20][21][22][23][24][25][26][27][28].At least three candidate gene studies [20,25,28] have reported modest associations between common variants in FGFR1 and normal variation in craniofacial morphology, but in each case a different constellation of traits was involved.It is notable that none of the genes from these studies, including FGFR1, were identified in the two previous GWA studies of facial morphology.
Thus, while prior studies have detected a handful of biologically plausible genes associated with variation in craniofacial features, it is clear that these efforts are just scratching the surface and the potential for additional discovery is great.In the current study, we performed GWA analyses on a set of 20 craniofacial measurements commonly used in clinical assessment (Fig 1) derived from 3D surface images in two well-characterized samples of unrelated White individuals of European ancestry from the USA: a sample comprised of 2447 individuals collected through the University of Pittsburgh (i.e., the Pittsburgh sample) and an independent sample of 671 individuals collected under the direction of the University of Colorado (i.e., the Denver sample).All participants were genotyped using the same SNP array (Illumina OmniExpress+Exome v1.2), which included just under one million SNPs, and were imputed to the 1000 Genomes reference panel (Phase 3).We conducted association tests in each sample separately and combined the results using a meta-analytic approach.

Results
In total, we observed seven associations in five traits that exceeded the conventional threshold for genome-wide significance (p < 5 x 10 −8 , Table 1; Figs 2-5).One of the associations also exceeded our study-wide significance threshold of p < 5 x 10 −9 , calculated based on 10 independent traits (see Methods for details).Due to the large number of traits, we will limit our presentation of results to genome-wide significant signals.The entire list of meta-analysis associations with p-values < 5 x 10 −7 is available in S1 Table .Manhattan plots showing the meta-analysis results, as well as the results for each sample, are available in supplemental figures S1-S20 Figs.
We observed two significant associations for cranial base width: one at 14q21.effects of rs2424399 on nasal width.Finally we observed a significant association with upper facial depth at 11q22.1 (top SNP rs12786942, p = 4.59 x 10 −8 , Fig 5).For all of the above associations, the results were driven primarily by the larger Pittsburgh dataset.The cranial base width (14q21.1),intercanthal width (1p13.3)and upper facial depth associations were at least nominally significant (p < 0.05) in both datasets.Sample-specific association results for all SNPs with p-values less than 5 x 10 −7 are listed in S2 In an attempt to replicate the main findings from the prior two GWA studies in Europeans, we tested previously implicated SNPs against traits from our Pittsburgh dataset that capture similar aspects of morphology.This was not possible for every prior genotype-phenotype association given differences in the measurements available.With that limitation in mind, the Pittsburgh dataset was chosen for comparison because it was the larger of our two datasets and the phenotyping protocol was most similar to prior GWA studies.For Paternoster et al. [17], we attempted to test three of their four genome-wide significant associations, two of which involved nasal ala length (Table 2).In our data, nasal ala length showed a nominally significant association (p = 0.018) with rs1982862, an intronic variant in CACNA2D3.Conversely, we found no evidence of association between this measure and rs11738462, an intronic variant in C5orf64.The previously observed association between PAX3 and the position of nasion relative to the orbits could not be tested directly.However, we found no evidence of association between the implicated SNP rs7559271 and intercanthal width, which captures aspects of interorbital septum morphology.As a further exploratory analysis we also looked at the association between rs7559271 and several vertical or projective measurements involving nasion, but no significant associations were found for any of these traits.For Liu et al. [18], we attempted to test each of their six previously reported genome-wide significant associations (Table 2).We observed a strong association (p = 1.70 x 10 −5 ) between nasal ala length and rs4648379, an intronic variant in PRDM16.We also observed an association between rs6555969, a SNP near C5orf50 and upper facial depth (p = 0.005), which is a reasonable approximation of the zygionnasion distance reported by Liu et al. [18].To test the association between interorbital distance and rs17447439, an intronic variant in TP63, we used measures of intercanthal and outercanthal width; however, we did not observe an association with either measure.Finally, Liu et al. [18] reported associations between SNPs in PAX3, C5orf50 and COL17A1 and the position of nasion relative to the orbits.We tested these three SNPs in our dataset against intercanthal width, a trait involving roughly similar anatomical components.Notably, we found associations between rs974448 (PAX3, p = 0.002) and rs6555969 (C5orf50, p = 0.049) and intercanthal width.The most significant of these, meeting the strict study-wide threshold for significance (i.e., p < 5 x 10 −9 ), was the association of cranial base width at 20q12 410kb downstream of MAFB, a transcription factor previously implicated in orofacial clefts [29] and facial characteristics in cleft families [30].However, the MAFB SNP associated with clefting was 250kb away and not in LD with the SNP observed here.In addition to orofacial clefting, mutations in MAFB cause multicentric carpotarsal osteolysis syndrome, which includes mild facial anomalies.These phenotypes are consistent with the developmental role of MAFB in regulating the migration of cranial neural crest cells during the patterning of skeletomuscular features of the head [31].Altogether, these lines of evidence suggest a possible role for MAFB in normal facial variation.
Another association for cranial base width was observed at 14q21.1 in the vicinity of PAX9, SLC25A2, MIPOL1, and FOXA1.The homeodomain protein-coding PAX9 is important for craniofacial development in mice [32,33] and dental development in humans [34].Using in situ hybridization, Peters et al. [32] showed that Pax9 is expressed throughout the developing cranial base in mice at E13.5.Biological evidence for other genes in this region also suggests possible roles in facial variation including MIPOL1, which has been observed to be affected by chromosomal aberrations in patients with craniofacial phenotypes, including holoprosencephaly [35].Because holoprosencephaly involves alteration in the horizontal spacing of facial structures, variants in genes associated with this condition may also influence measures of craniofacial width in healthy subjects.Taken together, these previous observations point to the plausibility of genetic variants in this region influencing normal facial variation.
Two genetic associations were observed for intercanthal width.One of these was at 1p13.3 within the gene GSTM2, which codes an enzyme involved in detoxification of compounds.Among the genes within 250kb of the peak signal are two potentially relevant candidate genes, GNAI3 and ALX3.Mutations in GNAI3, which encodes a G protein subunit involved in pharyngeal arch patterning, cause auriculocondylar syndrome, a rare craniofacial disorder [36,37], although hyper-or hypotelorism have not specifically been described.ALX3 is a homeobox gene essential for head and face development.Mutations in ALX3 result in frontonasal dysplasia 1 [38] in humans and nasal clefts in mice [39].Ocular hypertelorism is a prominent feature of frontonasal dysplasia and is believed to result from disruptions of the Hedgehog signaling pathway [40,41].The second association with intercanthal width was observed for a broad 900kb LD block on Xq13.2.The peak of the diffuse association signal is over PABP1-C1L2A, which encodes an uncharacterized poly-A binding protein.However, at the edge of the LD block, roughly 500kb centromeric to the peak signal, is HDAC8, which encodes a histone deacetylase involved in epigenetic gene silencing during craniofacial development [42].Mutations in HDAC8 cause Cornelia de Lange syndrome [43,44], a developmental disorder characterized by facial dysmorphology including hypertelorism.A mutation in HDAC8 has also been described in a family with an X-linked intellectual disability syndrome with distinctive facial features, which included hypertelorism [45].
A number of other genetic associations with facial traits were observed at loci harboring genes with relevant biological roles.For example, an association with nasal width was observed at 20p11.22 near the PAX1 gene.Mutations in PAX1 cause otofaciocervical syndrome [46], characterized by facial dysmorphology, including specific nasal features such as a sunken nasal root and excessive narrowing.PAX1 plays a role in chondrocyte differentiation [47], which may explain its association with nasal width, a measure of the distance between the left and right cartilaginous nasal alae.Nevertheless, a study of Pax1 expression in mice showed expression in the pharyngeal arches at E11.5, but not in the developing olfactory placodes [48], so it is unclear how this gene influences nasal development.An association with nasal ala length was observed at 14q11.2 in a region containing an RNase gene cluster plus at least 25 other genes (within about 400kb of the association peak).Among the many genes in region are ZNF219, which encodes a transcriptional partner of Sox9 essential for chondrogenesis in mice [49], and CHD8, mutations in which are associated with autism spectrum disorder in conjunction with macrocephaly and distinct facial features including a broad nose [50].A similar story pertains to the association between SNPs on 11q22.1 and upper facial depth.The peak signal occurs within TRPC6, which encodes a cation channel subunit mutated in hereditary renal disease [51].TRPC6 has no known connection to craniofacial development, but other genes in the region have reported craniofacial expression, including YAP1 [52].
In aggregate, we observed a number of genetic associations near genes with biologically plausible roles in facial variation.A common theme was that associated loci harbored genes involved in syndromes with craniofacial phenotypes.This result fits with a long-standing hypothesis about the relationship between Mendelian syndromes and complex traits in which common variants near genes causing Mendelian syndromes are involved in related common, complex diseases and traits, including normal phenotypic variation [53].That being said, for any of the observed associations, it is not clear which variant might be functional, though we hypothesize that the functional variants underlying the statistical signal will be regulatory.Moreover, it is not clear which genes they regulate.Thus, references to implicated genes should always be treated with appropriate caution.
While none of our genome-wide or suggestive (p < 5 x 10 −7 ) signals included SNPs implicated in either of the previous two European-focused GWA studies [17,18], we nevertheless found evidence of association when we tested the top SNPs from these studies against comparable phenotypes from our data.Strongest among these was nasal ala length, a lateral projective measure of the nose extending from alar cartilage to the nasal tip, previously associated with 1p36.32 (rs4648379, PRDM16) [18] or 3p14.3 (rs1982862, CACNA2D3 [17].We found at least nominal associations with both of these regions in our data, with one (rs4648379, PRDM16) showing evidence at p = 1.70 x 10 −5 .Both prior GWA studies reported an association between SNPs at 2q35 (PAX3) and morphology of the interorbital septum.We tested these SNPs and found an association between rs974448 and intercanthal width (p = 0.002), lending some additional support to the claim that common variants in PAX3 might influence aspects of normal facial morphology.
Our ability to test previously reported genetic associations was limited by a lack of directly comparable phenotypes, which is related to differences in data collection methods and the type and number of measurements available.In addition, the prior two European GWA studies each used imaging modalities different from the kind used here.Similar factors may also explain some of the discrepancies in association results observed between our two study cohorts.Although care was taken to generate the same set of distance measures in both cohorts, the different 3D cameras and landmarking protocols used could result in different patterns of association.Despite these differences, the measurements from both cohorts were found to be very similar in their overall distributions.Alternative phenotypes, such as multivariate measures of facial shape, can also be used in these types of studies.However, prior attempts to extract shape variation from landmark and surface data in similarly sized samples (e.g., using Procrustes-based methods) have not yielded statistically significant associations [17,18].One reason for this may be that the effect of any single gene is diluted because the resulting phenotypes represent such a complicated mix of local and global shape features.The problem is highly complex and there is presently little consensus on the most prudent approach to complex facial phenotyping [54].Fortunately, several promising approaches are on the horizon, such as the BRIM methods being developed by Claes et al. [28].It is likely that samples an order of magnitude larger than anything available at the moment will be required before we can begin to exploit the richness contained in human 3D facial datasets.
Despite these limitations, we have found evidence of genetic association between chromosomal regions containing genes with important roles in facial development and quantitative traits that characterize key features of the normal human craniofacial complex.In addition to improving our general knowledge of the factors that underlie the diversity of facial forms we see in humans, these genotype-phenotype associations may help us better understand the wide range of phenotypic expression and severity seen in some rare genetic syndromes.Common variants in a number of different genes or regulatory elements may contribute to the expression of dysmorphic phenotypes present in these conditions.Moreover, such associations in healthy individuals may aid the search for clues to the etiology of much more common craniofacial anomalies.For example, three of the traits reported here (cranial base width, nasal width and intercanthal width) have all been previously implicated as potential phenotypic risk factors for orofacial clefting, the most common craniofacial birth defect in humans [55].Weinberg et al. [15,56] have shown that the unaffected, but genetically at-risk, relatives of cleft-affected individuals exhibit increased breadth through the middle and upper face.The identification of the genes that influence these traits may help us identify important risk-modifiers for clefting [16].Testing the SNPs implicated here for associations in our cleft families is a future goal of our research group.

Study samples
Our study included two independent samples, each comprised of unrelated self-described White individuals of European ancestry from the United States.The Pittsburgh sample included 2447 unrelated individuals ranging in age from three to 49 years.The majority of these participants (n = 2272) were recruited at research centers in Pittsburgh, Seattle, Houston and Iowa City as part of the FaceBase Consortium's 3D Facial Norms Dataset, described in detail by Weinberg et al. [57].The remaining subjects were recruited as healthy controls for a separate study at Pittsburgh on orofacial cleft genetics.The Denver sample included 671 unrelated individuals ranging in age from three to 12 years.These participants were recruited from Denver and San Francisco as part of a separate FaceBase Consortium study of facial shape genetics in multiple ethnic populations [58].The basic demographic features of these samples are provided in S4 Table .In both samples, subjects were excluded if they had a personal history of facial trauma, a personal history of facial reconstructive or plastic surgery, a personal history of orthognathic/jaw surgery or jaw advancement, a personal history of any facial prosthetics or implants, a personal history of any palsy, stroke or neurologic condition affecting the face, a personal or family history of any facial anomaly or birth defect, and/or a personal or family history of any syndrome or congenital condition known to affect the head or face.

Phenotyping
3D facial surfaces were captured using digital stereophotogrammetry, a standard imaging method resulting in high-density, geometrically accurate point clouds representing the surface contours of the human body [59].Facial surfaces in the Pittsburgh sample were collected with 3dMD imaging systems (3dMD, Atlanta, GA).Facial surfaces in the Denver sample were imaged using the Creaform Gemini camera system (Quebec, Canada).A common set of 24 standard facial soft-tissue landmarks [60] was collected on each 3D facial surface and the xyz coordinate locations recorded (S22 Fig) .Landmarks were collected manually in the Pittsburgh sample as described in Weinberg et al. [57].An automated landmark collection method was used in the Denver sample.From these landmarks, we calculated 20 linear distances that correspond to craniofacial measurements commonly used in clinical assessment [61].These measurements are shown in

Genotyping, quality checks, imputation, and population structure
For each study sample, genotyping was performed by the Center for Inherited Disease Research (CIDR).Saliva samples were used to genotype 3,186 participants for 964,193 SNPs on the Illumina (San Diego, CA) OmniExpress+Exome v1.2 array plus 4,322 custom SNPs chosen in regions of interest based on previous studies of the genetics of facial variation.In addition, 70 duplicates and 72 HapMap control samples were genotyped for quality assurance purposes.Data cleaning was performed by the University of Washington Genetics Coordinating Center (UWGCC) using standard analysis pipelines implemented in the R Environment for Statistical Computing, as previously described [62].These analyses include interrogating samples for genetic sex, chromosomal anomalies, relatedness among participants, missing call rate, and batch effects, and interrogating SNPs for missing call rate, discordance between duplicate samples, Mendelian errors (as measured in HapMap control parent-offspring trios), Hardy-Weinberg equilibrium, and differences in allele frequency and heterozygosity between sexes (for autosomal and pseudo-autosomal SNPs).Supplemental S6 Table shows the number of SNPs omitted and retained for each quality filter.
Imputation was performed to capture information on unobserved SNPs as well as sporadically missing genotypes among genotyped SNPs, using haplotypes from the 1000 Genomes Project [63] Phase 3 reference panel of 2,504 samples from 26 worldwide populations.First, pre-phasing was performed in SHAPEIT2 [64], and then imputation of 34,985,077 variants was performed in IMPUTE2 [65,66].Masked variant analysis-that is, imputation of genotyped SNPs as though they were unobserved in order to assess imputation quality-showed high concordance between imputed and observed genotypes (0.998 for SNPs with MAF < 0.05 and 0.982 for SNPs with MAF 0.05) indicating high quality imputation.Imputed SNPs were included in analyses if the minimum genotype probability for a given variant was greater than 50%.
Principal component analysis using 96,700 autosomal SNPs pruned from the total panel based on call rate (> 95%), MAF (> 0.05), and LD (pairwise r 2 < 0.1 in a sliding window of 10 Mb), was used to assess population structure.Supplemental S23 Fig depicts the observed genetic structure of the population across the first two principal components of ancestry (i.e., eigenvectors from the PCA).Linear regression was used to test the association between each 1 (top SNP rs79272428, p = 1.01 x 10 −8 , Fig 2A) and the other at 20q12 (top SNP rs6129564, p = 1.65 x 10 −9 , Fig 2B).Notably, the chromosome 20 association exceeded our strict threshold for studywide statistical significance.For intercanthal width, we observed two significant associations: one at 1p13.3 (top SNP rs619686, p = 4.70 x 10 −8 , Fig 3A) and the other at Xq13.2 (top SNP rs11093404, 4.16 x 10 −8 , Fig 3B).There were also significant associations with nasal width at 20p11.22 (rs2424399, p = 2.62 x 10 −8 , Fig 4A) and nasal ala length at 14q11.2 (top SNP rs8007643, p = 3.36 x 10 −8 , Fig 4B).We observed a second independent peak on chromosome 20 for nasal width located 371kb upstream of the main peak.The second peak remained (top SNP rs80186620, p = 5.32x10 -6 , S21 Fig) after conditional association analysis adjusting for the

Fig 2 .
Fig 2. LocusZoom plots showing genome-wide significant associations observed in the metaanalysis for cranial base width (Fig 1A).(A) chromosome 14 and (B) chromosome 20.LocusZoom plots show the association (left y-axis; log10-transformed p-values) with facial traits.Genotyped SNPs are depicted by stars and imputed SNPs are depicted by circles.Shading of the points represent the linkage disequilibrium (r 2 , based on the 1000 Genomes Project Europeans) between each SNP and the top SNP, indicated by purple shading.The blue overlay shows the recombination rate (right y-axis).Positions of genes are shown below the plot.doi:10.1371/journal.pgen.1006149.g002

Fig 3 .
Fig 3. LocusZoom plots showing genome-wide significant associations observed in the metaanalysis for intercanthal width (Fig 1H).(A) chromosome 1 and (B) chromosome X.LocusZoom plots show the association (left y-axis; log10-transformed p-values) with facial traits.Genotyped SNPs are depicted by stars and imputed SNPs are depicted by circles.Shading of the points represent the linkage disequilibrium (r 2 , based on the 1000 Genomes Project Europeans) between each SNP and the top SNP, indicated by purple shading.The blue overlay shows the recombination rate (right y-axis).Positions of genes are shown below the plot.doi:10.1371/journal.pgen.1006149.g003

Fig 4 .
Fig 4. LocusZoom plots showing genome-wide significant associations observed in the metaanalysis for nasal width (Fig 1K) and nasal ala length (Fig 1N).(A) nasal width on chromosome 20 and (B) nasal ala length on chromosome 14.LocusZoom plots show the association (left y-axis; log10-transformed pvalues) with facial traits.Genotyped SNPs are depicted by stars and imputed SNPs are depicted by circles.Shading of the points represent the linkage disequilibrium (r 2 , based on the 1000 Genomes Project Europeans) between each SNP and the top SNP, indicated by purple shading.The blue overlay shows the recombination rate (right y-axis).Positions of genes are shown below the plot.doi:10.1371/journal.pgen.1006149.g004

Fig 5 .
Fig 5. LocusZoom plot showing genome-wide significant association observed in the meta-analysis for upper facial depth (Fig 1B) on chromosome 11.LocusZoom plots show the association (left y-axis; log10-transformed p-values) with facial traits.Genotyped SNPs are depicted by stars and imputed SNPs are depicted by circles.Shading of the points represent the linkage disequilibrium (r 2 , based on the 1000 Genomes Project Europeans) between each SNP and the top SNP, indicated by purple shading.The blue overlay shows the recombination rate (right y-axis).Positions of genes are shown below the plot.doi:10.1371/journal.pgen.1006149.g005 Institutional ethics (IRB) approval was obtained at each recruitment site and all subjects gave their written informed consent prior to participation(University of Pittsburgh Institutional Review Board #PRO09060553 and #RB0405013; UT Health Committee for the Protection of Human Subjects #HSC-DB-09-0508; Seattle Children's Institutional Review Board #12107; University of Iowa Human Subjects Office/Institutional Review Board #200912764 and #200710721; Colorado Multiple Institutional Review Board #09-0731; UCSF Human Research Protection Program Committee on Human Research #10-00565).
Fig 1 and listed in S5 Table.
For bilateral measurements, values were summed across the left and right sides in order to minimize the number of traits tested.Trait values were inspected for outliers by computing age-and sex-specific z-scores.
Table for the Pittsburgh sample and S3 Table for the Denver sample.

Table 1 .
Genome-wide significant meta-analysis results for five traits.

Table 2 .
Testing of previously identified genome-wide significant SNPs in European samples.
a orbit landmark measured from MRI at the approximate location of the pupil doi:10.1371/journal.pgen.1006149.t002