The human face is a complex assemblage of highly variable yet clearly heritable anatomic structures that together make each of us unique, distinguishable, and recognizable. Relatively little is known about the genetic underpinnings of normal human facial variation. To address this, we carried out a large genomewide association study and two independent replication studies of Bantu African children and adolescents from Mwanza, Tanzania, a region that is both genetically and environmentally relatively homogeneous. We tested for genetic association of facial shape and size phenotypes derived from 3D imaging and automated landmarking of standard facial morphometric points. SNPs within genes SCHIP1 and PDE8A were associated with measures of facial size in both the GWAS and replication cohorts and passed a stringent genomewide significance threshold adjusted for multiple testing of 34 correlated traits. For both SCHIP1 and PDE8A, we demonstrated clear expression in the developing mouse face by both whole-mount in situ hybridization and RNA-seq, supporting their involvement in facial morphogenesis. Ten additional loci demonstrated suggestive association with various measures of facial shape. Our findings, which differ from those in previous studies of European-derived whites, augment understanding of the genetic basis of normal facial development, and provide insights relevant to both human disease and forensics.
The human face is made up of distinct yet related anatomic structures that together make both individuals and families recognizable. It is clear there is a strong genetic component to the human face, and though the genetics of the face have been studied for several years, there are relatively few genes known to impact normal human facial development and facial shape. We report here a large-scale human genetic study in which we successfully identify and replicate genetic markers associated with normal facial variation using advanced 3D facial imaging in African children. We identified two significant replicated genes associated with measures of human facial size, SCHIP1 and PDE8A, demonstrated their clear expression in the developing face in the mouse, and identified 10 additional candidate genetic loci for human facial shape. Gene discovery for human facial development is an important first step for both diagnosing and treating craniofacial syndromes and for developing forensic modeling of the human face.
Citation: Cole JB, Manyama M, Kimwaga E, Mathayo J, Larson JR, Liberton DK, et al. (2016) Genomewide Association Study of African Children Identifies Association of SCHIP1 and PDE8A with Facial Size and Shape. PLoS Genet 12(8): e1006174. https://doi.org/10.1371/journal.pgen.1006174
Editor: Gregory S. Barsh, Stanford University School of Medicine, UNITED STATES
Received: April 5, 2016; Accepted: June 15, 2016; Published: August 25, 2016
Copyright: © 2016 Cole et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Phenotype data were deposited in the FaceBase data Hub (FaceBase: https://www.facebase.org/; FB00000667.01). Genotype data were deposited in the Database of Genotypes and Phenotypes (dbGaP: http://www.ncbi.nlm.nih.gov/gap; phs000622.v1.p1).
Funding: This work was funded by grants from the National Institutes of Health under the NIDCR FaceBase Initiative (http://www.nidcr.nih.gov/; NIDCR DE020054, to RAS), the Center for Inherited Disease Research (http://www.cidr.jhmi.edu/; HG006829, to RAS), the National Institute of Justice (http://www.nij.gov/Pages/welcome.aspx; 2013-DN-BX-K005, to RAS), and the National Science and Engineering Council Discovery Grant (http://www.nserc-crsng.gc.ca/index_eng.asp; DG#238992-12, to BH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The human face exhibits remarkable phenotypic variation, making it one of the most recognizable human characteristics. While facial variation is subject to environmental modifiers such as age and nutritional status, striking facial similarities within families suggest a strong genetic component, and heritability of some facial measurements is as high as 94% [2–6].
Nevertheless, candidate gene [1, 7–9] and genomewide association [10–12] studies of facial variation principally in adults have yielded few consistent results, possibly due to use of inconsistent phenotyping methods as well as confounding environmental differences such as nutritional status and age. A candidate gene study of various facial morphologic measures in a multiethnic cohort reported marginal association of SNPs in 20 genes, including SLC35D1, FGFR1, and LRP6. Genomewide association studies (GWAS) in European-derived white (EUR) adolescents [10, 11] and adults [11, 12] have reported associations of various midfacial phenotypes with SNPs in the PAX3, TP63, COL17A1, C5orf50, PRDM16, DCHS2, RUNX2, GLI3, PAX1, and EDAR regions.
Here, we describe a GWAS and two independent replication studies of facial size and shape measures in African Bantu children, a relatively homogeneous population in whom remarkably lean body mass and young age may minimize confounding environmental influences. We identified significant association of two loci with measures of facial size, and suggestive association of 10 additional loci with measures of facial shape. These results, from the first GWAS of facial morphology reported for an African population, differ from those reported for EUR populations and have potential implications for understanding facial development and facial birth defects, as well as for forensic applications in modeling the human face from DNA.
The GWAS cohort included 3,505 normal African Bantu children and adolescents ages 3–21 from the Mwanza region of Tanzania. Over 70% of subjects were aged 7 through 12 (median age = 10 years; interquartile range (IQR) = 8 to 13 years), and 45% were male and 55% female (S1 Fig). This study population was selected to minimize non-genetic confounders of facial morphology; namely, age and excess subcutaneous fat. Indeed, for both the GWAS and replication cohorts (see below), BMI is considerably lower than the WHO 2007 Reference; between the ages of 5–19 approximately 98% of our study population is below the WHO overweight classification (+1SD), 86% is below the WHO median, and 13% is below the WHO thinness classification (-2SD) (Fig 1). Furthermore, Mwanza is located approximately two degrees south of the Equator, and has a remarkably constant climate, minimizing environmental climatic confounders. Detailed analyses of potential population substructure in the GWAS cohort demonstrated the absence of apparent genetic subgroups by both school and tribe (S2 Fig and S3 Fig), with median fixation index (FST) values of 0.0005 (IQR = 0.0002 to 0.0010) and 0.0020 (IQR = 0.0007 to 0.0043) for schools and tribes, respectively. Thus, this region is both environmentally and genetically relatively homogeneous.
Histogram of Tanzanian Z-scores calculated from WHO 2007 reference data by age and sex with WHO cutoffs for thinness and overweight classifications.
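The WHO cutoffs used above can be applied to precomputed BMI-for-age Z-scores as in the following minimal Python sketch. Only the cutoffs (thinness below −2 SD, overweight above +1 SD, median at 0) come from the text; the function names and example Z-scores are hypothetical and are not study data.

```python
# Minimal sketch: WHO 2007 BMI-for-age categories from precomputed Z-scores.
# Function names and example values are hypothetical illustrations.

THINNESS_CUTOFF = -2.0   # WHO thinness: Z < -2 SD
OVERWEIGHT_CUTOFF = 1.0  # WHO overweight: Z > +1 SD


def classify_bmi_z(z: float) -> str:
    """Return the WHO category for a single BMI-for-age Z-score."""
    if z < THINNESS_CUTOFF:
        return "thinness"
    if z > OVERWEIGHT_CUTOFF:
        return "overweight"
    return "normal"


def summarize(z_scores: list[float]) -> dict[str, float]:
    """Fraction of subjects not overweight, below the WHO median, and thin."""
    n = len(z_scores)
    return {
        "not_overweight": sum(z <= OVERWEIGHT_CUTOFF for z in z_scores) / n,
        "below_median": sum(z < 0.0 for z in z_scores) / n,
        "below_thinness": sum(z < THINNESS_CUTOFF for z in z_scores) / n,
    }


example = [-2.5, -1.2, -0.4, 0.3, 1.4]  # hypothetical Z-scores
print([classify_bmi_z(z) for z in example])
print(summarize(example))
```

Applied to a full cohort, the three fractions correspond to the "below overweight", "below median", and "below thinness" percentages reported above.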
The two independent replication cohorts (termed MC and GM) consisted of 1,140 and 1,250 African Bantu children, respectively, from the same population as the GWAS. Both replication cohorts had distributions of age and sex similar to the GWAS screening population: 66% of the MC replication cohort were aged 7 through 12 (median age = 11 years; IQR = 9 to 13 years), and 45% were male and 55% were female (S1 Fig). Similarly, 77% of the GM replication cohort were aged 7 through 12 (median age = 11 years; IQR = 9 to 12 years), and 41% were male and 59% were female (S1 Fig). Our GWAS and two replication cohorts have no known or apparent differences in age range, sex representation, tribal composition, or representation of locales. The GWAS cohort and MC replication cohort were both imaged using the MC 3D camera system. After the MC camera failed in the field, the GM replication cohort was imaged using the very similar GM 3D camera system. Although the MC and GM camera systems yielded very similar data, to minimize potential bias the MC and GM replication cohorts were analyzed separately, and the test statistics were combined by meta-analysis.
Quantitative phenotypes were derived from the MC and GM 3D facial scans based on 29 standard facial morphometric landmarks (Fig 2). These 34 quantitative phenotypes were of three general types (Table 1): three global measures of facial size, 25 inter-landmark linear distances, and six summary variables from principal components analysis (PCA), namely five from a PCA of the whole face (together explaining approximately 70% of total facial variation; S4 Fig) and one from a PCA of the most highly correlated midfacial landmarks (explaining approximately 40% of total midface variation; S5 Fig). We then used a specialized statistical approach for quantitative association testing in our GWAS cohort to account for occult relatedness in the sample (see Materials and Methods). Following GWAS and replication study analyses, we performed a meta-analysis to combine statistics of the GWAS and two replication datasets.
3D facial mesh obtained for each study individual with study landmarks obtained from automatic landmarking overlaid and labeled.
Two loci were associated with distinct measures of global facial size, both associations surpassing a genomewide significance criterion for an African population (P < 2.50 x 10−8), showing independent replication (P < 0.05) with no significant inter-study heterogeneity (I2 < 50%), and surpassing a stringent genomewide meta-analysis significance threshold corrected for the number of effectively independent phenotypes tested (P < 2.50 x 10−8 / 9 = 2.78 x 10−9). The first of these associations was of SNPs in the SCHIP1 region of chromosome 3q25.33 (chr3:159,774,689–159,960,389; Fig 3A and S6 Fig) with centroid size, a measure of overall facial size that is uncorrelated with variables of shape (Fig 3B). Twenty SNPs in the SCHIP1 region were associated with centroid size in the GWAS. The lead SNP was rs79909949 (GWAS P = 9.56 x 10−9; replication P = 1.80 x 10−3; meta-analysis P = 6.58 x 10−11), located within SCHIP1, 500 kb downstream of the 5’ transcriptional start for SCHIP1 transcripts 1, 2, and 3 and 65 kb upstream of the unique 5’ transcriptional start for SCHIP1 transcript 4, which contains an alternative first exon. SNP rs79909949 lies within a striking ENCODE-predicted transcriptional regulatory element (chr3:159,491,500–159,495,250) that, as predicted by HaploReg v4.1 (http://www.broadinstitute.org/mammals/haploreg/haploreg.php), exhibits features of an enhancer in many different cell types, including an open hypomethylated chromatin configuration, multiple DNase I hypersensitivity sites, and numerous RNA polymerase II and transcription factor binding sites. HaploReg v4.1 predicts that SNP rs79909949 alters binding motifs of at least three different transcription factors.
(A) Regional association plot of centroid size at the SCHIP1 locus. Association data are shown using GWAS P-values with the meta-analysis P-value for the lead SNP, rs79909949. The LD pattern is based on the 1000 Genomes Project 2012 African reference and GRCh37/hg19. The estimated recombination rate (cM/Mb) is from HapMap samples. (B) Relative facial size at the upper and lower 95% confidence intervals for centroid size after adjusting for sex and age.
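The meta-analysis threshold follows from dividing the African genomewide criterion by the number of effectively independent phenotypes. The Python sketch below counts eigenvalues greater than 1, the rule described in Materials and Methods; the eigenvalue list is invented for illustration (chosen to sum to 34, as for 34 standardized traits) and is not the study's actual spectrum.

```python
# Minimal sketch of the multiple-testing correction; eigenvalues are hypothetical.
AFRICAN_GWAS_THRESHOLD = 2.5e-8


def n_effective_phenotypes(eigenvalues: list[float]) -> int:
    """Count principal components whose eigenvalue exceeds 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def meta_threshold(eigenvalues: list[float], base: float = AFRICAN_GWAS_THRESHOLD) -> float:
    """Bonferroni-correct the genomewide threshold by the effective number of traits."""
    return base / n_effective_phenotypes(eigenvalues)


# Hypothetical eigenvalues for 34 standardized traits (they sum to 34).
eigs = [9.0, 5.0, 3.5, 2.8, 2.1, 1.7, 1.4, 1.2, 1.05] + [0.25] * 25

threshold = meta_threshold(eigs)
print(n_effective_phenotypes(eigs), f"{threshold:.2e}")  # 9 2.78e-09

# Meta-analysis P-values reported above for the two lead SNPs
for snp, p in [("rs79909949", 6.58e-11), ("rs12909111", 2.52e-9)]:
    print(snp, p < threshold)
```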
Furthermore, 35 additional SNPs in a largely overlapping region within SCHIP1 (chr3:159,908,007–160,103,634; Fig 4A and S6 Fig) were associated with PC4, representing both facial height and nasal width (Fig 4B), with P-values as low as 7.92 x 10−7 (rs368386044). Conditional analysis showed that the PC4 association was independent of the centroid size association; conditioning the lead PC4 SNP rs368386044 on the lead centroid size SNP rs79909949 did not affect significance (after conditioning P = 7.95 x 10−7). Thus, rs79909949 is associated with overall facial size, while rs368386044 is independently associated with facial height and nasal width.
(A) Regional association plot of PC4 at the SCHIP1 locus. Association data are shown using GWAS P-values. The most associated SNP rs368386044 could not be displayed in the LocusZoom plot, but is in complete linkage disequilibrium with rs9868698. See Fig 3 legend for details. (B) Morphs showing the range of shape variation along PC4. The heatmap depicts the regions of the face that vary the most between the min and max morphs. Red shows the regions that project most beyond the mean mesh at the positive extreme while yellow is intermediate in that direction. Blue shows the areas that project most inwards from the mean mesh while light blue shows a lesser degree of inwards projection. Green shows those regions that align most closely to the mean mesh.
The second association was of SNPs in the PDE8A region of chromosome 15q25.3 (chr15:84,923,649–85,161,983; Fig 5A and S6 Fig) with the allometry variable, which represents a complex scaling relationship between size and shape (Fig 5B). One hundred thirty-six SNPs in the PDE8A region were associated with the allometry variable in the GWAS. The lead SNP, rs12909111 (GWAS P = 2.36 x 10−7; replication P = 1.53 x 10−3; meta-analysis P = 2.52 x 10−9), is located within intron 1 of PDE8A transcripts 2, 3, and 4 and intron 2 of PDE8A transcript 1. We also replicated association of a second SNP within PDE8A, rs12908400 (GWAS P = 1.92 x 10−7; replication P = 1.03 x 10−3; meta-analysis P = 2.36 x 10−8), in strong linkage disequilibrium (D' = .91, r2 = .81) with rs12909111. SNP rs12908400 is located 31 kb upstream of rs12909111, within a broad ENCODE predicted transcriptional element observed in endothelial cells, and HaploReg v4.1 predicts that rs12908400 overlaps an enhancer active in many different tissue types, alters 10 transcription factor binding motifs, and overlaps 4 apparent eQTL tissue associations.
(A) Regional association plot of the allometry variable at the PDE8A locus. Association data are shown using GWAS P-values with the meta-analysis P-values for the two lead SNPs, rs12909111 and rs12908400. See Fig 3 legend for details. (B) Morphs showing the range of allometric variation in facial shape. See Fig 4 legend for heatmap details.
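The D' and r2 values quoted for rs12909111 and rs12908400 are standard pairwise LD measures. The following is a minimal Python sketch for two biallelic SNPs, assuming phased haplotype frequencies are already known; the frequencies in the example are invented and do not correspond to these SNPs.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (|D'|, r^2) given the frequency of haplotype AB and the allele
    frequencies p_a (allele A at SNP 1) and p_b (allele B at SNP 2)."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    # D' normalizes D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between alleles at the two SNPs
    # (undefined for a monomorphic SNP, where the denominator is zero)
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: P(AB) = 0.18, P(A) = 0.2, P(B) = 0.2
print(ld_stats(0.18, 0.2, 0.2))  # approximately (0.875, 0.766)
```

D' can be near 1 while r2 is well below 1 when allele frequencies differ, which is why both values are usually reported.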
Furthermore, 49 of the PDE8A region SNPs associated with the allometry variable also showed marginal association with other facial phenotypes. Forty-one of these SNPs were associated with inner canthal distance EN_R_EN_L, and eight were associated with PC2, representing overall facial shape and upper facial width. Conditional analysis showed that the EN_R_EN_L association was completely dependent on the allometry association; conditioning on the lead allometry SNP rs12909111 abolished significance for the lead EN_R_EN_L SNP rs57482637 (before conditioning P = 2.60 x 10−5; after conditioning P = .059). However, conditional analysis showed that the PC2 association was partially independent of allometry; conditioning on the lead allometry SNP rs12909111 did not abolish significance for the lead PC2 SNP rs141832194 (before conditioning P = 8.75 x 10−6; after conditioning P = 3.42 x 10−4). Thus, rs12909111 is associated with the allometry facial size and shape variable, while rs141832194 may be independently associated with overall facial shape and upper facial width.
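Conditional analysis asks whether a secondary SNP's association survives once the lead SNP enters the model as a covariate. The self-contained Python sketch below illustrates the idea on simulated data using ordinary least squares with a normal approximation to the test statistic; it is not the EMMAX mixed model used in the GWAS, and all data, names, and effect sizes are hypothetical. In the simulation only the lead SNP affects the phenotype, so the secondary SNP's signal should largely disappear after conditioning.

```python
import math
import random


def _invert(a: list[list[float]]) -> list[list[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_pvalue(y, covariates, target):
    """Two-sided P-value for `target` in y ~ intercept + covariates + target."""
    cols = [[1.0] * len(y)] + list(covariates) + [target]
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(col, y)) for col in cols]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(beta[j] * cols[j][i] for j in range(k)) for i, yi in enumerate(y)]
    sigma2 = sum(r * r for r in resid) / (len(y) - k)
    z = beta[-1] / math.sqrt(sigma2 * inv[-1][-1])
    return math.erfc(abs(z) / math.sqrt(2))  # normal approximation, adequate for large n


rng = random.Random(2016)
n = 2000
lead = [float(rng.choice((0, 1, 1, 2))) for _ in range(n)]  # additive allele counts
# Secondary SNP copies the lead genotype 90% of the time (strong LD), otherwise random
secondary = [g if rng.random() < 0.9 else float(rng.choice((0, 1, 1, 2))) for g in lead]
pheno = [0.3 * g + rng.gauss(0.0, 1.0) for g in lead]  # only the lead SNP has an effect

p_before = ols_pvalue(pheno, [], secondary)
p_after = ols_pvalue(pheno, [lead], secondary)
print(f"before conditioning P = {p_before:.2e}; after conditioning P = {p_after:.2e}")
```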
In addition to these two confirmed associations, 11 additional SNPs in ten loci showed suggestive association, defined by nominal GWAS association (P < 1.0 x 10−5), nominal replication (P < 0.05), marginal meta-analysis significance (P < 1.0 x 10−6), and no significant inter-study heterogeneity (I2 < 50%) (Table 2). Two different SNPs in complete linkage disequilibrium at the TFAP2B locus at 6p12.3 were associated with two different highly correlated eye measurement phenotypes (r2 = .75): EX_R_EX_L, representing outer canthal width, and EN_EX, representing average palpebral fissure length. SNP rs2817419, within the TFAP2B 3' untranslated region (UTR), was associated with both outer canthal width (meta-analysis P = 7.28 x 10−8) and average palpebral fissure length (GWAS P = 5.69 x 10−6). SNP rs35965172, located 2 kb downstream of the TFAP2B 3' UTR, was associated with average palpebral fissure length (meta-analysis P = 4.82 x 10−7) and outer canthal width (GWAS P = 9.41 x 10−6). Conditional analysis demonstrated that the associations of these two SNPs with outer canthal width and average palpebral fissure length are completely dependent on each other; conditioning on the lead SNP rs2817419 abolished significance for SNP rs35965172 with average palpebral fissure length (before conditioning GWAS P = 3.57 x 10−6; after conditioning P = .1574). TFAP2B encodes AP-2β, a key transcription factor for craniofacial development, and heterozygous missense mutations in TFAP2B cause Char syndrome, which includes dysmorphic facial features such as hypertelorism, downward slanting palpebral fissures, a flattened and broad nose, a short philtrum, and a triangular mouth with prominent upper lips [16, 17]. TFAP2B is thus a strong biological candidate for involvement in facial shape variation, particularly in the region around the eyes.
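The four criteria defining suggestive association can be expressed as a single filter. The following is a minimal Python sketch; the `SnpResult` record and the example values are hypothetical, and only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    """Hypothetical per-SNP summary of GWAS, replication, and meta-analysis results."""
    snp: str
    p_gwas: float
    p_replication: float
    p_meta: float
    i2_percent: float  # inter-study heterogeneity I^2, in percent


def is_suggestive(r: SnpResult) -> bool:
    """Apply the four suggestive-association criteria stated in the text."""
    return (
        r.p_gwas < 1.0e-5           # nominal GWAS association
        and r.p_replication < 0.05  # nominal replication
        and r.p_meta < 1.0e-6       # marginal meta-analysis significance
        and r.i2_percent < 50.0     # no significant heterogeneity
    )


candidates = [
    SnpResult("snp_a", 3.0e-6, 0.01, 5.0e-7, 20.0),  # meets all criteria
    SnpResult("snp_b", 3.0e-6, 0.01, 5.0e-7, 65.0),  # fails heterogeneity
    SnpResult("snp_c", 2.0e-5, 0.01, 5.0e-7, 10.0),  # fails GWAS criterion
]
print([r.snp for r in candidates if is_suggestive(r)])  # ['snp_a']
```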
SCHIP1 and PDE8A have not previously been implicated in facial morphogenesis. To assess their roles, we assayed expression of Schip1 and Pde8a during mouse development. Quantitation of gene expression in microdissected murine facial tissues at E9.5–E12.5, a critical period of mouse facial development, by both expression microarray and RNAseq analyses showed that both Schip1 and Pde8a are differentially expressed in the developing mouse face. As shown in Fig 6, in whole-mount embryos Schip1 is expressed in multiple tissues, including the developing face, specifically the nasal processes, maxillary process, and mandibular process. Expression appears maximal at E10.5 and E11.5 and then declines (Fig 6A–6D). In addition, RNAseq analysis of Schip1 demonstrated high expression in the ectoderm of the nasal, maxillary, and mandibular prominences at E11.5 and E12.5, with somewhat less expression in the mesenchyme (S7 Fig). Similarly, in whole-mount embryos Pde8a was expressed principally in the face, with maximal expression at E9.5 (Fig 6E–6H). RNAseq analysis of Pde8a demonstrated that expression occurs primarily in the mesenchyme of all facial prominences, with little, if any, ectodermal expression during these same critical periods of development (S7 Fig).
Whole-mount in situ hybridization of (A-D) Schip1 and (E-H) Pde8a expression in mouse embryos from E9.5 to E12.5. ba1, first branchial arch (future mandible); ba2, second branchial arch; fb, forebrain; fn, frontonasal process; fl, forelimb; hb, hindbrain; hl, hindlimb; ln, lateronasal process; mb, midbrain; md, mandible; mn, medionasal process; mx, maxilla; ov, otic vesicle.
The genetic underpinnings of facial shape are a topic of considerable scientific, forensic, and popular interest, and familial aspects of a newborn’s face are one of the first characteristics noted by grandparents around the world. The strong facial resemblance among close relatives, and the high heritabilities of many facial measurements [2–6], underscore a major genetic component underlying facial shape. It is clear that aging and environmental factors such as nutrition also play large roles in facial shape. Nevertheless, the intriguing ability of so-called “super-recognizers” to identify individual faces regardless of environmental variables suggests that each human face has its own intrinsic shape, remarkably constant and almost certainly genetically determined. The ability of Polistes fuscatus paper wasps to similarly use facial recognition and learning to distinguish between friend and foe indicates that this remarkable phenomenon extends beyond the human species.
Our findings demonstrate that both SCHIP1 and PDE8A are associated with measures of human facial size, and both are expressed in the developing face in the mouse. SCHIP1 was first identified as a major protein that interacts with schwannomin, the neurofibromatosis type 2 tumor suppressor protein. Initial characterization of SCHIP1 cDNA and mRNA indicated that it is a coiled-coil protein expressed principally in the brain, skeletal muscles, and heart and to a lesser extent in the pancreas, kidney, liver, lung, and placenta; however, bone, neural crest derivatives, and other tissues relevant to the face were not analyzed. Subsequent studies showed that Schip1 knockout mice exhibit developmental anomalies of the neural crest-derived skeleton, the palatal processes, and the snout, indicating that it also plays a role in facial development. SCHIP1 functions as an early response gene in the PDGF signaling cascade, promoting cell migration in response to PDGF signaling through actin cytoskeleton rearrangements, though its specific function in the developing face remains unknown.
PDE8A was first identified as a novel cyclic nucleotide phosphodiesterase via a bioinformatic screen based on PDE family homology. This study demonstrated predominant expression in the testis and ovary, with minimal expression in several other tissues, but again did not assess expression in bone or other relevant craniofacial tissues. PDE8A hydrolyzes cAMP, regulates Ca²⁺ movement in cardiomyocytes, and stimulates testosterone production in Leydig cells. The function of PDE8A in the developing face is not known.
This is one of the first studies to demonstrate replicated genomewide associations of human facial morphometric phenotypes, and the first reported in an African population. We did not replicate previously reported facial shape associations from studies of EUR populations [10–12]. It is possible that facial morphology in different human populations has different genetic underpinnings. Alternatively, as noted above, our study cohort was young and almost universally lean, and therefore may be less influenced by environmental factors than study cohorts of adults from EUR populations. Aside from an interesting overlap with TFAP2B, mutations in which cause Char syndrome, the majority of our newly identified genetic loci have not been previously implicated in human facial development, facial dysmorphic syndromes, or animal mutant models. Our findings provide a basis for detailed analyses of the functions of these genes in the developing face, and of their roles in determining the normal facial variation that makes us both individually different and individually recognizable.
Materials and Methods
Sample and data collection for both the GWAS screening cohort and two replication cohorts of Bantu African children was undertaken under the NIDCR FaceBase1 initiative over a three-year period in the Mwanza region of Tanzania. The GWAS screening cohort included 3,631 subjects, all imaged using the Creaform MegaCapturor (MC) camera three-dimensional (3D) photogrammetry imaging system. The first replication cohort included 1,173 subjects also imaged using the MC 3D system, and the second replication cohort included 1,506 subjects imaged using the Creaform Gemini (GM) 3D imaging system. All subjects were apparently unrelated and aged 3–21. Individuals with a known birth defect or a relative with a known orofacial cleft were excluded. Additional data collected from each subject included age, sex, height, weight, school, and detailed parental and grandparental ethnicity and tribe information. Tanzanian BMI Z-scores were calculated from the 2007 World Health Organization (WHO) growth reference distribution for 5–19 year olds for all individuals with non-missing height and weight. Written informed consent was obtained from all study subjects or their parents, as appropriate. This study was carried out with overall approval and oversight of the Colorado Multiple Institutional Review Board (protocol #09–0731), was additionally approved by the institutional review boards of the University of Calgary, Florida State University, the University of California San Francisco, and the Catholic University of Health and Allied Sciences (Mwanza, Tanzania), and was carried out with the approval of the National Institute for Medical Research (Tanzania).
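BMI-for-age Z-scores in the WHO 2007 growth reference are defined through age- and sex-specific LMS parameters (L, Box-Cox power; M, median; S, coefficient of variation). The following is a minimal Python sketch of the core LMS formula, assuming the appropriate L, M, and S values have already been looked up from the WHO tables. The parameter values below are hypothetical placeholders, not WHO values, and the sketch omits the modified calculation WHO specifies for Z-scores beyond ±3.

```python
import math


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def lms_z(y: float, l_power: float, m_median: float, s_cv: float) -> float:
    """LMS Z-score: ((y/M)^L - 1) / (L*S), or ln(y/M)/S when L == 0."""
    if l_power == 0:
        return math.log(y / m_median) / s_cv
    return ((y / m_median) ** l_power - 1.0) / (l_power * s_cv)


# Hypothetical LMS parameters for one age/sex stratum (not WHO values)
L_PARAM, M_PARAM, S_PARAM = -1.0, 16.0, 0.1
for value in (12.8, 16.0, 20.0):
    print(value, round(lms_z(value, L_PARAM, M_PARAM, S_PARAM), 3))  # Z of about -2.5, 0 and 2.0
```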
Derivation of Phenotypic Variables
Each subject was imaged twice at six angles. Individual images were assembled at the highest possible resolution into a single 3D mesh composite of the face using InSpeck FAPS and EM 6.0 software. Twenty-nine 3D landmarks were placed on the meshes using a novel automated landmarking method. A manuscript describing the automated landmarking method has been submitted for publication; details of the method and the automated landmarking algorithm are available at https://www.facebase.org/facial_landmarking/. Landmarks were then subjected to Procrustes superimposition for morphometric analysis [28–30]. Images from 163 subjects in the GWAS cohort could not be landmarked automatically and were instead landmarked manually, mostly because of imaging artifacts in non-critical regions of the face that do not interfere with manual landmark placement. Superficial artifacts (smiling, squinting, open mouth, etc.) in the landmark data were corrected using a multiple linear regression in which all factors and their interactions were considered. The resulting residuals were mean-centered and used for downstream phenotype derivation.
A 3D "skew" artifact (coordinated asymmetric displacement of landmarks due to image assembly) was identified by principal components analysis (PCA) of the landmark coordinates and was removed by regressing out the corresponding PC scores from the landmark data. Unlike manual landmarking, automated landmarking produces a non-normal distribution of measurement error; therefore, outliers were detected using the combination of Procrustes distance from the mean and the within-landmark variance of distances from each landmark mean. The cleaned landmark data were then used to calculate linear distances and multivariate measures to be used as phenotypes. Linear distance phenotypes were calculated as the distance between their defining landmarks multiplied by the centroid size of each individual's landmark configuration, which restores the original scale removed by Procrustes superimposition. Centroid size, by definition, is the square root of the sum of squared distances of all landmarks from their geometric center (centroid). Allometry is shape variation related to size [31,32], and was calculated using regression scores corresponding to size independent of age. To calculate multivariate measures, we regressed out age and size variation in symmetrized landmark data and used the first five PCs of a PCA as phenotypes. S1 File shows 3D faces that correspond to the extremes of variation in allometry and the first five PCs. To calculate the MidfaceModPC1 phenotype, we used the RV method to identify the set of spatially contiguous landmarks that maximized the ratio of covariation among themselves to covariation with landmarks outside of that set. The resulting midface set was then subjected to PCA, and the first PC represented MidfaceModPC1. For all variables, measurements more than four standard deviations from the mean were excluded from analyses. All morphometric analyses were performed in MorphoJ or in R using the Geomorph and Morpho packages. Head circumference was measured directly using a tape measure on a subset of 2,676 subjects.
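The centroid-size and distance calculations above can be sketched in Python using the standard geometric-morphometric definition of centroid size (the square root of the summed squared distances of landmarks from their centroid). This minimal illustration uses hypothetical function names: it performs only the translation and scaling steps of Procrustes superimposition (rotation is omitted) and implements the four-standard-deviation exclusion. The actual analyses used MorphoJ and the R packages Geomorph and Morpho.

```python
import math
from statistics import mean, stdev


def centroid(config):
    """Geometric center of a list of (x, y, z) landmarks."""
    n = len(config)
    return tuple(sum(p[i] for p in config) / n for i in range(3))


def centroid_size(config):
    """Square root of the summed squared landmark distances from the centroid."""
    c = centroid(config)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in config))


def center_and_scale(config):
    """Translate to the origin and scale to unit centroid size (no rotation)."""
    c = centroid(config)
    cs = centroid_size(config)
    scaled = [tuple((p[i] - c[i]) / cs for i in range(3)) for p in config]
    return scaled, cs


def linear_distance(scaled, cs, i, j):
    """Inter-landmark distance restored to the original scale."""
    return math.dist(scaled[i], scaled[j]) * cs


def exclude_extremes(values, k=4.0):
    """Replace values more than k standard deviations from the mean with None."""
    mu, sd = mean(values), stdev(values)
    return [v if abs(v - mu) <= k * sd else None for v in values]


square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
scaled, cs = center_and_scale(square)
print(round(cs, 4), round(centroid_size(scaled), 4), round(linear_distance(scaled, cs, 0, 1), 4))
# 2.8284 1.0 2.0
```

Keeping excluded measurements as `None` preserves alignment between phenotype vectors and subject identifiers.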
Phenotype data were deposited in the FaceBase data Hub (FaceBase: https://www.facebase.org/; FB00000667.01).
GWAS Genotyping and QC
Subject saliva specimens were obtained using Oragene DNA self-collection kits (DNA Genotek), and genomic DNA was prepared from saliva specimens per the manufacturer's instructions or using the Maxwell™ robotic platform and Maxwell™ 16 Blood DNA Purification Kit (Promega). For genome-wide genotyping, DNA concentrations were assayed by fluorescence with the Qubit dsDNA BR Assay Kit (Life Technologies). Genome-wide genotyping was performed at the Center for Inherited Disease Research (CIDR) using the Illumina HumanOmni2.5Exome-8v1_A array, interrogating 2,567,845 variants. Genotypes were called using GenomeStudio ver. 2011.1, genotyping module 1.9.4 and GenTrain version 1.0. Quality control filtering of genome-wide genotype data was performed by the University of Washington Genetics Coordinating Center (UWGCC), as described elsewhere. A total of 3,631 samples were genotyped successfully. Subjects were excluded on the basis of SNP call rates < 97% (n = 0), discordance between reported and genotyped sex (n = 0), XXY karyotypes (n = 2), non-Bantu heritage (n = 33), missing covariates (n = 2), and/or inadvertent subject duplication (n = 74). The final GWAS included 3,505 individuals after phenotype QC. Quality control analysis discovered considerable cryptic relatedness among our study population at the full or half-sibling level. This included 563 families of at least two members each, with all pairs of subjects connected by a kinship coefficient (KC) greater than the lower limit of the 95% prediction interval for half-siblings (coefficient > 0.098). As described below, we used a specialized statistical approach to account for this high level of relatedness in our association testing.
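One way to form the relative clusters described above is to treat subjects as nodes, link every pair whose kinship coefficient exceeds 0.098, and take connected components with at least two members. The Python sketch below does this with a union-find structure. Treating families as connected components is an assumption about how the grouping was done, and the subject indices and kinship values are hypothetical.

```python
def kinship_families(n_subjects, pairs, threshold=0.098):
    """Group subjects into families of two or more relatives.

    `pairs` holds (i, j, kinship) tuples; any pair with kinship > threshold
    links i and j, and families are the resulting connected components.
    """
    parent = list(range(n_subjects))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, kc in pairs:
        if kc > threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j

    components = {}
    for s in range(n_subjects):
        components.setdefault(find(s), []).append(s)
    return sorted(sorted(c) for c in components.values() if len(c) >= 2)


# Hypothetical kinship estimates for six subjects
pairs = [(0, 1, 0.25), (1, 2, 0.12), (3, 4, 0.05), (4, 5, 0.13)]
print(kinship_families(6, pairs))  # [[0, 1, 2], [4, 5]]
```

Under this reading, two subjects can share a family without being directly related above the threshold (subjects 0 and 2 here), as long as a chain of close relatives connects them.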
To more closely assess the genetic substructure within our population, we performed a PCA on our unrelated cohort of 2,720 GWAS individuals, defined by KC less than the lower limit of the 95% prediction interval for half-siblings (KC < 0.098), using LD-pruned common markers genomewide, and color-coded PCs by both school and tribe (S2 Fig). Furthermore, we estimated FST between schools and tribes of all unrelated study subjects to determine if these defined subgroups identify genetically distinct sub-populations. We estimated FST within our unrelated cohort using LD-pruned common markers on chromosome 22. The estimates of FST by school were made using all unrelated individuals (n = 2,720) and the estimates of FST by tribe were made using all unrelated individuals with the same maternal and paternal tribe affiliations (n = 2,257). The distributions of pairwise FST estimates by school and tribe are depicted in S3 Fig. We limited this analysis to subgroups with n ≥ 5, as very small sample size has been shown to inflate FST estimates. All pairwise estimates were < 0.05, supporting a high level of genetic homogeneity among our study subjects.
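The text does not state which FST estimator was used. As one illustration, the Python sketch below uses Hudson's estimator, combined across SNPs as a ratio of averages, and applies the n ≥ 5 subgroup restriction. The estimator choice, group names, allele frequencies, and sample sizes are all assumptions for illustration.

```python
from itertools import combinations


def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson FST across SNPs, combined as a ratio of averages.

    freqs1/freqs2: allele frequencies per SNP; n1/n2: sampled chromosomes.
    """
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


def pairwise_fst(groups, min_size=5):
    """Pairwise FST between subgroups with at least `min_size` individuals.

    groups maps a subgroup name to (allele frequencies, number of diploid individuals).
    """
    eligible = {name: g for name, g in groups.items() if g[1] >= min_size}
    return {
        (a, b): hudson_fst(eligible[a][0], eligible[b][0], 2 * eligible[a][1], 2 * eligible[b][1])
        for a, b in combinations(sorted(eligible), 2)
    }


# Hypothetical subgroups: two similar schools and one too small to include
groups = {
    "school_A": ([0.50, 0.20, 0.35], 120),
    "school_B": ([0.48, 0.22, 0.35], 110),
    "school_C": ([0.10, 0.90, 0.50], 3),
}
print(pairwise_fst(groups))
```

Because of the sample-size correction in the numerator, estimates for nearly identical subgroups can be slightly negative; they are conventionally interpreted as zero differentiation.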
SNPs were excluded on the basis of missing call rates > 2% (n = 40,146), MAF < 1% (n = 602,288), positional duplicates (n = 39,847), CIDR technical filters (n = 58,825), > 1 discordant call in planned study duplicates (n = 900), > 1 Mendelian error in seven HapMap trios (n = 436), significant deviation (P < 10−4) from Hardy-Weinberg equilibrium (HWE; n = 7,856), and sex difference in allele frequency ≥ 0.2 (n = 199), leaving a total of 1,817,348 genotyped SNPs passing QC filters. The UWGCC performed genome-wide imputation of non-genotyped markers to 1000 Genomes Project data, using SHAPEIT2 for phasing and IMPUTE2 for imputing probabilistic genotypes, with a worldwide reference panel of all samples from the 1000 Genomes Project phase 1 integrated variant set. Genotype data were deposited in the Database of Genotypes and Phenotypes (dbGaP: http://www.ncbi.nlm.nih.gov/gap; phs000622.v1.p1).
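Three of the per-SNP filters above (call rate, MAF, and HWE) can be sketched as follows. This minimal Python illustration assumes genotype counts per SNP, treats the call-rate filter as removing SNPs missing in more than 2% of samples, and uses a one-degree-of-freedom chi-square HWE test; the text does not say whether an exact or chi-square HWE test was applied, and the example counts are hypothetical. The final lines check that the reported exclusion counts sum to the difference between the 2,567,845 assayed variants and the 1,817,348 SNPs passing QC.

```python
import math


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing, min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-4):
    """Apply call-rate, minor allele frequency, and HWE filters to one SNP."""
    called = n_aa + n_ab + n_bb
    if called == 0 or called / (called + n_missing) < min_call_rate:
        return False
    p = (2 * n_aa + n_ab) / (2 * called)
    if min(p, 1 - p) < min_maf:
        return False
    return hwe_chisq_p(n_aa, n_ab, n_bb) >= min_hwe_p


print(passes_snp_qc(100, 200, 100, 0))  # True: in HWE, MAF 0.5, complete calls
print(passes_snp_qc(0, 400, 0, 0))      # False: all heterozygotes, strong HWE deviation
print(passes_snp_qc(396, 4, 0, 0))      # False: MAF 0.005 < 0.01

# The reported exclusion counts sum to the number of removed variants
EXCLUDED = [40_146, 602_288, 39_847, 58_825, 900, 436, 7_856, 199]
print(2_567_845 - sum(EXCLUDED))  # 1817348
```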
Replication Genotyping and QC
DNA of the two replication study cohorts, collected as described above for the GWAS, was genotyped at CIDR using a custom 384-SNP Illumina GoldenGate assay per the manufacturer's instructions. A total of 2,679 samples from the two replication cohorts were genotyped successfully. Using the break in the individual genotype call rate distribution as a guideline, we removed subjects with a missing genotype rate > .04 (n = 7) or discordance between reported and observed sex (n = 12). Further individuals were removed due to poor image quality, duplicate collection, and landmarking outliers. The final replication dataset included 2,390 individuals. A total of 231 SNPs plus five internal controls were genotyped successfully with call rate > 97.5%. SNPs that failed the Hardy-Weinberg equilibrium (HWE) test with a P-value Bonferroni-corrected for the number of independent signals to be tested (P < 2.52 x 10−4; 0.05/198) were excluded (n = 4), leaving a total of 227 SNPs that represented 194 independent loci.
We tested for association between each SNP and 34 phenotypic traits using the Efficient Mixed-Model Association eXpedited (EMMAX) method, implemented in the web-based BC Platforms data management software (http://bcplatforms.com/). A linear mixed model with an additive SNP effect was used to account for both population stratification and the substantial cryptic relatedness among our subjects. The phenotypes analyzed for genetic association were the first five PCs from a PCA of the whole face, the first PC from a PCA of the midface, 25 inter-landmark linear distances, centroid size, allometry, and head circumference (Table 1). All multivariate measures and linear distances were corrected for age, sex, and size at either the phenotype-derivation or the association-testing stage. Specifically, multivariate measures, including all PCs and allometry, were adjusted for sex in the genetic model; centroid size was adjusted for age and sex; and linear distances were adjusted for age, sex, and centroid size (following adjustment for age and sex). The GWAS was performed on the full imputed dataset of approximately 15,815,000 markers with MAF > 0.01 and info quality score > 0.30. All genomic coordinates reported in the text are based on build GRCh38.
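EMMAX estimates the variance components once under the null model and then tests each SNP by generalized least squares (GLS) with that fixed covariance. The following pure-Python sketch shows only the per-SNP GLS step, assuming a known genetic relationship matrix `grm` and variance components `sigma_g2` and `sigma_e2` (all hypothetical inputs here). It fits an intercept and the additive dosage only, whereas the study also adjusted for covariates such as sex, and it uses a normal approximation for the P-value. It is not the BC Platforms implementation.

```python
import math


def cholesky(a):
    """Lower-triangular L with L @ L.T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ x = b for x by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def gls_snp_test(y, dosage, grm, sigma_g2, sigma_e2):
    """Additive-SNP test under V = sigma_g2 * GRM + sigma_e2 * I.

    Whitening with L^-1 (where V = L L^T) turns GLS into ordinary least
    squares on the transformed intercept, dosage, and phenotype.
    """
    n = len(y)
    v = [[sigma_g2 * grm[i][j] + (sigma_e2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(v)
    y_w = forward_solve(low, y)
    one_w = forward_solve(low, [1.0] * n)
    g_w = forward_solve(low, dosage)
    # Normal equations for the two-column design [intercept, dosage]
    s11 = sum(a * a for a in one_w)
    s12 = sum(a * b for a, b in zip(one_w, g_w))
    s22 = sum(b * b for b in g_w)
    t1 = sum(a * c for a, c in zip(one_w, y_w))
    t2 = sum(b * c for b, c in zip(g_w, y_w))
    det = s11 * s22 - s12 * s12
    intercept = (s22 * t1 - s12 * t2) / det
    beta = (s11 * t2 - s12 * t1) / det
    resid = [c - intercept * a - beta * b for a, b, c in zip(one_w, g_w, y_w)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se = math.sqrt(sigma2 * s11 / det)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal P-value
    return beta, se, p


# Toy data: six hypothetical children forming related pairs (0,3), (1,4), (2,5)
y = [1.0, 2.5, 3.2, 2.1, 1.8, 4.0]
dosage = [0, 1, 2, 0, 1, 2]
grm = [[1.0 if i == j else (0.5 if abs(i - j) == 3 else 0.0)
        for j in range(6)] for i in range(6)]
print(gls_snp_test(y, dosage, grm, sigma_g2=0.5, sigma_e2=0.5))
```

Because V does not change across SNPs, a genomewide analysis would typically factor it once and reuse the whitened phenotype for every marker; the sketch refactors it on each call only for brevity.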
We filtered association results at P < 1.0 x 10−4 and prioritized 379 SNPs (plus five internal controls) for replication genotyping on the basis of P-value, minor allele frequency, number of associated traits, number of suggestive SNPs within a locus, size of the locus, gene annotation, and replication assay design score. Because genomewide genotype data were not available for the two replication studies to model population substructure, we used PLINK linear regression to test the association of each of the 227 replication SNPs that passed quality control filters, testing only the trait with the most significant GWAS association for each SNP. We then used METAL to conduct a meta-analysis of the two replication studies, as well as a meta-analysis of the GWAS and both replication studies. SNPs showing effect-size heterogeneity among the three studies, as indicated by an I² statistic > 50%, were excluded from downstream meta-analysis. Summary statistics for the remaining 24 SNPs were combined using the inverse-variance fixed-effects method to obtain meta-analysis effect sizes and P-values. To determine the number of effectively independent phenotypes for a Bonferroni correction of the meta-analysis significance threshold, we performed a PCA on the 34 phenotype residuals in our unrelated African GWAS sample. The first nine eigenvalues exceeded 1, indicating that each of the corresponding components explained at least as much variance as a single standardized phenotype; we therefore treated the 34 traits as nine effectively independent phenotypes in our multiple-testing correction. The meta-analysis genomewide significance threshold was obtained by applying this Bonferroni correction to the GWAS significance threshold for an African population (2.5 x 10−8 / 9, i.e., P < 2.78 x 10−9). Conditional analyses were performed by assessing the significance of secondary SNPs before and after including the lead SNP as an additional covariate in the model.
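The inverse-variance fixed-effects combination, the I² filter, and the corrected threshold can be sketched in a few lines. This is an illustrative re-implementation of the standard formulas (Cochran's Q, and I² = (Q − df)/Q truncated at zero), not METAL's code, and the effect sizes in the example are hypothetical.

```python
import math

# Corrected threshold: African GWAS threshold / 9 effectively independent traits
META_THRESHOLD = 2.5e-8 / 9


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis with Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal P-value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p, i2


def meta_snp(betas, ses, max_i2=0.5):
    """Return (beta, se, p, significant), or None if I^2 exceeds max_i2."""
    beta, se, p, i2 = fixed_effects_meta(betas, ses)
    if i2 > max_i2:
        return None  # excluded for effect-size heterogeneity
    return beta, se, p, p < META_THRESHOLD


# Hypothetical GWAS, replication 1, and replication 2 estimates for one SNP
print(meta_snp([0.30, 0.28, 0.32], [0.03, 0.04, 0.04]))
print(meta_snp([0.0, 2.0], [0.5, 0.5]))  # heterogeneous, so None
```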
Regional association plots in Figs 3, 4, and 5 were constructed with LocusZoom using meta-analysis P-values for the 3 replicated SNPs and GWAS P-values for all other SNPs within the locus of interest.
Whole Mount In Situ Hybridization
Digoxigenin-labeled RNA probes (DIG RNA labeling kit; Roche, Indianapolis, IN) for murine Pde8a and Schip1 were generated by in vitro transcription from PCR-derived templates. Wild-type C57BL/6J mice were mated overnight, and the presence of a vaginal plug was taken to indicate embryonic day (E) 0.5. Whole-mount in situ hybridization was performed according to standard protocols on mouse embryos harvested between E9.5 and E12.5 and fixed in 4% PFA.
For RNA-seq analysis, ectoderm and mesenchyme from each paired facial prominence (frontonasal, maxillary, and mandibular) of C57BL/6J mice (Jax Labs, Bar Harbor, ME) were collected at E10.5, E11.5, and E12.5 as described previously. RNA was extracted using Norgen long RNA and microRNA separation kits (Norgen Biotek, Thorold, ON) according to the manufacturer’s instructions. RNA-seq libraries were constructed using Illumina TruSeq Stranded mRNA Sample Preparation kits (Illumina, San Diego, CA), and 125-cycle paired-end sequencing was performed on an Illumina HiSeq 2500 by the University of Colorado Genomics and Microarray Core. Reads were aligned to the mouse genome (mm10) with GSNAP, expression (FPKM) was estimated with Cufflinks, and differential expression was analyzed by ANOVA in R [49–52].
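As a simplified illustration of the two quantities named above, the sketch below computes FPKM from a fragment count and a one-way ANOVA F statistic across groups, for example the three embryonic stages. Both are assumptions about the analysis: Cufflinks' FPKM estimates use effective transcript lengths and probabilistic assignment of fragments to isoforms, and the exact ANOVA model fitted in R is not specified. A P-value would come from the F distribution (for example via `scipy.stats.f_oneway`), which is left out to keep the block dependency-free.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of replicates around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical FPKM triplicates for one gene at E10.5, E11.5, and E12.5
stages = [[10.2, 11.0, 9.8], [14.9, 15.3, 16.1], [20.4, 19.7, 21.2]]
print(fpkm(100, 2000, 10_000_000), one_way_anova_f(stages))
```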
S1 Fig. Age by sex distribution for GWAS, Megacapturor Replication, and Gemini Replication.
Age by sex distribution for (A) GWAS, n = 3,505; (B) Megacapturor Replication, n = 1,140; and (C) Gemini Replication, n = 1,250.
S2 Fig. Principal components analysis of LD-pruned genomewide markers in unrelated GWAS individuals.
Scree plot of the percent of total variance explained by the top 50 PCs (A). PCA cluster plots of the top four PCs, colored by school (B) and tribe (C), demonstrate minimal genetic substructure; together these four PCs explain only 0.36% of the total variance.
S3 Fig. Distributions of pairwise FST estimates by school and tribe from LD-pruned markers in unrelated GWAS individuals.
Distributions of pairwise FST estimates by school (A) and tribe (B) demonstrate minimal genetic differentiation among subgroups.
S4 Fig. 3D facial morphs and heat maps of principal components 1–5 (PC1-PC5).
Facial morphs and heat maps depict changes associated with positive and negative PC scores for each of the 5 PCs tested for genetic association. See Table 1 for brief descriptions of PC trends.
S5 Fig. Midfacial module landmark configuration.
Determination of the midfacial module using the RV coefficient. (A) The landmarks used in the midfacial module (yellow). (B) The total connections among landmarks tested. (C) The distribution of the RV coefficient for random subsets of landmarks. The red arrow shows the location of the selected subset from that distribution. (D) The shape change that corresponds to the first principal component of the midfacial module.
S6 Fig. GWAS Manhattan plots of top signals.
GWAS Manhattan plots of (A) centroid size, (B) PC4, and (C) allometry.
S7 Fig. Pde8a and Schip1 RNA expression analysis during mouse embryonic facial development.
The graphs show RPKM values (reads per kilobase of transcript per million reads mapped) for Pde8a (top) and Schip1 (bottom) derived from RNAseq experiments. Time-course data are shown for the frontonasal prominence (FNP), maxillary prominence (MXP), or mandibular prominence (MNP) at embryonic day (E) 10.5, 11.5 and 12.5 for either the ectoderm (blue) or mesenchyme (red) component. RNAseq experiments were run on independent biological triplicates with error bars showing standard deviation.
- Conceived and designed the experiments: MM WM TW BH SAS RAS.
- Performed the experiments: JBC MM EK JM JRL DKL KL TMF SLR ML MP HL KLJ.
- Analyzed the data: JBC DKL ML MP KLJ.
- Contributed reagents/materials/analysis tools: ML WM.
- Wrote the paper: JBC SAS BH RAS ODK.
- 1. Boehringer S, van der Lijn F, Liu F, Gunther M, Sinigerova S, Nowak S, et al. Genetic determination of human facial morphology: Links between cleft-lips and normal variation. Eur J Hum Genet. 2011;19(11):1192–7. pmid:21694738
- 2. AlKhudhairi TD, AlKofide EA. Cephalometric craniofacial features in Saudi parents and their offspring. Angle Orthod. 2010;80(6):1010–7. pmid:20677948
- 3. Amini F, Borzabadi-Farahani A. Heritability of dental and skeletal cephalometric variables in monozygous and dizygous Iranian twins. Orthod Waves. 2009;68(2):72–9.
- 4. Carson EA. Maximum likelihood estimation of human craniometric heritabilities. Am J Phys Anthropol. 2006;131(2):169–80. pmid:16552732
- 5. Johannsdottir B, Thorarinsson F, Thordarson A, Magnusson TE. Heritability of craniofacial characteristics between parents and offspring estimated from lateral cephalograms. Am J Orthod Dentofacial Orthop. 2005;127(2):200–7. pmid:15750539
- 6. Manfredi C, Martina R, Grossi GB, Giuliani M. Heritability of 39 orthodontic cephalometric parameters on MZ, DZ twins and MN-paired singletons. Am J Orthod Dentofacial Orthop. 1997;111(1):44–51. pmid:9009923
- 7. Claes P, Liberton DK, Daniels K, Rosana KM, Quillen EE, Pearson LN, et al. Modeling 3D facial shape from DNA. PLoS Genet. 2014;10(3):e1004224. pmid:24651127
- 8. Moreno Uribe L, Ray A, Blanchette D, Dawson D, Southard T. Phenotype–genotype correlations of facial width and height proportions in patients with Class II malocclusion. Orthod Craniofac Res. 2015;18(S1):100–8.
- 9. Peng S, Tan J, Hu S, Zhou H, Guo J, Jin L, et al. Detecting genetic association of common human facial morphological variation using high density 3D image registration. PLoS Comput Biol. 2013;9(12):e1003375. pmid:24339768
- 10. Paternoster L, Zhurov AI, Toma AM, Kemp JP, St Pourcain B, Timpson NJ, et al. Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. Am J Hum Genet. 2012;90(3):478–85. pmid:22341974
- 11. Liu F, van der Lijn F, Schurmann C, Zhu G, Chakravarty MM, Hysi PG, et al. A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS Genet. 2012;8(9):e1002932. pmid:23028347
- 12. Adhikari K, Fuentes-Guajardo M, Quinto-Sánchez M, Mendoza-Revilla J, Chacón-Duque JC, Acuña-Alonzo V, et al. A genome-wide association scan implicates DCHS2, RUNX2, GLI3, PAX1 and EDAR in human facial variation. Nat Commun. 2016;7.
- 13. Pe'er I, Yelensky R, Altshuler D, Daly MJ. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol. 2008;32(4):381–5. pmid:18348202
- 14. Bookstein FL. Size and shape spaces for landmark data in two dimensions. Stat Sci. 1986:181–222.
- 15. Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, et al. Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A. 2014;111(17):6131–8. pmid:24753594
- 16. Satoda M, Zhao F, Diaz GA, Burn J, Goodship J, Davidson HR, et al. Mutations in TFAP2B cause Char syndrome, a familial form of patent ductus arteriosus. Nat Genet. 2000;25(1):42–6. pmid:10802654
- 17. Zhao F, Weismann CG, Satoda M, Pierpont MEM, Sweeney E, Thompson EM, et al. Novel TFAP2B mutations that cause Char syndrome provide a genotype-phenotype correlation. Am J Hum Genet. 2001;69(4):695–703. pmid:11505339
- 18. Feng W, Leach SM, Tipney H, Phang T, Geraci M, Spritz RA, et al. Spatial and temporal analysis of gene expression during growth and fusion of the mouse facial prominences. PLoS ONE. 2009;4(12):e8066. pmid:20016822
- 19. Russell R, Duchaine B, Nakayama K. Super-recognizers: People with extraordinary face recognition ability. Psychon Bull Rev. 2009;16(2):252–7.
- 20. Sheehan MJ, Tibbetts EA. Specialized face learning is associated with individual recognition in paper wasps. Science. 2011;334(6060):1272–5. pmid:22144625
- 21. Goutebroze L, Brault E, Muchardt C, Camonis J, Thomas G. Cloning and characterization of SCHIP-1, a novel protein interacting specifically with spliced isoforms and naturally occurring mutant NF2 proteins. Mol Cell Biol. 2000;20(5):1699–712. pmid:10669747
- 22. Schmahl J, Raymond CS, Soriano P. PDGF signaling specificity is mediated through multiple immediate early genes. Nat Genet. 2007;39(1):52–60. pmid:17143286
- 23. Perisic L, Rodriguez PQ, Hultenby K, Sun Y, Lal M, Betsholtz C, et al. Schip1 is a novel podocyte foot process protein that mediates actin cytoskeleton rearrangements and forms a complex with NHERF2 and ezrin. PLoS ONE. 2015;10(3):e0122067. pmid:25807495
- 24. Fisher DA, Smith JF, Pillar JS, Denis SHS, Cheng JB. Isolation and characterization of PDE8A, a novel human cAMP-specific phosphodiesterase. Biochem Biophys Res Commun. 1998;246(3):570–7. pmid:9618252
- 25. Patrucco E, Albergine MS, Santana LF, Beavo JA. Phosphodiesterase 8A (PDE8A) regulates excitation–contraction coupling in ventricular myocytes. J Mol Cell Cardiol. 2010;49(2):330–3. pmid:20353794
- 26. Vasta V, Shimizu-Albergine M, Beavo JA. Modulation of Leydig cell function by cyclic nucleotide phosphodiesterase 8A. Proc Natl Acad Sci U S A. 2006;103(52):19925–30. pmid:17172443
- 27. WHO Multicentre Growth Reference Study Group. WHO child growth standards: Length/height-for-age, weight-for-age, weight-for-length, weight-for-height and body mass index-for-age: Methods and development. Geneva: World Health Organization; 2006. 312 p.
- 28. Mitteroecker P, Gunz P. Advances in geometric morphometrics. Evol Biol. 2009;36(2):235–47.
- 29. Bookstein FL. Morphometric tools for landmark data. Cambridge: Cambridge University Press; 1991.
- 30. Dryden IL, Mardia KV. Statistical shape analysis. Chichester: John Wiley & Sons; 1998.
- 31. Klingenberg CP. Heterochrony and allometry: The analysis of evolutionary change in ontogeny. Biol Rev Camb Philos Soc. 1998;73(1):79–123.
- 32. Klingenberg CP, Zimmermann M. Static, ontogenetic, and evolutionary allometry: A multivariate comparison in nine species of water striders. Am Nat. 1992:601–20.
- 33. Klingenberg CP. Morphometric integration and modularity in configurations of landmarks: Tools for evaluating a priori hypotheses. Evol Dev. 2009 Jul-Aug;11(4):405–21. pmid:19601974
- 34. Klingenberg CP. MorphoJ: An integrated software package for geometric morphometrics. Mol Ecol Resour. 2011;11(2):353–7. pmid:21429143
- 35. Adams D, Otárola-Castillo E, Sherratt E. Geomorph: Software for geometric morphometric analyses. R package version 2. 2014(1).
- 36. Schlager S. Morpho: Calculations and visualizations related to geometric morphometrics. R package version 023 3. 2013.
- 37. Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, et al. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet Epidemiol. 2010;34(6):591–602. pmid:20718045
- 38. Willing E-M, Dreyer C, Van Oosterhout C. Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS ONE. 2012;7(8):e42649. pmid:22905157
- 39. Hartl DL, Clark AG. Principles of population genetics. Sunderland: Sinauer Associates; 1997.
- 40. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65. pmid:23128226
- 41. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S-y, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42(4):348–54. pmid:20208533
- 42. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. pmid:17701901
- 43. Willer CJ, Li Y, Abecasis GR. METAL: Fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1. pmid:20616382
- 44. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60. pmid:12958120
- 45. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: Regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–7. pmid:20634204
- 46. Li H, Williams T. Separation of mouse embryonic facial ectoderm and mesenchyme. J Vis Exp. 2013(74).
- 47. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010;26(7):873–81. pmid:20147302
- 48. Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol. 2013;31(1):46–53. pmid:23222703
- 49. Henderson HH, Timberlake KB, Austin ZA, Badani H, Sanford B, Tremblay K, et al. Occupancy of RNA polymerase II (S5P) and RNA polymerase II (S2P) on VZV genes 9, 51 and 66 is independent of transcript abundance and polymerase location within the gene. J Virol. 2015:JVI.02617-15.
- 50. Bradford AP, Jones K, Kechris K, Chosich J, Montague M, Warren WC, et al. Joint miRNA/mRNA expression profiling reveals changes consistent with development of dysfunctional corpus luteum after weight gain. PLoS ONE. 2015;10(8):e0135163. pmid:26258540
- 51. Maycotte P, Jones KL, Goodall ML, Thorburn J, Thorburn A. Autophagy supports breast cancer stem cell maintenance by regulating IL6 secretion. Mol Cancer Res. 2015;13(4):651–8. pmid:25573951
- 52. Baird NL, Bowlin JL, Cohrs RJ, Gilden D, Jones KL. Comparison of varicella-zoster virus RNA sequences in human neurons and fibroblasts. J Virol. 2014;88(10):5877–80. pmid:24600007