Abstract
Facial morphology is highly variable, both within and among human populations, and a sizable portion of this variation is attributable to genetics. Previous genome scans have revealed more than 100 genetic loci associated with different aspects of normal-range facial variation. Most of these loci have been detected in Europeans, with few studies focusing on other ancestral groups. Consequently, the degree to which facial traits share a common genetic basis across diverse sets of humans remains largely unknown. We therefore investigated the genetic basis of facial morphology in an East African cohort. We applied an open-ended data-driven phenotyping approach to a sample of 2,595 3D facial images collected on Tanzanian children. This approach segments the face into hierarchically arranged, multivariate features that capture the shape variation after adjusting for age, sex, height, weight, facial size and population stratification. Genome scans of these multivariate shape phenotypes revealed significant (p < 2.5 × 10−8) signals at 20 loci, which were enriched for active chromatin elements in human cranial neural crest cells and embryonic craniofacial tissue, consistent with an early developmental origin of the facial variation. Two of these associations were in highly conserved regions showing craniofacial-specific enhancer activity during embryological development (5q31.1 and 12q21.31). Six of the 20 loci surpassed a stricter threshold accounting for multiple phenotypes with study-wide significance (p < 6.25 × 10−10). Cross-population comparisons indicated 10 association signals were shared with Europeans (seven sharing the same associated SNP), and facilitated fine-mapping of causal variants at previously reported loci. Taken together, these results may point to both shared and population-specific components to the genetic architecture of facial variation.
Author summary
Genetic factors play an important role in shaping human facial features. Over the last decade, studies have identified numerous genes associated with various facial traits. The vast majority of these studies have focused on European or Asian populations, while African populations have been underrepresented. Increasing the diversity of these analyses can reveal novel associations and cross-population analyses can help deepen our understanding of known genetic associations. We therefore performed a genome scan of 3D facial features in African children from Tanzania and then compared our results to Europeans. We found 20 regions of the genome associated with facial shape in Tanzanian children, 10 of which were also present in Europeans, indicating evidence for a partly shared genetic basis for human facial shape across populations. In addition, about half of the genetic associations observed in Tanzanians were not present in Europeans, and some of the shared signals differed between populations in the specific genetic variants associated or specific facial traits affected. These results shed light on the shared and population-specific genetic contributors to normal-range facial variation.
Citation: Liu C, Lee MK, Naqvi S, Hoskens H, Liu D, White JD, et al. (2021) Genome scans of facial features in East Africans and cross-population comparisons reveal novel associations. PLoS Genet 17(8): e1009695. https://doi.org/10.1371/journal.pgen.1009695
Editor: Sarah A. Tishkoff, University of Pennsylvania, UNITED STATES
Received: November 14, 2020; Accepted: July 2, 2021; Published: August 19, 2021
Copyright: © 2021 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data for the primary analyses are available in controlled access repositories. Phenotype data for the Tanzanian sample were deposited in the FaceBase Hub (FaceBase: https://www.facebase.org/; accession #FB00000667.01). Genotype data for the Tanzania sample were deposited in the Database of Genotypes and Phenotypes (dbGaP: http://www.ncbi.nlm.nih.gov/gap; accession #phs000622.v1.p1). The European comparison dataset included four subsets: the ALSPAC cohort from the UK, and three cohorts (3DFN, PSU, and IUPUI) from the USA, each with separate data sharing consents and procedures. The ALSPAC data will be made available to bona fide researchers on application to the ALSPAC Executive Committee (http://www.bris.ac.uk/alspac/researchers/data-access). Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Informed written consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. The 3DFN data are available through the FaceBase Consortium (accession FB00000491.01) and dbGaP (accession number phs000949.v1.p1). The participants making up the Penn State University (PSU) and Indiana University-Purdue University Indianapolis (IUPUI) datasets were not recruited with broad data sharing consent. Given the highly identifiable nature of both facial and genomic information and unresolved issues regarding risks to participants of inherent reidentification, participants were not consented for inclusion in public repositories or the posting of individual data. This restriction is not because of any personal or commercial interests.
Further information about access to the raw 3D facial images and/or genomic data can be obtained from the respective ethics committees: the Ethics Committee Research UZ / KU Leuven (ec@uzleuven.be), and the PSU IRB (IRB-ORP@psu.edu) and the IUPUI IRB (irb@iu.edu) for the PSU and IUPUI datasets, respectively. KU Leuven provides the MeshMonk (v0.0.6) spatially dense facial mapping software, free to use for academic purposes (https://github.com/TheWebMonks/meshmonk). The co-localization analyses were based on the “locuscompare” function in R v3.6.1. Publicly available data used were: the 1000G Phase 3 data (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), the GTEx version 7 data (https://gtexportal.org/home/datasets), ChIP-seq files from Prescott et al. [30] (GSE70751), Najafova et al. [28] (GSE82295), Baumgart et al. [29] (GSE89179), Nott et al. [27] (https://genome.ucsc.edu/s/nottalexi/glassLab_BrainCellTypes_hg19), Pattison et al. [26] (GSE119997), Wilderman et al. [31] (GSE97752), and the Roadmap Epigenomics Project [32] (https://egg2.wustl.edu/roadmap/data/byFileType/alignments/consolidated/). GWAS statistics are available on GWAS Catalog.
Funding: Tanzania data collection was supported by the National Institute of Dental and Craniofacial Research (U01-DE020054, PD/PIs: RAS/BH/OK, http://www.nidcr.nih.gov/); Center for Inherited Disease Research (X01-HG006829, PD/PI: RAS, http://www.cidr.jhmi.edu/). Pittsburgh personnel, data collection, and analyses were supported by the National Institute of Dental and Craniofacial Research (U01-DE020078, PD/PIs: SMW/MLM, R01-DE016148, PD/PIs: MLM/SMW, and R01-DE027023, PD/PIs: SMW/JRS/PC/JW). Funding for genotyping was provided by the National Human Genome Research Institute (X01-HG007821 and X01-HG007485, PD/PI: MLM), and funding for initial genomic data cleaning by the University of Washington was provided by contract HHSN268201200008I from the National Institute for Dental and Craniofacial Research awarded to the Center for Inherited Disease Research. Penn State personnel, data collection, and analyses were supported by Procter & Gamble, Company (UCRI-2015-1117-HN-532, PD/PI: HN), the Center for Human Evolution and Development at Penn State, the Science Foundation of Ireland Walton Fellowship (04.W4/B643, PD/PI: MDS), the US National Institute of Justice (2008-DN-BX-K125, PD/PI: MDS; and 2018-DU-BX-0219, PD/PI: SW), and by the US Department of Defense. IUPUI personnel, data collection, and analyses were supported by the National Institute of Justice (2015-R2-CX-0023, 2014-DN-BX-K031, and 2018-DU-BX-0219, PD/PI: SW). The UK Medical Research Council and Wellcome (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors and PC will serve as guarantor for the contents of this paper. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf).
ALSPAC GWAS data was generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. The KU Leuven research team and analyses were supported by the National Institute of Dental and Craniofacial Research (R01-DE027023, PD/PIs: SMW/JRS/PC/JW), The Research Fund KU Leuven (BOF-C1, C14/15/081 and C14/20/081, PD/PI: PC), and the Research Program of the Research Foundation – Flanders (FWO, G078518N, PD/PI: PC). Stanford University personnel and analyses were supported by the National Institute of Dental and Craniofacial Research (R01-DE027023, PD/PIs: SMW/JRS/PC/JW; and U01-DE024430, PD/PIs: JW/LS), the Howard Hughes Medical Institute, and the March of Dimes Foundation (1-FY15-312, PD/PI: JW). SN was supported by a Helen Hay Whitney Fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The human face shows a wide range of variation in shape. Although facial features change across the lifespan and can be influenced by environmental factors such as nutritional status, numerous lines of evidence from twin and family studies show that the majority of variation in facial shape is determined by genetics, with the narrow-sense heritability of facial traits estimated to be approximately 40% to 60% [1–3]. To date, at least 17 genome-wide association studies (GWAS) of facial traits have been performed, including 11 in European [3–13], two in Asian [14–16], one in Latin American [17], one in African [18], and two in mixed-ancestry populations [19,20]. These studies have used varied phenotyping strategies, and some have been quite successful in identifying genetic variants associated with aspects of facial morphology. For example, our recent GWAS meta-analysis of data-driven phenotypes derived from 3D images reported 203 signals across 138 genetic loci showing genetic associations with facial traits in Europeans [13]. In contrast, far fewer loci have been identified in non-European populations, and even fewer have been replicated across populations. Indeed, only eight have shown genome-wide significant associations across different ancestral groups (HOXD cluster, PAX3, TBX3, SOX9, PAX1, 4q31.3, 6p21.1, 20q12). African populations are particularly under-represented in facial GWA studies. The only previous GWAS of facial morphology in an African population was performed by Cole et al. using landmark-based phenotypes extracted from 3D facial images in 3,505 Bantu children and Mwanza adolescents [18]. This study did not replicate any previously identified loci, but did report two genetic associations, the SCHIP1 locus with centroid size and the PDE8A locus with the allometric variation in facial shape. Associations with these two loci have not been reported in other populations.
While this lack of overlap among associated loci across populations may be attributable to differences across studies in phenotyping modalities or insufficient power to detect small effects, it may also reflect true differences in the genetic architecture across populations. Few studies have explicitly sought to explore this question, and consequently, the degree to which facial traits share a common genetic basis across diverse populations remains largely unknown. Genome-wide scans of facial variation have investigated varied phenotypes, typically using traditional anthropometric landmarks as the basis for deriving phenotypes. However, genetic associations discovered using such approaches have been limited, likely due to the inadequacy of the simple landmark-based phenotypes in capturing the complex morphology of the face. Therefore, we previously developed a global-to-local phenotyping approach that allowed us to more fully utilize the integrated information captured from 3D facial images. This method has been applied to GWASs of European ancestry samples with great success [10,13]. In the present study, we applied the global-to-local phenotyping approach to a previously collected East African sample reported by Cole et al. [18]. We performed GWAS of facial morphology in 2,595 East Africans and compared results with those from an independent GWAS of 8,246 European-ancestry participants. Our re-analysis of this dataset points to both shared and possible population-specific associations, which deepen our understanding of the genetic architecture of normal facial variation and provide insights into the genetic underpinnings of craniofacial dysmorphology and the embryonic origin of facial morphogenesis.
Methods
Ethics statement
Tanzania discovery cohort: Written informed consent was obtained from all Tanzanian study participants or their parents as appropriate. Ethics approval for the overall study was obtained at the University of Colorado (protocol #09–0731), with additional institutional approvals at the University of Calgary, and the Catholic University of Health and Allied Sciences (Mwanza, Tanzania) in conjunction with the Tanzania National Institute of Medical Research. European replication cohorts: Institutional review board approval was obtained at each recruitment site. For the US-based cohorts this approval included the University of Pittsburgh (PITT IRB PRO09060553 and RB0405013), Seattle Children’s Hospital (Seattle Children’s IRB 12107), University of Texas Health Science Center at Houston (UT Health Committee for the Protection of Human Subjects HSC-DB-09-0508), University of Iowa (University of Iowa Human Subjects Office IRB 200912764 and 200710721), the Pennsylvania State University (PSU IRB #’s 13103, 45727, 2015–3073, 2503, 44929, 4320, 44929, and 1278), and Indiana University (IUPUI IRB 1409306349). For the UK-based cohort, ethical approval was obtained from the Avon Longitudinal Study of Parents and their Children (ALSPAC) Ethics and Law Committee and the Local Research Ethics Committees. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Written informed consent was obtained from all participants or their parents before participation.
Recruitment and data collection
This study is a re-analysis of the dataset described by Cole et al. [18]. The African cohort included 3,555 participants from the Mwanza region of Tanzania comprising 1,582 males and 1,973 females aged 3 to 21 years (S1 Fig). 3D facial images were collected using the Creaform MegaCapturor (MC) camera three-dimensional (3D) photogrammetric imaging system or the Creaform Gemini (GM) 3D imaging system. Images were obtained while participants maintained closed mouths and neutral, relaxed facial expressions. Exclusion criteria included personal history of a known birth defect or family history of an orofacial cleft. The sample included some related participants; therefore, one member of each kinship was randomly chosen for inclusion. A total of 960 participants were excluded from analysis based on exclusion criteria, resulting in 2,595 unrelated participants that were retained for the genetic analysis. Population structure of the Tanzania cohort was assessed using principal component analysis (PCA) of genotyped SNPs chosen for high call rate (>95%), minor allele frequency (MAF) >0.05 and low linkage disequilibrium (LD; pairwise r² <0.1 across variants in a sliding window of 10 Mb). Based on the scree plot and joint distributions, we determined that four principal components (PCs) were sufficient to adjust for the effect of population structure within the sample [18] (see S2 Fig).
Genotyping for the Tanzania cohort was performed using the Illumina HumanOmni2.5Exome-8v1_A array by the Center for Inherited Disease Research (CIDR) of Johns Hopkins University. Quality control procedures were performed to exclude low-quality single nucleotide polymorphisms (SNPs) and samples, as described previously [18]. Imputation was performed using the 1000 Genomes Project reference. Filters for imputation INFO score <0.8, genotype-per-participant probability <0.9, missing imputation rate <0.5, MAF <0.01, and deviations from Hardy-Weinberg equilibrium (p value < 1 × 10−6) were used to exclude SNPs from analysis. In total, >1.5M SNPs were included in the GWAS.
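The exclusion filters listed above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the record layout and field names (`info`, `maf`, `hwe_p`) are hypothetical, the example values are invented, and the per-genotype probability and missingness filters are left out for brevity.

```python
from dataclasses import dataclass

# Exclusion thresholds as stated in the text.
MIN_INFO = 0.8     # imputation INFO score
MIN_MAF = 0.01     # minor allele frequency
MIN_HWE_P = 1e-6   # Hardy-Weinberg equilibrium p-value


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary record (field names are assumptions)."""
    snp_id: str
    info: float
    maf: float
    hwe_p: float


def passes_qc(snp: SnpStats) -> bool:
    """Return True if the SNP survives all three exclusion filters."""
    return (snp.info >= MIN_INFO
            and snp.maf >= MIN_MAF
            and snp.hwe_p >= MIN_HWE_P)


# Invented example records, one failing each filter in turn.
snps = [
    SnpStats("snp_a", info=0.95, maf=0.20, hwe_p=0.40),    # kept
    SnpStats("snp_b", info=0.60, maf=0.20, hwe_p=0.40),    # poorly imputed
    SnpStats("snp_c", info=0.95, maf=0.005, hwe_p=0.40),   # too rare
    SnpStats("snp_d", info=0.95, maf=0.20, hwe_p=1e-9),    # HWE deviation
]
kept = [s.snp_id for s in snps if passes_qc(s)]
```

In this toy input only `snp_a` is retained.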
Phenotyping
Phenotyping was performed using the pipeline described in Claes et al. [10]. The 3D surface images were imported into Matlab in Wavefront .obj format, and processed using the “MeshMonk” open-source package [21]. First, the individual 3D images were cropped and trimmed to remove hair and imaging artifacts. Five landmarks were placed on each face in a consensus reference frame, which established a rough image orientation. A bilaterally symmetrical anthropometric mask of 7,160 quasi-landmarks was subsequently mapped onto the 3D images. A Generalized Procrustes Analysis (GPA) was used to eliminate the differences in the orientation, position, and size of the quasi-landmark configurations. This work focused on the symmetrical variation in facial phenotypes, obtained by averaging the quasi-landmark positions of each image with those of its reflection.
Quality control was performed on facial images to identify outliers that were likely due to image mapping errors. First, outlier faces were identified by measuring the Mahalanobis distance transformed to a z-score. Images with z-scores of >2, indicative of atypical facial shape, were visually inspected. Second, a metric was calculated to gauge image artifacts such as holes and spikes, which indicate missing parts or errors during processing steps. Images with high scores were visually inspected. After visual inspection, outlier images and images with artifacts were either excluded due to poor quality or re-mapped.
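The first quality-control step can be sketched as follows. This is a minimal illustration on synthetic data, not the implementation used in the study: it assumes that the Mahalanobis distance is computed from each face to the sample mean in a low-dimensional shape space and then standardized across the sample. The function name and the five-dimensional toy data are hypothetical.

```python
import numpy as np


def mahalanobis_z(scores: np.ndarray) -> np.ndarray:
    """Z-scored Mahalanobis distance of each row from the sample mean."""
    centered = scores - scores.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates near-singular covariance
    # Quadratic form x' * inv_cov * x for every row at once.
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    dist = np.sqrt(np.maximum(d2, 0.0))
    return (dist - dist.mean()) / dist.std(ddof=1)


rng = np.random.default_rng(0)
shape_scores = rng.normal(size=(200, 5))  # synthetic stand-in for shape variables
shape_scores[0] += 8.0                    # plant one grossly atypical face
z = mahalanobis_z(shape_scores)
flagged = np.flatnonzero(z > 2.0)         # candidates for visual inspection
```

Flagged images are candidates for inspection only; as described above, the decision to exclude or re-map an image was made after looking at it.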
To generate facial shape phenotypes for genetic analysis, we performed a global-to-local facial segmentation process. Facial shape was first adjusted for covariates (including age, sex, height, weight, facial size [centroid-size], and genomic principal components), and then hierarchically partitioned into facial segments using an unsupervised and data-derived strategy [10]. This phenotyping method resulted in 63 partially overlapping facial segments arranged across five levels in a bifurcated hierarchical manner (S3A Fig). After the global-to-local segmentation step, each of the 63 facial segments was subjected to another GPA followed by a Principal Component Analysis (PCA) across the 3D coordinates of the quasi-landmarks within the segment for dimensionality reduction. Parallel analysis was used to determine the number of PCs retained, resulting in sets of PCs (6 to 57) capturing most of the shape variation (95% to 98%) in each facial segment [22].
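Parallel analysis, used above to choose how many PCs to keep per segment, compares the observed eigenvalues with those expected when the variables are uncorrelated. The sketch below is one common permutation-based variant and is only an assumption about the general technique; the exact variant, number of iterations, and quantile used in the study are not stated here. The function name and toy data are hypothetical.

```python
import numpy as np


def parallel_analysis(data: np.ndarray, n_iter: int = 100,
                      quantile: float = 0.95, seed: int = 0) -> int:
    """Number of leading PCs whose eigenvalue exceeds a permutation null."""
    rng = np.random.default_rng(seed)
    n = len(data)
    centered = data - data.mean(axis=0)
    # Covariance eigenvalues obtained from the singular values.
    observed = np.linalg.svd(centered, compute_uv=False) ** 2 / (n - 1)
    null = np.empty((n_iter, observed.size))
    for i in range(n_iter):
        # Shuffle each column independently to destroy correlations
        # between variables while keeping each variable's distribution.
        shuffled = np.column_stack([rng.permutation(col) for col in centered.T])
        null[i] = np.linalg.svd(shuffled, compute_uv=False) ** 2 / (n - 1)
    threshold = np.quantile(null, quantile, axis=0)
    exceeds = observed > threshold
    # Keep PCs up to (not including) the first one that fails the test.
    return int(exceeds.size) if exceeds.all() else int(np.argmin(exceeds))


# Toy data: 10 variables driven by 2 latent factors plus unit-variance noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0
loadings[1, 5:] = 1.0
toy = 3.0 * latent @ loadings + rng.normal(size=(500, 10))
n_keep = parallel_analysis(toy)
```

On this toy input the two planted factors are retained and the noise dimensions are discarded.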
GWAS
The genetic association between each SNP and variation in each of the 63 facial segments (each represented by a set of PCs) was tested using canonical correlation analysis (CCA) under the additive genetic model as implemented in the “canoncorr” function in Matlab. This resulted in the linear combination of PCs that maximized the correlation with the SNP. Since the CCA approach cannot incorporate the effects of covariates, adjustments for sex, age, age-squared, height, weight, facial size, and four principal components of ancestry were made prior to testing, as previously described [10]. Significance of the CCA was determined by Rao’s F-test approximation (right tail, one-sided test). Associations with p-values < 2.5 × 10−8, the genome-wide significance threshold for African populations, were annotated [23]. The 63 facial segments represent partially overlapping regions of the face; therefore, the effective number of independent phenotypes tested was determined to be 40 based on the method by Li and Ji [24]. A study-wide significance threshold was set at p-value < 6.25 × 10−10 (i.e., 2.5 × 10−8/40) to account for the multiple testing burden due to the multiple, partially overlapping, facial segments.
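With a single SNP as the only predictor, CCA reduces to regressing the genotype on the segment's PCs. The squared canonical correlation equals the regression R², Wilks' lambda is 1 − r², and Rao's F with p PCs and n participants simplifies to F = (r²/p) / ((1 − r²)/(n − p − 1)) on (p, n − p − 1) degrees of freedom. The Python sketch below illustrates this special case on synthetic data. It is not the Matlab code used in the study, the function name is hypothetical, and it ignores the degrees of freedom consumed by the prior covariate adjustment.

```python
import numpy as np


def snp_cca(genotype: np.ndarray, pcs: np.ndarray):
    """Canonical correlation between one SNP and a set of shape PCs.

    Returns (r, F, df1, df2, weights), where `weights` defines the linear
    combination of PCs that is maximally correlated with the SNP.
    """
    n, p = pcs.shape
    g = genotype - genotype.mean()
    y = pcs - pcs.mean(axis=0)
    # One predictor: canonical weights are the least-squares coefficients
    # from regressing the genotype on the PCs (up to scale).
    weights, *_ = np.linalg.lstsq(y, g, rcond=None)
    fitted = y @ weights
    r2 = float(fitted @ fitted) / float(g @ g)
    df1, df2 = p, n - p - 1
    f_stat = (r2 / df1) / ((1.0 - r2) / df2)
    return float(np.sqrt(r2)), f_stat, df1, df2, weights


# Synthetic example: additive genotype coding (0/1/2) with a planted effect on PC1.
rng = np.random.default_rng(0)
n = 1000
genotype = rng.binomial(2, 0.3, size=n).astype(float)
pcs = rng.normal(size=(n, 4))
pcs[:, 0] += 0.5 * genotype
r, f_stat, df1, df2, w = snp_cca(genotype, pcs)
```

The right-tail p-value would then be the upper tail of the F distribution with (df1, df2) degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.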
Gene annotation
We utilized the Ensembl Biomart toolset to identify the genes located within a 500Kb window (250Kb downstream and upstream) of lead GWAS SNPs. We searched the literature for evidence of the involvement of nearby genes in craniofacial development, morphology, or dysmorphology. Based on this corroborating evidence, the potential candidate genes for each leading SNP were noted.
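The window logic amounts to an interval-overlap query around each lead SNP. The following minimal sketch shows that logic only. It does not call Ensembl Biomart, and the gene names and coordinates are invented placeholders.

```python
from typing import NamedTuple

FLANK = 250_000  # 250 kb on each side of the lead SNP (500 kb window in total)


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_near(chrom: str, pos: int, genes: list, flank: int = FLANK) -> list:
    """Names of genes that overlap the window [pos - flank, pos + flank]."""
    lo, hi = max(0, pos - flank), pos + flank
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical coordinates, for illustration only.
genes = [
    Gene("GENE_A", "12", 1_000_000, 1_050_000),  # 150 kb from the SNP: inside
    Gene("GENE_B", "12", 1_400_000, 1_480_000),  # straddles the window edge
    Gene("GENE_C", "12", 2_000_000, 2_100_000),  # too far away
    Gene("GENE_D", "5", 1_190_000, 1_210_000),   # right position, wrong chromosome
]
nearby = genes_near("12", 1_200_000, genes)
```

Genes that only partially overlap the window are counted here; whether the study applied the same rule is an assumption.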
Expression quantitative trait locus (eQTL) co-localization analysis
For each genetic locus identified in the Tanzania cohort, we extracted the summary statistics of SNPs within 500Kb up- or downstream of the lead SNP from the GWAS results and downloaded their eQTL data from the Genotype-Tissue Expression (GTEx) project (version 7). The “locuscompare” function in R v3.6.1 was used to estimate co-localization of face-associated variants and eQTLs using six tissues relevant to craniofacial morphology (i.e. adipose subcutaneous, adipose visceral omentum, fibroblasts, muscle skeletal and two skin tissues) [25].
Cell-type-specific enhancer enrichment
Enhancer enrichment analyses were performed as described in our previous study [13]. In brief, fastq-format ChIP-seq data for acetylation of histone H3 at lysine 27 (H3K27ac) were downloaded from the University of California, Santa Cruz (UCSC) Genome Browser and Gene Expression Omnibus (GEO) [26–30]. The tagAlign-format ChIP-seq data of H3K27ac signal (GSE; embryonic craniofacial tissue) [31] and the Roadmap Epigenomics Project (https://egg2.wustl.edu/roadmap/data/byFileType/alignments/consolidated/; various fetal and adult tissues and cell-types) [32] were downloaded. Chromosomal coordinates of both ChIP-seq data types were aligned to the human genome build GRCh37/hg19. We divided the genome into 20 kb windows and used bedtools coverage (v2.27.1) to calculate H3K27ac reads per million (RPM) from each of the aligned read files in each window. We then normalized the matrix of 154,614 windows and 133 ChIP-seq data sets using the “normalize.quantiles” function in R. The windows containing the lead SNP of each genome-wide significant locus were used for enrichment analysis.
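The windowing and normalization steps can be illustrated with a small Python sketch. It is a simplified stand-in, not the bedtools/R workflow used in the study: reads are assigned to windows by start position rather than by overlap, ties in the quantile normalization are broken by order rather than averaged, and all function names and inputs are hypothetical.

```python
import numpy as np

WINDOW = 20_000  # 20 kb windows, as in the text


def window_rpm(read_starts, chrom_length: int, total_reads: int) -> np.ndarray:
    """Reads per million in consecutive fixed-width windows of one chromosome."""
    n_windows = -(-chrom_length // WINDOW)  # ceiling division
    bins = np.asarray(read_starts) // WINDOW
    counts = np.bincount(bins, minlength=n_windows)
    return counts * 1e6 / total_reads


def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Force every column (data set) onto the same empirical distribution."""
    order = np.argsort(matrix, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0)                  # rank of each entry within its column
    reference = np.sort(matrix, axis=0).mean(axis=1)   # mean value at each rank
    return reference[ranks]


# Four reads on a 60 kb toy chromosome fall into three windows.
rpm = window_rpm([10, 19_999, 20_000, 45_000], chrom_length=60_000, total_reads=4)

# Rows are windows, columns are ChIP-seq data sets (toy values).
signal = np.array([[5.0, 4.0, 3.0],
                   [2.0, 1.0, 4.0],
                   [3.0, 7.0, 6.0],
                   [4.0, 2.0, 8.0]])
normalized = quantile_normalize(signal)
```

After normalization each column contains the same set of values, so differences between data sets in sequencing depth or dynamic range no longer drive the comparison across cell types.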
In-silico replication of Tanzanian hits in a European dataset
Results from the Tanzania discovery sample were compared to an existing meta-GWAS of European ancestry. The European cohort comprised 8,246 participants, combining three datasets from the United States (US) and one dataset from the United Kingdom (UK). The UK dataset included samples from the ALSPAC study [33,34], a longitudinal birth cohort in which pregnant women residing in Avon, UK with an expected delivery date from 1st April 1991 to 31st December 1992 were recruited. At the time, 14,541 pregnant women were recruited and DNA samples were collected for 11,343 children. Genome-wide data were available for 8,952 subjects of the B2261 study, titled “Exploring distinctive facial features and their association with known candidate variants.” The intersection of unrelated participants of European ancestry with quality-controlled images, covariates, and genotype data included 3,566 individuals. The ALSPAC study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bris.ac.uk/alspac/researchers/our-data/). Details of the three US datasets, including the study enrollment, collection of high-resolution 3D facial images, phenotyping, genotyping, and genetic analysis, have been described previously [13]. S4 Fig presents the workflow of the study, including the data collection of Tanzania and European-ancestry cohorts.
Direct replication testing was complicated by two features of the study design. First, the data-driven facial segmentation process and PCA of the quasi-landmarks used to generate the multivariate phenotypes are specific to each dataset. That is, the 63 segments are not exactly comparable across studies (S3 Fig). Second, the linear combination of PCs identified in the CCA analysis is specific to each SNP and each dataset. Therefore, the SNP associations reported in the Tanzania cohort do not necessarily represent the same morphological variation as in the European meta-GWAS. To address these issues, we performed five types of in silico replication analyses in the European meta-GWAS: (Test 1 [T1]) SNP-level testing (via linear regression) of the projection of the European-ancestry dataset onto the Tanzania derived phenotype (e.g. facial segmentation, PCs, and linear combination of PCs defined in the CCA) as a univariate phenotype; (T2) SNP-level look-up for the "best segment" (i.e., the segment showing the most significant evidence of association across the 63 segments); (T3) locus-level (+/- 500kb from lead SNP) look-ups for the "best segment"; (T4) SNP-level look-up for a qualitatively similar facial segment; (T5) locus-level look-ups for a qualitatively similar facial segment. The statistical analysis was implemented separately for each combination of genome-wide significant SNPs and corresponding facial segments.
The projected phenotype approach (T1) ensures that traits being compared across the two cohorts are equivalent and provides a means of directly replicating in Europeans the same genotype-phenotype relationship identified in the Tanzanian GWAS. The SNP-level look-ups in qualitatively similar and “best” segments allow the effect of a variant to differ across African and European ancestry groups. The locus-level look-ups allow for genetic differences (e.g., in linkage disequilibrium patterns, minor allele frequencies, allelic heterogeneity) across African and European ancestry groups. Cumulatively, these approaches (T1-T5) allow for detection of effects that meet different criteria for inter-ethnic replication. The projected phenotype approach (T1) provides the most direct means of replicating the same genotype-phenotype relationship identified in the Tanzanian GWAS, whereas the similar segment approach (T4) relaxes the need for the facial effect to be exactly identical. However, both of these approaches could miss replicating effects due to differences in LD structure across populations if the tested SNP is not actually causal. The locus-level approaches (T3 and T5) can accommodate detection of different causal variants, or different LD-based proxies of the same unobserved causal variant, across populations. The “best segment” approaches (T2 and T3) can accommodate detection of the associated variants that manifest differently across populations given the distinct facial morphologies in Europeans and Africans.
For replication tests (T1-T5), the significance threshold after Bonferroni correction was determined as 0.05 divided by the number of GWAS signals tested for replication. For the best segment SNP/locus-level replication tests (T2 and T3), this threshold was further divided by the number of independent facial segments (n = 40). For the locus-level tests (T3 and T5), we further divided by the effective number of SNPs [24] at the locus or used the genome-wide association threshold of 5 × 10−8 in the European population, whichever was greater, as the significance threshold for replication.
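The threshold rules above can be written out as a short function. This is a sketch of the arithmetic as described in the text. The function and parameter names are hypothetical, and the inputs in the examples (20 signals, 100 or 5,000 effective SNPs at a locus) are illustrative values rather than figures reported for any particular test.

```python
def replication_threshold(n_signals: int, best_segment: bool = False,
                          n_eff_snps=None, n_eff_segments: int = 40,
                          gw_threshold: float = 5e-8,
                          alpha: float = 0.05) -> float:
    """Bonferroni-style significance threshold for one replication test.

    n_signals     -- number of GWAS signals tested for replication
    best_segment  -- True for the "best segment" tests (T2, T3)
    n_eff_snps    -- effective number of SNPs at the locus for the
                     locus-level tests (T3, T5); None for SNP-level tests
    """
    threshold = alpha / n_signals
    if best_segment:
        threshold /= n_eff_segments
    if n_eff_snps is not None:
        # Locus-level tests never go below the genome-wide threshold.
        threshold = max(threshold / n_eff_snps, gw_threshold)
    return threshold


t1 = replication_threshold(20)                                        # T1 and T4
t2 = replication_threshold(20, best_segment=True)                     # T2
t3 = replication_threshold(20, best_segment=True, n_eff_snps=5000)    # T3, floor applies
t5 = replication_threshold(20, n_eff_snps=100)                        # T5
```

With these inputs, T1 and T4 use 0.05/20 = 0.0025, T2 uses 0.0025/40 = 6.25 × 10−5, T5 uses 0.0025/100 = 2.5 × 10−5, and the T3 example falls back to the genome-wide threshold of 5 × 10−8 because 6.25 × 10−5/5,000 is smaller than it.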
For loci showing genome-wide association in the Tanzania cohort, co-localization analysis was performed based on the association between the best facial segment in the Tanzania cohort and a comparable segment in the European sample. Co-localization analysis was performed using the “locuscompare” function in R.
In silico replication of European hits in the Tanzania dataset
The previous study in Europeans reported 203 significant associations with facial variation. To explore these associations in the Tanzania cohort, we performed three in silico replication tests (analogous to those previously described for replicating Tanzanian associations in Europeans): (T1) SNP-level testing of the projection of the Tanzania dataset onto the European derived phenotype; (T2) SNP-level look-up for the "best segment" and (T3) locus-level look-ups for the "best segment". The significance threshold after Bonferroni correction was determined in the same way as described in the replication analyses in the European meta-GWAS.
In silico replication of previously reported landmark-based and qualitative trait associations in the Tanzania dataset
In addition to the previous studies in Europeans using the same data-driven global-to-local phenotyping approach as used here, there have been 14 GWAS and one whole-exome sequencing study, to date, using a priori landmark-based (e.g., linear distances, ratios, etc.) and qualitative phenotypes (e.g., self-reported chin dimples, etc.). The associated variants from these studies are summarized in S1 Table, including a total of 112 loci that have been implicated in previous studies at the genome-wide threshold for significance. We then investigated these 112 GWAS signals for their association with facial variation in the Tanzania cohort by conducting two in silico replication tests (T2 and T3) using the same methods as described above for determining the significance threshold after Bonferroni correction.
Results
Facial segmentation
The data-driven global-to-local phenotyping procedure yielded 63 hierarchically arranged facial segments, as shown in Fig 1A. The whole face was first split into two regions representing the midface (segment 3) and the outer face (segment 2), and then further partitioned into regions representing the lower face (quadrant 1), the mouth and regions around the eyes (quadrant 2), the nose (quadrant 3), and the upper face (quadrant 4). Variation in each segment was represented by 6 to 57 PCs, with the more global segments generally requiring more PCs to capture the variation than the more local segments (S2 Table). The facial segmentation was similar to that in a previous study of Europeans [10,13], with both yielding quadrants that represent variation in lips, nose, upper face and lower face area (S3 Fig). However, the Tanzania sample yielded a distinct sub-quadrant representing shape variation in the eye area, which was absent in the European-driven map.
(a) Rosette showing the global-to-local partitioning of the full face into segments. The full face (segment 1, red) is first partitioned into segments representing the outer (2, orange) and inner (3, cyan) regions of the face. These are in turn partitioned into more localized regions representing the lower face (magenta), upper face (salmon), nose (blue), and mouth and eyes (green). (b) Combined Manhattan plot highlighting the genome-wide significant genetic variants across 63 facial segments. Significantly associated variants are colored to correspond to the facial segments as shown in (a). The blue dotted line and red solid line indicate the genome-wide (P < 2.5 × 10−8) and study-wide (P < 6.25 × 10−10) significance thresholds, respectively.
GWAS results in the Tanzania cohort
GWAS results for each of the 63 facial segments are shown in a composite Manhattan plot (Fig 1B). We identified 189 SNPs across 20 genetic loci showing genome-wide significant (P < 2.5 × 10−8) evidence of association with at least one of the 63 facial segments (Tables 1 and S3). Regional association plots for these 20 loci are provided in the supplementary material (S5 Fig). These associations involve facial segments in different quadrants, including seven loci associated with nose-related traits (quadrant 3), four loci associated with eye-related traits (quadrant 2), and three loci associated with segments in more than one quadrant. For nine of the 20 GWAS signals, associations were restricted to localized segments (i.e., at hierarchical level four and five). Of these 20 GWAS signals in Tanzanians, 10 loci (3q28, 4p15.2, 5q14.3, 5q31.1, 7q22.1, 9p21.3, 9q21.33, 10p15.3, 13q13.3, 18q22.1) represent novel associations with facial variation. Moreover, we identified co-localization with eQTLs in fibroblast, skeletal muscle, skin, and adipose tissues/cells for EEFSEC (Fig 2).
Note, eQTL results from one representative tissue are shown; similar eQTL signals were observed across multiple tissues and/or cells. The top right plot (b) shows the association results in the Tanzania GWAS; the bottom right plot (c) represents the eQTL results; the left plot (a) shows the colocalization of genetic association and eQTL signals. The SNP indicated by the purple diamond is the SNP for which the African LD information is shown.
Among these 20 GWAS signals, we observed 76 SNPs across six loci that passed the strict threshold of study-wide significance (p < 6.25 × 10−10) (see callouts in Fig 1B). Five of these loci (at 3q21.3, 4q31.3, 10q26.11, 12q14.3, and 12q21.31) had previously been identified in the recent facial meta-GWAS of Europeans using the same open-ended phenotyping approach [13], and one locus (9p21.3) was previously identified in a meta-GWAS of the same Europeans using a different phenotyping approach that leveraged facial resemblances among external sibling pairs [35].
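The two significance tiers used above can be illustrated with a minimal sketch. The thresholds are the ones quoted in the text; the locus names and p-values are hypothetical and are not taken from Table 1. The final line computes the ratio of the two thresholds, which would correspond to about 40 independent phenotypes only under the assumption that the study-wide cutoff is a Bonferroni-style adjustment of the genome-wide one.

```python
# Thresholds as quoted in the text.
GENOME_WIDE = 2.5e-8
STUDY_WIDE = 6.25e-10


def significance_tier(p_value: float) -> str:
    """Return the strictest significance tier that a p-value passes."""
    if p_value < STUDY_WIDE:
        return "study-wide"
    if p_value < GENOME_WIDE:
        return "genome-wide"
    return "not significant"


# Hypothetical p-values, for illustration only (not taken from Table 1).
example_p_values = {"locus_A": 3.0e-12, "locus_B": 1.0e-9, "locus_C": 4.0e-7}
tiers = {name: significance_tier(p) for name, p in example_p_values.items()}

# Assumption: if the study-wide cutoff is a Bonferroni-style adjustment of
# the genome-wide cutoff, this ratio is the implied number of independent tests.
implied_tests = GENOME_WIDE / STUDY_WIDE
```

Under this reading, a locus such as the hypothetical `locus_B` would count toward the 20 genome-wide signals but not toward the six study-wide ones.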
Cell-type-specific enhancer enrichment
We explored the cis-regulatory activities of the 20 GWAS loci across more than 100 cell types/tissues. As shown in Fig 3, cranial neural crest cells (CNCCs) and embryonic craniofacial tissues showed the highest H3K27ac signal in the vicinity (within 20kb) of the lead SNPs from the 20 GWAS loci, compared with other cell types/tissues (p = 2.4 × 10−15). No predominant enrichment of H3K27ac signal was observed for other primary cell types/tissues. These observations are consistent with previous studies in Europeans [10,13].
Boxplots indicate H3K27ac signal (log-transformed coverage) in the vicinity of the 20 GWAS loci (within 20kb) in individual samples; cranial neural crest cells and embryonic craniofacial samples are colored blue and orange, respectively. The dashed line at ~2.5 is the median signal across all cell types and tissues.
We utilized the Roadmap Epigenomics Project data of human embryonic craniofacial samples to identify specific genetic elements near the 20 GWAS signals that may play a regulatory role during embryonic development [36]. Among these 20 signals, 11 were located in the vicinity of active enhancers marked by H3K27ac, H3K4me1 or H3K4me2 epigenomic modifications, suggesting their involvement in enhancer activity in craniofacial development. We observed craniofacial-specific enhancer activity at loci 5q31.1 and 12q21.31. Specifically, the signal at 12q21.31 was located near a region with elevated H3K27ac signal at Carnegie stage 17 (CS17) and beyond, and the signal at 5q31.1 was located near a region with H3K27ac, H3K4me1, and H3K4me2 signals from CS13-CS20. Notably, both putative enhancers at 5q31.1 and 12q21.31 contained highly conserved sequences based on base-wise conservation across 100 vertebrates by PhyloP (S6 and S7 Figs).
Comparisons of East African and European populations
Among the 20 GWAS loci, nine lead SNPs and two high-LD (r² > 0.8) proxy SNPs were available in the European dataset. The remaining nine SNPs were not available in the European cohort due to their low minor allele frequency in the European population. Seven of the 11 lead SNPs (i.e., rs56063440, chr3:127963189, rs9995821, rs242980, rs10878346, rs74112009 and rs16983329) replicated under the projected phenotype approach (T1), indicating consistent genetic effects in the projected African-trait facial segment (Table 1). SNP-level look-ups for qualitatively similar facial segments (T4) and for the “best segments” (T2) showed that the same seven SNPs were associated with facial variation in the previous GWAS of Europeans (Table 1). Of the four lead SNPs that failed to replicate, two (rs114777090 and rs10122939) had very low frequencies in the European cohort (MAF<0.01), which may account for the lack of signal, and two (rs11959408 and rs9603276) had high frequencies in both populations, suggesting that differences in allele frequency alone are unable to explain the lack of signal. The locus-level replication approaches (T3 and T5) indicated that an additional three loci, 1q22, 12q24.21 and 13q32.3, were associated with facial variation in European samples, albeit with different associated SNPs. S8 Fig presents the SNP-level and locus-level replication in Europeans.
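As a rough illustration of the two quantities that govern SNP availability above (the r² > 0.8 proxy criterion and the MAF<0.01 cutoff), the sketch below computes both from genotype dosages coded 0/1/2. It assumes that LD is approximated by the squared Pearson correlation of dosages, which is a common simplification and not necessarily the procedure used in this study; the genotype vectors and function names are hypothetical.

```python
from statistics import mean


def genotype_r2(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: r2 is undefined")
    return (sxy * sxy) / (sxx * syy)


def minor_allele_frequency(dosages: list[int]) -> float:
    """Minor allele frequency from 0/1/2 dosages of one allele."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


# Hypothetical dosages for a lead SNP and a candidate proxy (8 individuals).
lead = [0, 1, 2, 1, 0, 2, 1, 0]
candidate = [0, 1, 2, 1, 0, 2, 1, 1]

r2 = genotype_r2(lead, candidate)
is_proxy = r2 > 0.8  # proxy criterion quoted in the text
lead_maf = minor_allele_frequency(lead)
too_rare = lead_maf < 0.01  # low-frequency cutoff quoted in the text
```

In this toy example the candidate would qualify as a proxy and the lead SNP would be common enough to test; in practice such values would be computed in a reference panel matched to the relevant population.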
We performed co-localization analysis for the 20 GWAS loci using the summary statistics of qualitatively similar facial segments from European GWAS. Notably, our findings suggested that signals at 3p14.3, 4q31.3, 10q26.11, and 12q14.3 may share the same causal variants in African and European populations (S5 Fig). For the rest of the loci, the co-localization plots showed more complicated scenarios (Fig 4). For example, at 3q21.3, the peak SNP in the European GWAS was located within a broad LD block of approximately 300Kb, with many associated SNPs highly correlated with each other, which poses a challenge in determining the causal variant underlying the association. Combining the signal in the Europeans with the signal in Tanzanians provided a more finely mapped result, suggesting a specific causal variant, rs56850662. This is the same variant that co-localized with the eQTL signal for EEFSEC, suggesting a possible role for EEFSEC in craniofacial morphology (Figs 2 and S5). For locus 12q21.31, only a subset of SNPs showing association in Europeans co-localized with the Tanzania signal, identifying a shared associated variant, and suggesting that more than one genetic element is related to mouth and upper lip morphology in Europeans.
LocusCompare visualizations of colocalization between Tanzania GWAS and European GWAS at (a-c) 3q21.3 and (d-f) 12q21.31. The top right plots (b and e) show the association results in the Tanzania GWAS; the bottom right plots (c and f) represent the corresponding results in the European GWAS; the left plots (a and d) are visualizations of colocalization. For each locus, the SNP indicated by the purple diamond is the SNP for which the LD information is shown, with African LD structure indicated in the colocalization plot. The vertical gray dashed lines indicate the p-values of SNPs from the Tanzania GWAS that were unavailable in the European GWAS; the horizontal gray dashed lines indicate the p-values of SNPs from the European GWAS that were unavailable in the Tanzanian GWAS.
In addition to exploring the effects of the Tanzania GWAS signals in Europeans, we also examined 203 association signals across 138 genetic loci previously identified in the GWAS of Europeans for their associations with the corresponding projected facial trait in the Tanzania sample. Of the 203 lead SNPs, 195 were available in the Tanzania sample. Among tested SNPs, 12 were associated with the same facial phenotype (T1) in the Tanzania sample, indicating consistent effects across different ancestry groups (S4 Table). The SNP-level look-ups for the “best segment” revealed eight significant associations, including four signals that were not identified by the projected phenotype approach (T2).
Beyond testing specific lead SNPs for replication, we also explored evidence of association in the Tanzania cohort for all SNPs across the 203 signals in case different variants in these regions were associated in the different ancestry groups. For the candidate locus scan, 13 additional signals were associated under the significance threshold of 0.05 divided by 203 and then divided by the effective number of independent SNPs at each locus. However, these loci were associated with different facial traits (i.e., modules representing different regions of the face) across the different populations, suggesting the presence of more than one regulatory element within a genetic locus potentially affecting different facial segments. The shared association signals in African and European populations are displayed in the Miami plot (S8 Fig).
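The locus-specific threshold described above (0.05 divided by 203, then by the effective number of independent SNPs at the locus) can be sketched as follows. The sketch assumes an eigenvalue-based estimator of the effective number in the style of Li and Ji [24]; this passage does not state which estimator was actually applied, and the eigenvalues below are hypothetical values for a four-SNP locus.

```python
import math


def li_ji_effective_number(eigenvalues):
    """Effective number of tests in the style of Li and Ji:
    each eigenvalue l contributes I(|l| >= 1) + (|l| - floor(|l|))."""
    total = 0.0
    for value in eigenvalues:
        lam = abs(value)
        total += (1.0 if lam >= 1.0 else 0.0) + (lam - math.floor(lam))
    return total


def locus_threshold(n_signals, effective_snps, alpha=0.05):
    """Alpha divided by the number of signals, then by the effective
    number of independent SNPs at the locus."""
    return alpha / n_signals / effective_snps


# Hypothetical eigenvalues of a 4-SNP LD correlation matrix (they sum to 4).
eigenvalues = [2.5, 1.0, 0.3, 0.2]
m_eff = li_ji_effective_number(eigenvalues)

# 203 is the number of European signals quoted in the text.
threshold = locus_threshold(203, m_eff)
```

Because the divisor depends on the local LD structure, loci with many tightly correlated SNPs receive a less stringent threshold than loci with the same number of weakly correlated SNPs.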
Test of previously reported facial-associated loci
As shown in S1 Table, genetic associations at 112 loci have been reported in 15 previous studies of a priori landmark-based and qualitative (e.g., cleft chin, cheek dimple) phenotypes, of which five and 11 loci were replicated based on (T2) and (T3), respectively. Furthermore, 12 out of these 112 loci had previously shown associations in at least two independent GWAS, with genetic effects of PAX3, PAX1, SOX9, TBX3 and the HOXD cluster on different facial traits reported in European, Asian, and Latin American populations. Among these 12 genetic loci that were identified by multiple GWASs, CACNA2D3, DCHS2, TBX3, PAX1, and the HOXD cluster were significantly associated with facial variation in the Tanzania cohort after Bonferroni correction (S1 Table). In contrast, although SOX9 and PAX3 have been reported across populations (S1 Table), neither of these genes showed association in the Tanzania cohort.
Discussion
In this study, we performed a GWAS of multidimensional facial traits in a sample of 2,595 unrelated healthy children and adolescents of African ancestry. We identified 20 genetic loci that were associated with normal facial variation at a genome-wide significance threshold of p < 2.5 × 10−8. Of these, six loci (3q21.3, 4q31.3, 9p21.3, 10q26.11, 12q14.3 and 12q21.31) surpassed our more conservative threshold for study-wide significance (p < 6.25 × 10−10). The locus 9p21.3 was not identified in the previous meta-GWAS of data-driven facial traits in Europeans, but a nearby SNP (~20Kb to the peak SNP in Tanzania) was identified in a GWAS of different traits derived from resemblance between siblings and projected into the same cohort of Europeans. The lead 9p21.3 SNP, rs10122939, is in the vicinity of the MLLT3 gene, a crucial regulator of human haematopoietic stem cells associated with acute leukemia. It remains unclear how the genetic findings of MLLT3 may relate to facial morphology. Another candidate at 9p21.3 is FOCAD, located ~350Kb downstream of the peak SNP, which is a potential tumor suppressor highly expressed in brain tissues. Haaland et al. identified a parent-of-origin interaction effect between FOCAD and maternal smoking contributing to cleft lip [37]. In order to determine the possible roles of genes at this locus on facial traits, subsequent functional studies are needed.
For the locus at 3q21.3, though different lead SNPs were observed in the Tanzania and European GWASs, co-localization analysis narrowed this down to a single intronic variant, rs56850662, in the EEFSEC gene, suggesting that this SNP drives the association with nose and lip morphology in both populations (S5 Fig). Furthermore, this variant co-localized with an eQTL signal for EEFSEC, supporting the role of EEFSEC as a candidate gene for facial morphology.
4q31.3 is a facial-associated locus with accumulated genetic evidence that indicates its role in face formation [3,10,17]. The present study replicated the association with normal nasal variation in an African population, demonstrating its involvement in the genetic architecture of nose morphology across populations. The nearest gene to the peak SNP in the Tanzania GWAS is SFRP2, encoding Secreted Frizzled Related Protein 2. SFRP2 functions as a modulator of Wnt signaling, whose overexpression can induce limb outgrowth defects [38,39]. Craniofacial defects, limb outgrowth defects, and extra digits were reported in Sfrp2-/- mutant mice [40,41]. Another interesting candidate at 4q31.3 is the DCHS2 gene, located ~300Kb downstream of the peak SNP. DCHS2 encodes a calcium-dependent cell-adhesion protein, known as a key partner in the Fat-Dachsous signaling pathway that coordinates cartilage differentiation and polarity during craniofacial development [42].
The other three loci showing study-wide evidence of association in Tanzanians, 10q26.11, 12q14.3, and 12q21.31, were reported by previous GWASs in European or Asian samples. Our findings revealed that these associations were shared across different ancestry groups. Together with functional evidence [7,43–45], plausible candidate genes within these loci (i.e. HMGA2 at 12q14.3, EMX2 at 10q26.11, and ALX1 at 12q21.31) may play critical roles in craniofacial development. Further investigation is needed to establish their functionality and the mechanisms through which they impact normal facial variation.
In addition to the study-wide significant loci, we also identified 10 novel loci in Africans at genome-wide significance, some of which were near genes having known involvement in craniofacial development. For example, CXCL14, which was associated with variation in the eye region, plays a critical role in ocular tissues during development [46–48]. Knockdown of CXCL14 led to eyelid and mandibular defects in about a third of chick embryos, which is consistent with its expression in eyelid ectoderm and the first branchial arch [47]. We also identified new signals near biologically plausible genes at previously reported loci. For example, the association signal at 13q32.3, near the zinc finger genes ZIC2 and ZIC5, was also associated with variation in the eye region. ZIC5 is expressed in the developing eyes in Xenopus embryos, and loss of zic5 in mice causes craniofacial anomalies [49,50]. Moreover, mouse and Xenopus models have shown that Zic5 protein is involved in the generation of neural crest tissue [49,51]. Because this association in Tanzanians was moderately far (about 430kb) from the previously identified signal in Europeans and affected different regions of the face (eyes in Tanzanians vs. forehead in Europeans) [13], it is unclear whether it should be considered a novel, separate locus or a new association signal at the same locus. In any case, these associations with biologically plausible candidates observed in Africans require additional research to determine whether they are ancestry-specific.
Of the 20 GWAS signals identified in the Tanzania cohort, seven showed associations in Europeans, suggesting trans-ancestry genetic effects on normal facial variation. Notably, these loci showed significant associations with the same facial segments among populations, which improved the reliability of the findings. For some of these loci, the GWAS in Europeans identified more global (broader) effects on the face compared to Tanzanians. For example, the peak SNP at locus 3q21.3 was associated with the shape of the nose and upper lip in Europeans, but only with the shape of the nose in Tanzanians. The other 13 loci did not replicate in Europeans. For 11 of these, the top SNP had low allele frequency in the European cohort (MAF<0.01), suggesting these differences may be partly attributable to allele frequency differences.
Our GWAS in Tanzanians not only uncovered novel loci and candidate genes related to facial morphology, but also advanced our understanding of previously identified loci in Europeans. Several loci (such as 3q21.3 and 12q21.31) showed associations with facial morphology across different populations. However, due to the strong LD in Europeans, association was detected across a broad genomic region (~300Kb), posing a challenge for identifying the likely causal variant at these loci. In conjunction with the European results, co-localization utilizing the GWAS in Tanzanians provided a more fine-mapped association. Given human evolutionary history, African populations are characterized by a greater level of genetic diversity and less LD among loci compared with European populations. Because of the specific LD structure, the GWAS in Africans offered valuable insights relevant to the genetic factors that contribute to normal facial variation. That said, only a fraction of the loci originally identified in Europeans showed evidence of association in the Tanzanian cohort, which we postulate may be partly attributable to the population differentiation. Of the 203 European signals, more than half of the peak SNPs had substantial allele frequency differences between European and African populations (MAF difference >0.1), which would impact the power to detect associations. Furthermore, the low rate of replication may also be due to insufficient power in the Tanzanian sample due to the smaller sample size and the stricter p-value threshold for declaring significance. For these reasons, we caution that lack of replication across populations should not be taken as conclusive evidence that a signal is population-specific.
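The dependence of power on allele frequency noted above can be made concrete with a simplified univariate sketch. It assumes an additive model, Hardy-Weinberg equilibrium, and a standardized trait, under which the variance explained by a SNP is 2p(1 − p)β²; the facial phenotypes in this study are multivariate, so this is only an illustration, and the effect size and frequencies below are hypothetical.

```python
def genotype_variance(maf: float) -> float:
    """Variance of a 0/1/2 genotype under Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)


def variance_explained(maf: float, beta: float) -> float:
    """Trait variance explained by an additive SNP with per-allele effect
    beta, assuming a trait standardized to unit variance."""
    return genotype_variance(maf) * beta ** 2


# Hypothetical example: same per-allele effect, different allele frequencies.
beta = 0.1
common = variance_explained(0.30, beta)  # e.g. frequency in one population
rare = variance_explained(0.05, beta)    # e.g. frequency in another

# How many times more variance the SNP explains where it is common.
ratio = common / rare
```

With an identical per-allele effect, the SNP explains several times more trait variance in the population where it is common, so a signal detected in one population can fall below the detection limit in another without any difference in the underlying biology.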
As a whole, our results provided a glimpse into the developmental origins of facial variation. Given that CNCCs are a group of embryonic cells that give rise to most facial structures and arise at 3–6 weeks of human gestation, if the associations with facial shape captured by our GWAS are due to effects occurring early in embryogenesis then we expect activity of facial-associated loci to be observed in CNCCs. We not only showed the enrichment of facial-associated variants in CNCCs and embryonic craniofacial tissues, but also identified two facial-associated loci overlapping with putative craniofacial-specific enhancer activity. The regulatory elements lie in sequences highly conserved across vertebrates, indicating their possible functional importance. Specifically, the GWAS signal at 12q21.31 associated with facial variation in the eye region overlaps with a craniofacial-specific enhancer that is active beginning at Carnegie Stage 17, when the eyelids begin to form [52].
Continuing challenges in researching the genetics of facial shape include the heterogeneity across diverse human populations, inconsistent phenotypic strategies, and the influence of environmental factors. We applied the same data-driven facial-segmentation phenotyping strategy in both Tanzanian and European cohorts, which leads to differences between studies in the exact facial phenotypes. Though overall quite similar, one major difference in the Tanzania facial segmentation was the emergence of an eye-related sub-quadrant, which was absent from the facial segmentation in Europeans. As a result of the presence of eye-related segments, we identified novel loci associated with facial shape around the eyes. The eye-related segmentation may be partly attributed to differences in how the 3D images were collected. For the European cohorts, 3D facial images were obtained while participants’ eyes were fully opened during image capture, whereas for the Tanzania cohort, participants were not required to open their eyes while the 3D image was collected. To determine the effect of variation in open vs. closed eyes during imaging, we included the predicted open/closed state of the eyes as an additional covariate in our facial model and re-ran the segmentation and genome scans. Segmentation was largely similar with the exception of the eye-related sub-quadrant, which was absent in the eye-adjusted analysis. Likewise, genetic association results were similar, with the exception of signals specific to the eye-related modules. See S1 Appendix for a description of these methods and results.
Another potential limitation of the study is the restricted age range of our sample, comprising mostly children and adolescents whose faces are still developing. While our covariate adjustments adequately accounted for the mean effects of age and age² on facial variation, the sample may include some children whose faces are under- or over-developed for their chronological age. This deviation from the average growth trajectory may decrease the signal-to-noise ratio, thus reducing power to detect genetic associations, or may represent a timing-specific aspect of facial variation, possibly under genetic control. Despite this limitation, it is important to acknowledge that failure to completely account for the effects of age variation on facial traits is unlikely to result in false positive genome-wide signals.
This study is a re-analysis of the dataset originally reported in Cole et al. [18] and represents the first effort to apply a global-to-local phenotyping method to a GWAS of Africans. Using this phenotyping method, we have greatly expanded the number of discovered loci that were associated with normal-range facial traits in an African population. Of note, none of the identified signals overlapped the two previously identified loci in this cohort using landmark-based size and shape phenotypes [18]. Considering that the two studies used the same GWAS cohort, but different phenotyping strategies that capture altogether different aspects of facial variation, the difference in results is not unexpected. In particular, Cole et al. [18] reported associations of SNPs in SCHIP1 and PDE8A with facial size, whereas in the present study we did not investigate size or allometry phenotypes; instead we adjusted our phenotypes for facial size. Moreover, the global-to-local phenotyping approach employed in this study was developed specifically because landmark-based phenotypes, such as those used in Cole et al. [18], do not capture shape variation as fully as the global-to-local modules do. That the observed associations differ by phenotyping approach is consistent with what has previously been reported in European samples. Previous GWAS of global-to-local facial shape modules yielded more but altogether different signals than traditional landmark-based phenotypes in the same sample [8,10]. Taken together, these observations in both European and African datasets suggest that fundamentally different aspects of the underlying genetic architecture of facial variation are being captured by the two phenotyping strategies.
While verifying the association of some previously reported loci, we provide new evidence that these loci contribute to facial shape variation across populations. Among these replicating loci, several genes, such as EEFSEC, SFRP2, EMX2, ALX1 and HMGA2, are biologically plausible candidates that play important roles in embryological facial tissues. In addition, we revealed 10 new genetic loci that passed the threshold for genome-wide significance, and which are candidates for future replication studies. These findings improve our understanding of the genetic and biological basis underpinning the diversity of human facial structure and may offer valuable insights into the biological mechanisms responsible for craniofacial morphogenesis and dysmorphology. Additional genetic replication and experimental validation will be required to verify the handful of newly identified genes/loci with unclear roles in craniofacial development.
Supporting information
S1 Fig. Age distribution among participants that were retained for the genetic analysis.
https://doi.org/10.1371/journal.pgen.1009695.s001
(TIF)
S2 Fig. Principal component analysis (PCA) scatterplot illustrating the population structure of 2,595 unrelated participants that were retained for the genetic analysis.
https://doi.org/10.1371/journal.pgen.1009695.s002
(TIF)
S3 Fig.
Color-coded facial segmentations in Tanzanians (left) and Europeans (right). The entire face (red) is partitioned into outer face (orange) and midface (cyan), and further partitioned into more localized regions representing the lower face (magenta), upper face (salmon), nose (blue), and mouth and eyes (green).
https://doi.org/10.1371/journal.pgen.1009695.s003
(TIF)
S5 Fig. LocusZoom (LZ) plots, colocalization plot and a polar dendrogram showing global-to-local effect for each of 20 GWAS signals.
Three LZ plots represent the genetic associations in (a) the best facial segment in Tanzania, (b) a comparable facial segment in Europeans, and (c) the best facial segment in Europeans. (d) A colocalization plot between the Tanzania GWAS and European GWAS at the chromosome 1q22 locus; (e) the association results in the Tanzania GWAS; (f) the association results in the European GWAS. (d-f) The SNP indicated by the purple diamond is the SNP for which the LD information is shown, with African LD structure indicated in the colocalization plot. The vertical gray dashed lines indicate the p-values of SNPs from the Tanzania GWAS that were unavailable in the European GWAS; the horizontal gray dashed lines indicate the p-values of SNPs from the European GWAS that were unavailable in the Tanzanian GWAS. (g) A polar dendrogram showing the global-to-local effect in Tanzania GWAS. Facial segments with a p-value lower than the genome-wide threshold (p = 2.5 × 10−8) are circled in black.
https://doi.org/10.1371/journal.pgen.1009695.s005
(PDF)
S6 Fig.
(a) Regional association plot for the signal at 5q31.1. (b) UCSC genome browser custom tracks for the 5q31.1 region, in which the yellow colored bars represent enhancer activity and the green colored bars represent Tx (Strong_transcription).
https://doi.org/10.1371/journal.pgen.1009695.s006
(TIF)
S7 Fig.
(a) Regional association plot for the signal at 12q21.31. (b) UCSC genome browser custom tracks for the 12q21.31 region, in which the yellow/orange colored bars represent enhancer activity; the purple colored bars represent PromBiv (Bivalent Promoter); the grey colored bars represent ReprPC (Repressed_PolyComb).
https://doi.org/10.1371/journal.pgen.1009695.s007
(TIF)
S8 Fig.
Miami plot showing (upper) Tanzanian and (lower) European GWAS results. The dashed blue line in each panel indicates the genome-wide significance threshold, and the red solid line indicates the study-wide significance cutoff. In each panel, the cyan and red colored points, respectively, represent signals showing locus-level and SNP-level evidence of replication in the alternate cohort.
https://doi.org/10.1371/journal.pgen.1009695.s008
(TIF)
S1 Table. Replication of previously reported landmark-based and qualitative trait associations in the Tanzania dataset.
https://doi.org/10.1371/journal.pgen.1009695.s009
(XLSX)
S2 Table. The number of principal components retained after parallel analysis for each facial segment.
https://doi.org/10.1371/journal.pgen.1009695.s010
(XLSX)
S3 Table. Full results for the 20 GWAS signals in Tanzania.
https://doi.org/10.1371/journal.pgen.1009695.s011
(XLSX)
S4 Table. In silico replication of European hits in the Tanzania dataset.
https://doi.org/10.1371/journal.pgen.1009695.s012
(XLSX)
S1 Appendix. Prediction of open vs. closed eyes and sensitivity analysis exploring the effect of open vs. closed eyes on the facial segmentation and genome-wide association analyses.
https://doi.org/10.1371/journal.pgen.1009695.s013
(PDF)
Acknowledgments
We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. We are additionally grateful to the research teams involved in collecting the Tanzanian dataset and the US component of the European dataset, and to all of the participants that contributed to those datasets.
References
- 1. Cole JB, Manyama M, Larson JR, Liberton DK, Ferrara TM, Riccardi SL, et al. Human Facial Shape and Size Heritability and Genetic Correlations. Genetics. 2017; 205:967–78. pmid:27974501.
- 2. Tsagkrasoulis D, Hysi P, Spector T, Montana G. Heritability maps of human face morphology through large-scale automated three-dimensional phenotyping. Scientific reports. 2017; 7:45885. pmid:28422179.
- 3. Xiong Z, Dankova G, Howe LJ, Lee MK, Hysi PG, Jong MA de, et al. Novel genetic loci affecting facial shape variation in humans. eLife. 2019; 8. pmid:31763980.
- 4. Paternoster L, Zhurov AI, Toma AM, Kemp JP, St Pourcain B, Timpson NJ, et al. Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. American journal of human genetics. 2012; 90:478–85. pmid:22341974.
- 5. Liu F, van der Lijn F, Schurmann C, Zhu G, Chakravarty MM, Hysi PG, et al. A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS genetics. 2012; 8:e1002932. pmid:23028347.
- 6. Jacobs LC, Liu F, Bleyen I, Gunn DA, Hofman A, Klaver CCW, et al. Intrinsic and extrinsic risk factors for sagging eyelids. JAMA dermatology. 2014; 150:836–43. pmid:24869959.
- 7. Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nature genetics. 2016; 48:709–17. pmid:27182965.
- 8. Shaffer JR, Orlova E, Lee MK, Leslie EJ, Raffensperger ZD, Heike CL, et al. Genome-Wide Association Study Reveals Multiple Loci Influencing Normal Human Facial Morphology. PLoS genetics. 2016; 12:e1006149. pmid:27560520.
- 9. Lee MK, Shaffer JR, Leslie EJ, Orlova E, Carlson JC, Feingold E, et al. Genome-wide association study of facial morphology reveals novel associations with FREM1 and PARK2. PloS one. 2017; 12:e0176566. pmid:28441456.
- 10. Claes P, Roosenboom J, White JD, Swigut T, Sero D, Li J, et al. Genome-wide mapping of global-to-local genetic effects on human facial shape. Nature genetics. 2018; 50:414–23. pmid:29459680.
- 11. Howe LJ, Lee MK, Sharp GC, Davey Smith G, St Pourcain B, Shaffer JR, et al. Investigating the shared genetics of non-syndromic cleft lip/palate and facial morphology. PLoS genetics. 2018; 14:e1007501. pmid:30067744.
- 12. Indencleef K, Roosenboom J, Hoskens H, White JD, Shriver MD, Richmond S, et al. Six NSCL/P Loci Show Associations With Normal-Range Craniofacial Variation. Frontiers in genetics. 2018; 9:502. pmid:30410503.
- 13. White JD, Indencleef K, Naqvi S, Eller RJ, Roosenboom J, Lee MK, et al. Insights into the genetic architecture of the human face. bioRxiv. 2020. pmid:33288918.
- 14. Endo C, Johnson TA, Morino R, Nakazono K, Kamitsuji S, Akita M, et al. Genome-wide association study in Japanese females identifies fifteen novel skin-related trait associations. Scientific reports. 2018; 8:8974. pmid:29895819.
- 15. Cha S, Lim JE, Park AY, Do J-H, Lee SW, Shin C, et al. Identification of five novel genetic loci related to facial morphology by genome-wide association studies. BMC genomics. 2018; 19:481. pmid:29921221.
- 16. Wu W, Zhai G, Xu Z, Hou B, Liu D, Liu T, et al. Whole-exome sequencing identified four loci influencing craniofacial morphology in northern Han Chinese. Human genetics. 2019; 138:601–11. pmid:30968251.
- 17. Adhikari K, Fuentes-Guajardo M, Quinto-Sánchez M, Mendoza-Revilla J, Camilo Chacón-Duque J, Acuña-Alonzo V, et al. A genome-wide association scan implicates DCHS2, RUNX2, GLI3, PAX1 and EDAR in human facial variation. Nature communications. 2016; 7:11616. pmid:27193062.
- 18. Cole JB, Manyama M, Kimwaga E, Mathayo J, Larson JR, Liberton DK, et al. Genomewide Association Study of African Children Identifies Association of SCHIP1 and PDE8A with Facial Size and Shape. PLoS genetics. 2016; 12:e1006174. pmid:27560698.
- 19. Qiao L, Yang Y, Fu P, Hu S, Zhou H, Peng S, et al. Genome-wide variants of Eurasian facial shape differentiation and a prospective model of DNA based face prediction. Journal of genetics and genomics = Yi chuan xue bao. 2018; 45:419–32. pmid:30174134.
- 20. Li Y, Zhao W, Li D, Tao X, Xiong Z, Liu J, et al. EDAR, LYPLAL1, PRDM16, PAX3, DKK1, TNFSF12, CACNA2D3, and SUPT3H gene variants influence facial morphology in a Eurasian population. Human genetics. 2019; 138:681–9. pmid:31025105.
- 21. White JD, Ortega-Castrillón A, Matthews H, Zaidi AA, Ekrami O, Snyders J, et al. MeshMonk: Open-source large-scale intensive 3D phenotyping. Scientific reports. 2019; 9:6085. pmid:30988365.
- 22. Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika. 1965; 30:179–85. pmid:14306381.
- 23. Pe’er I, Yelensky R, Altshuler D, Daly MJ. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol. 2008; 32:381–5. pmid:18348202.
- 24. Li J, Ji L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity. 2005; 95:221–7. pmid:16077740.
- 25. Liu B, Gloudemans MJ, Rao AS, Ingelsson E, Montgomery SB. Abundant associations with gene expression complicate GWAS follow-up. Nature genetics. 2019; 51:768–9. pmid:31043754.
- 26. Pattison JM, Melo SP, Piekos SN, Torkelson JL, Bashkirova E, Mumbach MR, et al. Retinoic acid and BMP4 cooperate with p63 to alter chromatin dynamics during surface epithelial commitment. Nature genetics. 2018; 50:1658–65. pmid:30397335.
- 27. Nott A, Holtman IR, Coufal NG, Schlachetzki JCM, Yu M, Hu R, et al. Brain cell type-specific enhancer-promoter interactome maps and disease-risk association. Science (New York, N.Y.). 2019; 366:1134–9. pmid:31727856.
- 28. Najafova Z, Tirado-Magallanes R, Subramaniam M, Hossan T, Schmidt G, Nagarajan S, et al. BRD4 localization to lineage-specific enhancers is associated with a distinct transcription factor repertoire. Nucleic acids research. 2017; 45:127–41. pmid:27651452.
- 29. Baumgart SJ, Najafova Z, Hossan T, Xie W, Nagarajan S, Kari V, et al. CHD1 regulates cell fate determination by activation of differentiation-induced genes. Nucleic acids research. 2017; 45:7722–35. pmid:28475736.
- 30. Prescott SL, Srinivasan R, Marchetto MC, Grishina I, Narvaiza I, Selleri L, et al. Enhancer divergence and cis-regulatory evolution in the human and chimp neural crest. Cell. 2015; 163:68–83. pmid:26365491.
- 31. Wilderman A, VanOudenhove J, Kron J, Noonan JP, Cotney J. High-Resolution Epigenomic Atlas of Human Embryonic Craniofacial Development. Cell reports. 2018; 23:1581–97. pmid:29719267.
- 32. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015; 518:317–30. pmid:25693563.
- 33. Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, Henderson J, et al. Cohort Profile: the ’children of the 90s’—the index offspring of the Avon Longitudinal Study of Parents and Children. International journal of epidemiology. 2013; 42:111–27. pmid:22507743.
- 34. Fraser A, Macdonald-Wallis C, Tilling K, Boyd A, Golding J, Davey Smith G, et al. Cohort Profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. International journal of epidemiology. 2013; 42:97–110. pmid:22507742.
- 35. Hoskens H, Liu D, Naqvi S, Lee MK, Eller RJ, Indencleef K, et al. 3D facial phenotyping by biometric sibling matching used in contemporary genomic methodologies. PLoS genetics. pmid:33983923.
- 36. Wilderman A, VanOudenhove J, Kron J, Noonan JP, Cotney J. High-Resolution Epigenomic Atlas of Human Embryonic Craniofacial Development. Cell reports. 2018; 23:1581–97. pmid:29719267.
- 37. Haaland ØA, Romanowska J, Gjerdevik M, Lie RT, Gjessing HK, Jugessur A. A genome-wide scan of cleft lip triads identifies parent-of-origin interaction effects between ANK3 and maternal smoking, and between ARHGEF10 and alcohol consumption. F1000Research. 2019; 8:960. pmid:31372216.
- 38. Nakajima H, Ito M, Morikawa Y, Komori T, Fukuchi Y, Shibata F, et al. Wnt modulators, SFRP-1, and SFRP-2 are expressed in osteoblasts and differentially regulate hematopoietic stem cells. Biochemical and biophysical research communications. 2009; 390:65–70. pmid:19778523.
- 39. Kurosaka H, Iulianella A, Williams T, Trainor PA. Disrupting hedgehog and WNT signaling interactions promotes cleft lip pathogenesis. The Journal of clinical investigation. 2014; 124:1660–71. pmid:24590292.
- 40. Satoh W, Gotoh T, Tsunematsu Y, Aizawa S, Shimono A. Sfrp1 and Sfrp2 regulate anteroposterior axis elongation and somite segmentation during mouse embryogenesis. Development (Cambridge, England). 2006; 133:989–99. pmid:16467359.
- 41. Morello R, Bertin TK, Schlaubitz S, Shaw CA, Kakuru S, Munivez E, et al. Brachy-syndactyly caused by loss of Sfrp2 function. Journal of cellular physiology. 2008; 217:127–37. pmid:18446812.
- 42. Le Pabic P, Ng C, Schilling TF. Fat-Dachsous signaling coordinates cartilage differentiation and polarity during craniofacial development. PLoS genetics. 2014; 10:e1004726. pmid:25340762.
- 43. Carneiro M, Hu D, Archer J, Feng C, Afonso S, Chen C, et al. Dwarfism and Altered Craniofacial Development in Rabbits Is Caused by a 12.1 kb Deletion at the HMGA2 Locus. Genetics. 2017; 205:955–65. pmid:27986804.
- 44. Askary A, Xu P, Barske L, Bay M, Bump P, Balczerski B, et al. Genome-wide analysis of facial skeletal regionalization in zebrafish. Development (Cambridge, England). 2017; 144:2994–3005. pmid:28705894.
- 45. Uz E, Alanay Y, Aktas D, Vargel I, Gucer S, Tuncbilek G, et al. Disruption of ALX1 causes extreme microphthalmia and severe facial clefting: expanding the spectrum of autosomal-recessive ALX-related frontonasal dysplasia. American journal of human genetics. 2010; 86:789–96. pmid:20451171.
- 46. Ojeda AF, Munjaal RP, Lwigale PY. Expression of CXCL12 and CXCL14 during eye development in chick and mouse. Gene Expr Patterns. 2013; 13:303–10. pmid:23727298.
- 47. Ojeda AF, Munjaal RP, Lwigale PY. Knockdown of CXCL14 disrupts neurovascular patterning during ocular development. Dev Biol. 2017; 423:77–91. pmid:28095300.
- 48. Gordon CT, Wade C, Brinas I, Farlie PG. CXCL14 expression during chick embryonic development. Int J Dev Biol. 2011; 55:335–40. pmid:21710440.
- 49. Merzdorf CS. Emerging roles for zic genes in early development. Dev Dyn. 2007; 236:922–40. pmid:17330889.
- 50. Fujimi TJ, Mikoshiba K, Aruga J. Xenopus Zic4: conservation and diversification of expression profiles and protein function among the Xenopus Zic family. Dev Dyn. 2006; 235:3379–86. pmid:16871625.
- 51. Inoue T, Hatayama M, Tohmonda T, Itohara S, Aruga J, Mikoshiba K. Mouse Zic5 deficiency results in neural tube defects and hypoplasia of cephalic neural crest derivatives. Dev Biol. 2004; 270:146–62. pmid:15136147.
- 52. Pearson AA. The development of the eyelids. Part I. External features. Journal of anatomy. 1980; 130:33–42. pmid:7364662.