Facial shape and allometry quantitative trait locus intervals in the Diversity Outbred mouse are enriched for known skeletal and facial development genes

The biology of how faces are built and come to differ from one another is complex. Discovering normal variants that contribute to differences in facial morphology is one key to untangling this complexity, with important implications for medicine and evolutionary biology. This study maps quantitative trait loci (QTL) for skeletal facial shape using Diversity Outbred (DO) mice. The DO is a randomly outcrossed population with high heterozygosity that captures the allelic diversity of eight inbred mouse lines from three subspecies. The study uses a sample of 1147 DO animals (the largest sample yet employed for a shape QTL study in mouse), each characterized by 22 three-dimensional landmarks, 56,885 autosomal and X-chromosome markers, and sex and age classifiers. We identified 37 facial shape QTL across 20 shape principal components (PCs) using a mixed effects regression that accounts for kinship among observations. The QTL include some previously identified intervals as well as new regions that expand the list of potential targets for future experimental study. Three QTL characterized shape associations with size (allometry). Median support interval size was 3.5 Mb. Narrowing additional analysis to QTL for the five largest magnitude shape PCs, we found significant overrepresentation of genes with known roles in growth, skeletal and facial development, and sensory organ development. For most intervals, one or more of these genes lies within 0.25 Mb of the QTL’s peak. QTL effect sizes were small, with none explaining more than 0.5% of facial shape variation. Thus, our results are consistent with a model of facial diversity that is influenced by key genes in skeletal and facial development and, simultaneously, is highly polygenic.


Introduction
The facial skeleton is a complex morphology that serves a diverse set of critical roles. It houses and protects the forebrain and most sensory organs and provides essential anatomy for oral communication, feeding, and facial expression. Structured differences (covariance) in facial shape, and between facial shape and size, depend upon variation in the biological processes that coordinate facial development [1][2][3][4][5][6][7]. Understanding how these processes act and combine to build a facial skeleton requires interrogation at multiple levels of the genotype-phenotype (GP) map [8,9].
Among the most fundamental of these inquiries is the search for genetic variants that contribute to facial shape variation. These genetic variants are the basic building blocks of diversity and, within species, circumscribe the potential for short-term evolutionary change [10,11]. Over the past decade, dozens of studies in humans, mouse models, and other species have dissected GP associations for the craniofacial complex using moderate-to-high density genetic maps (tens of thousands to millions of markers). The focus is typically on additive effects [12], though some recent work considers interactions among loci [4, 13,14].
In humans, genome-wide association studies (GWAS) targeting variation in soft tissue facial form have been completed for European [15][16][17][18][19][20], African [21], South American [22], and Asian [23] cohorts. Collectively, the studies report around 50 loci for facial morphology [24]. While there is limited overlap in the candidate gene sets among studies [25], and though many of the individual genes have no previously known role in facial development [3], the genes that do replicate across GWAS (PAX 1, PAX3, HOXD cluster genes, DCHS2, SOX9) are known to be important to facial and/or skeletal morphogenesis.
Compared to human studies, mapping the genetics of craniofacial shape variation in mice offers several benefits, including greater control over genetic and environmental variation and the ability to record observations, directly or in silico, on the skeleton itself [14,[26][27][28][29][30][31]. In recent years, the focus of mouse skull QTL analyses has expanded to include tests of enrichment for disease phenotypes and roles in skull development, as well as the search for and validation of candidate genes [32][33][34]. Dozens of QTL have been identified. A recent mouse skull shape QTL investigation reported that nearly 80% of QTL intervals identified overlapped with those from prior studies [32], despite differences in genetic background, breeding design, and quantitative methods. As with human facial shape GWAS candidates, the number of QTL identified is encouraging but also a small beginning-on the order of the total number of loci identified for human height just a decade ago [35]. Quantitative genetics analyses [32, 36,37] as well as expression and epigenomic atlases of facial development both strongly point to the potential for hundreds or thousands of genetic loci with as-yet-undetected QTL variance contributions [38][39][40]. Increased sample sizes, greater mapping resolution, and the development of new experimental lines all have the potential to expand the facial shape QTL library and increase the precision of variant localization [41][42][43].
The candidate genes that overlap between human GWAS and mouse QTL craniofacial shape investigations are few, and in mouse, mostly reside in wide QTL intervals that contain hundreds of genes [33]. To some extent, whether sequence orthologs contribute to craniofacial shape variation in mice and humans is unknown because, thus far, somewhat different phenotypes have been measured in the two species. Human GWAS craniofacial shape datasets are derived from non-invasive surface scans of soft tissue facial morphology. Studies in mice record landmarks on the bones of the whole cranium. Face QTL signals in mouse datasets may be swamped in a whole skull analysis because QTL effect sizes are likely to be small for complex quantitative traits like facial shape [15] and because the most commonly used shape registration algorithm tends to redistribute variation among the coordinates of a landmark configuration [44].
The present study maps shape QTL in the mouse facial skeleton, a dependent observation more in line with those evaluated in human craniofacial GWAS. We use a sample of n = 1147 Diversity Outbred mice (DO; Jackson Laboratory, Bar Harbor, ME) [45]. The DO is a random outcross model derived from the Collaborative Cross (CC) founding lines [43,[46][47][48]. The DO is specifically designed for genetic mapping of complex traits. Each animal's genome is a unique mosaic of the genetic diversity present in the CC founders-more than 45 million segregating SNPs [49]. Random outcrossing (of non-sibs) over many DO generations maintains this diversity and increases recombination frequency, which is critical for reducing confidence intervals of localized QTL [41]. Compared to classic backcross and intercross designs, the combinatorial possibilities for allelic variation within and among DO animals are vast.
We quantify facial shape with 22 three-dimensional landmarks (Fig 1) and estimate its association with founder haplotype variation at 56,885 genetic markers. The dependent observations in the QTL scans are scores on shape PCs. For each PC explaining more than 1% of shape variation, we identify statistically significant marker effects using a mixed effects regression model that simultaneously accounts for kinship among animals [50, 51]. We then undertake a more detailed analysis of QTL intervals for shape PCs 1-5. We estimate founder coefficients, test whether the overall gene set is enriched for biological processes relevant to facial development and perform association mapping to highlight a list of potential causal genes. In order to visualize and quantify the magnitude of marker effects on multivariate facial shape (as opposed to univariate PC scores), for each QTL peak, we fit an additional mixed model that accommodates high-dimensional dependent observations [52].

QTL and shape variable heritability
We used a mixed effects regression model to quantify GP associations between 56,885 genetic markers and each of the 20 shape PCs explaining >1% of the symmetric component of facial shape variation (S1 Fig). Evidence for a QTL was evaluated by comparing logs odds (LOD) scores (likelihood ratios comparing the magnitude of marker model residuals to the residuals of a null model that excludes the marker effect) [53] at each marker to a distribution of maximum LOD scores obtained from quantifying marker effects over 1000 random permutations of GP assignments.
From the 20 genome scans, 37 QTL surpassed our threshold for statistical significance (95% quantile of the maximum LOD score distribution). Median support interval width was 3.5 Mb (mean 7.1 Mb) using a 1.5 LOD-drop threshold to define support interval boundaries. While comparisons to prior studies are complicated by design features unique to each, the design employed here appears to have conferred benefits in terms of QTL identification and localization, the primary objectives of QTL analysis [41]. Comparable prior studies identified 17-30 QTL for the whole skull, with a lowest median QTL interval of 7.8 Mb (using a less-demanding 1.0 LOD-drop threshold) [32,33].
Collectively, the non-overlapping regions of the QTL intervals for PCs 1-20 span 262 Mb. While this constitutes just over 9% of the GRCm38.p6 mouse reference genome (http://www. ensembl.org/index.html), Y-chromosome excluded, they include mouse orthologs for 20% of previously identified human facial shape GWAS candidates (Cdh8, Dync1l1, Osr1, Pax9, Pkdcc, Runx2, Supt3h, Tmem163, Wdr27, Wdr35). While our gene enrichment analysis primarily focuses on QTL intervals associated with the five highest variance facial shape PCs (see Biological Process Enrichment, below), we note that the complete gene set is enriched for genes with known roles in facial development (FDR q-value = 5.56e -3 ) and contains more than a dozen such genes. Fig 2 plots the support intervals for these QTL (see also S1 Table). The QTL span 15 of 19 autosomes as well as the X-chromosome. On chromosomes 7, 14, 15, and 17, QTL from multiple PCs have overlapping associations with the same genomic region.
For each PC, the random effect heritability estimate quantifies the proportion of phenotypic variance attributable to overall genetic similarity (relatedness) at the founder level [51]. We excluded from the relatedness calculation all markers on the same chromosome as the QTL marker of interest, which avoids the loss of power that can arise when relatedness is computed from the full complement of genotyped markers [54][55][56]. This leave-one-chromosome-out (LOCO) approach produces a unique heritability estimate for each chromosome on each PC. Table 1 reports the mean heritability for each PC, as well as the variance in these heritabilities (averaged over chromosomes). For most PCs, heritability was moderately high (0.42-0.57), with a few low heritability outliers. These values are very similar to those Pallares and colleagues estimated for skull shape PCs in an outbred mouse sample [32], and lie mostly in the middle of the heritability range for human facial shape (~30-70%) [36,57].
Based on signal-to-noise evaluation of shape PC eigenvalue decay [58], we limited further analysis to the first five PCs (effectively four PCs, as no marker exceeded the permutation significance threshold for the PC 5 shape axis). Each plot in Fig 3 depicts the LOD score profile (lower panel) and CC founder haplotype coefficients (upper panel) for a chromosome harboring one or more QTL for a PC 1-4 shape variable. Shaded rectangular regions delimit the 1.5-LOD drop support interval boundaries for the QTL. Founder coefficients were estimated using best linear unbiased predictors [59, 60], which highlights QTL regions by imposing a degree of shrinkage on the coefficients chromosome-wide [50].
We also mapped GP associations for vectors capturing covariation between shape and two measures of size: centroid size (CS) and body mass. Shape change associated CS and body mass variation were closely correlated with the primary axis of shape variation in our sample (r PC1-CS : 0.985, r PC1-Mass : 0.875, r CS-Mass : 0.927). These correlations carry forward to the QTL profiles, as well. We identify two QTL each for CS and body mass allometry, one of which, on Mean h 2 Variance chromosome 6, is common to the two allometry vectors. Together, the three QTL-on chromosomes 1, 6, and 7-completely overlap with the PC 1 shape QTL (Fig 4).
The allometry results suggest a developmental model in which the determinants of different size measures have partially correlated effects on cranial shape [61]. The near coincidence of shape PC 1 with our explicitly allometric variables has been observed in many geometric and classical morphometric investigations, and there is a long history of treating PC 1 as an allometry variable in morphometric analysis [61][62][63][64][65][66]. With this in mind, though primarily for economy, we regard PC 1 QTL as allometry QTL in the remainder of this manuscript.
The mouse subspecies that comprise the CC are estimated to have diverged several hundred thousand years ago [67]. In many DO studies, subspecific divergence in one or more wildderived CC lines figures prominently in the founder genetic effects contrasts that underlie QTL (e.g., [68][69][70][71]). Here, the same pattern is evident for most of the PC 1-4 QTL. In six of the eight QTL (or seven of nine, if the two peaks of the PC 2, chromosome 14 QTL are considered distinct), the coefficient for a non-domesticus wild-derived strain (CAST or PWK) sits at one extreme of the founder coefficient distribution. Only at the upstream LOD peak of the PC 2, chromosome 14 QTL is the founder coefficient contrast clearly between M.m. domesticus inbred lines.
The broad region of high LOD scores for PC 1 on chromosome 7 merits individual attention. This QTL support interval essentially delimits the mouse homolog boundaries of the human Prader-Willi/Angelman syndromes domain [72]. The founder coefficients point to a strong subspecies signal, with PWK (M.m. musculus) highly differentiated from wild-derived WSB and the inbred M.m. domesticus lines, and CAST (M.m. castaneus) intermediate. Though the QTL interval contains fewer than 40 genes, nine are known to be imprinted [72][73][74][75][76]. Several more have mixed evidence for imprinting [75], and Prader-Willi and Angelman are parent-of-origin dependent neurological disorders [77]. In addition, more than a dozen genes in the interval are expressed in the developing face [40,[78][79][80][81][82][83]. The characteristic facial phenotype for Prader-Willi includes a relatively long skull with a narrow upper face [84]. The shape effects associated with the peak marker in this interval, just downstream of the primary Prader-Willi genes, contrast a shorter, wider face with a longer, narrower one (see Figs 5 and 6, below), suggesting that normal and syndromic variants in this region of the genome may perturb facial development in similar ways. Our mapping strategy is not specifically designed to quantify imprinted gene QTL [85][86][87]. However, closer dissection of this interval's contribution to facial shape variation, accounting for allelic parent of origin effects, may be warranted in the future.

Marker effects on shape
At the LOD peak marker of each PC 1-4 QTL, we collapsed the 8-state founder probabilities to 2-state SNP probabilities in order to estimate the shape effect of allelic substitution at that marker. Shape PCs are univariate axes computed independent of marker information or kinship effects. The true effect of allelic variation at a QTL peak is therefore unlikely to be identical to the direction of the PC itself. To estimate effects on total shape, we employed a Bayesian mixed effects model for highly multivariate dependent observations [52,88]. The results are plotted in Fig 5. In each figure, QTL effects are multiplied by the scaling factor required to magnify the QTL to account for 10% of phenotypic variance. These effects are shown as displacement vectors emanating from the mean configuration wireframe. The magnification ensures that even when the effect at a landmark is small, its direction and relative size (in comparison to effects at other landmarks) is apparent. We further exploited the Bayesian posterior to plot 80% confidence ellipsoids for the magnified displacements. The actual proportion of variance explained by the SNP, without magnification, is noted in the figure.

Biological process enrichment
We used the human annotations in the online Molecular Signatures Database [89] to query whether biological processes relevant to craniofacial development are overrepresented in the PC 1-4 QTL intervals. Collectively, the intervals contain 243 named, protein coding genes (S1 File), 232 of which are annotated in the Entrez Gene library utilized by the Molecular Signatures Database. Table 2 provides GO biological process enrichment highlights for these intervals, including FDR q-values adjusted for multiple testing (see S2 File for the top 100 q-value processes). Growth, morphogenesis, and skeletal system development are prominent biological process annotations for the facial shape QTL. The enrichment results suggest that no enriched biological process is uniquely associated with shape variation on a single PC. Rather, with the exception of bone development, each process in Table 2 includes annotations to each PC.

Association mapping
Finally, to develop a finer-grained impression of potential causal loci, we performed SNP association mapping within each QTL interval. We imputed a dense map of SNP probabilities to the sample by inferring their bi-allelic (major/minor) probabilities based on (a) founder probabilities at flanking, genotyped markers and (b) major/minor strain distribution patterns in the CC founder variants (Fig 6).
We then queried whether any genes within 0.25 Mb of the SNP or genome scan LOD peaks are known to be associated with craniofacial development or phenotypes ( Table 3). As with human facial shape GWAS [42], LOD peaks tended to be located in non-coding regions. While the focal genes we identified all have known roles in skeletal or craniofacial development, they also tend to be developmentally ubiquitous.

PLOS ONE
The genetics of variation in mouse facial shape PLOS ONE | https://doi.org/10.1371/journal.pone.0233377 June 5, 2020 9 / 24 micrognathia and short stature [94,102]. Kat6b (PC2, Chr. 14) activates the transcription factor Runx2, which is often referred to as a master regulator of bone formation [103][104][105]. For the PC3 QTL at 8.23-9.84 Mb on chromosome 12, the founder and SNP LOD peaks reside at opposite ends of the QTL interval, but each is very close to an important craniofacial development gene: Gdf7 (Bmp12) contributes to patterning of the jaw joint and is expressed in the developing incisor pulp and periodontium [106][107][108]; Osr1 is broadly expressed in the embryonic and developing skull, with overexpression resulting in intramembranous ossification deficiencies, increased chondrocyte differentiation, and incomplete suture closure, while underexpression may contribute to cleft palate phenotype [109][110][111]. Osr1 has also previously been identified as a candidate in human facial shape GWAS [23]. Finally, the SNP LOD peak for PC4, chromosome 17 implicates Vegfa, which is critical to endochondral and intramembranous ossification through its role in angiogenesis [112]. A distinct SNP LOD peak of nearly equal magnitude is located approximately 1 Mb upstream (at rs47194347), very close to Supt3 (human ortholog Supt3h) and Runx2. In addition to the central role Runx2 has in bone formation and development [104], both genes are human facial shape GWAS candidates [15,22].

Conclusions
While GP maps for traits influenced by many genes may be quite intricate and layered, additive effect QTL analyses remain an indispensable means to identify loci that contribute to phenotypic variation. Our investigation advances understanding of the genetics of skeletal diversity. We focused on mouse facial shape in order to make our results more comparable to human GWAS studies of craniofacial shape variation. The 37 QTL identified herein expand the small but growing facial shape QTL library. Compared to similar prior studies, the number of QTL is relatively large and their confidence intervals relatively narrow. These QTL intervals contain a substantial proportion of the candidate genes previously identified in human facial shape GWAS, suggesting that some genetic contributions to facial shape variation are conserved across mammal species. QTL effect sizes on multivariate shape were relatively small. We found a relative abundance of skeletal and craniofacial development genes in QTL intervals for the largest shape PCs. Moreover, these genes tend to be located very close to QTL peaks. Thus, our results suggest that genes essential to building a face are often also significant contributors to facial shape variation. In summary, our results are consistent with a model of facial diversity that is influenced by key genes in skeletal and facial development and, simultaneously, highly polygenic.

Diversity Outbred mouse population
The DO (Jackson Laboratory, www.jax.org) are derived from the eight inbred founder lines that contribute to the CC [43, 46], which include three mouse subspecies, and both wildderived and laboratory strains [48]. Continued random outcrossing of the DO has produced a population with relatively high genetic diversity (~45 million SNPs segregating in the CC founding populations [43]) and increasingly fine mapping resolution in advancing generations.
In combination, these features increase the potential to identify multiple QTL for complex traits [31]. On average, the genomes of the eight CC founders should be equally represented in the DNA of any DO animal. Sequenced genomes for the CC make it possible to characterize and interpret genetic variants in the DO according to 8-state founder identity and as bi-allelic (major/minor allele) SNP variation. In addition, statistical packages designed for analysis of the DO offer features to impute variants at each typed SNP in the CC founders, using the coarser DO marker map as a scaffold to infer a high-density SNP map.

Computing
All analysis was conducted in R [113]. Customized data processing and statistical analysis functions for mapping experiments in DO samples were originally published in the DOQTL package [50]. These routines are now incorporated into the qlt2 package [49], with additional processing and genotype diagnostic tools provided at https://kbroman.org/qtl2/.

Images, landmark data
Our DO sample consists of n = 1147 male and female adult mice (after exclusion of some records; see Genotypes, below) from seven DO generations (Table 4). We obtained μCT images of the mouse skulls in the 3D Morphometrics Centre at the University of Calgary on a Scanco vivaCT40 scanner (Scanco Medical, Brüttisellen, Switzerland) at 0.035 mm voxel dimensions at 55kV and 72-145 μA. On each skull, one of us (WL) collected 54 3D landmark coordinates from minimum threshold-defined bone surface with ANALYZE 3D (www.mayo.edu/bir/). Here, we analyzed the 22-landmark subset of the full configuration-a 66 coordinate dimension vector-that captures the shape of the face (Fig 1).

Ethics statement
All experimental procedures for this study were approved by the Health Sciences Animal Care Committee at the University of Calgary (protocol #AC18-0026). No animals were sacrificed for this study. Specimens were raised in several labs for unrelated experiments under approval and conduced in accordance with the guidelines set forth by the Institutional Animal Care and Use Committees of the University of North Carolina Chapel Hill (protocol #11-299), Jackson Laboratories (protocols #99066 and #15026), and The Scripps Research Institute (protocol #08-0150-3).

Shape data
Landmark configurations were symmetrically superimposed by Generalized Procrustes Analysis (GPA) [114] using the Morpho package [115]. GPA is a multi-step procedure that removes location and centroid size differences between configurations, then (iteratively) rotates each configuration to minimize its squared distance from the sample mean. We chose a superimposition that decomposes shape variation into its symmetric and asymmetric components [116,117], and stored the symmetric configurations for analysis. Working with symmetric, superimposed data reduced morphological degrees of freedom from 66 landmark coordinate dimensions to 30 shape dimensions [117].

Genotyping
For all animals, DNA was extracted from ear or tail clippings. Genotyping was performed by the Geneseek operations of Neogen Corp. (Lincoln, NE). Mice from generations 9, 10, and 15 were genotyped using the MegaMUGA genotyping array (77,808 markers); mice from generations 19, 21, 23, and 27 were genotyped using the larger GigaMUGA array (143,259 markers) [118]. The two arrays share 58,907 markers in common. We restricted the genotype data to these shared SNPs [118]. In addition, we excluded 2,022 markers from the center of chromosome 2 (40-140 Mb). WSB alleles are highly overrepresented in this genomic region in earlier DO generations. This  S2 Fig). If unaddressed, this directional shift in allele frequencies can confound statistical analysis of genotype-phenotype associations [120] with potentially false positive associations between WSB chromosome 2 haplotypes and directional changes in phenotype over DO generations. After removing this genomic region from consideration, the final SNP dataset consists of 56,885 markers common to the MegaMUGA and GigaMUGA arrays, with a mean spacing of 48 kb. The qtl2 package estimates genotype probabilities, including imputation of sporadically missing genotypes, with a multipoint Hidden Markov Model (HMM) on the observed marker genotypes. From these initial genotype probabilities, we derived three probabilistic representations of genetic state for each observation at each marker: (1) for genome scans, founder probabilities, a vector encoding the probability of genetic contribution from each of the eight CC founders to the diplotype state; (2) for association mapping, SNP probabilities, collapsing the eight founder probabilities to a two-state, major/minor allele strain distribution; and (3) for mapping multivariate shape effects, additive SNP dosage, twice the estimated major SNP probability (maximum additive SNP dosage = 2).
We identified problematic genetic samples using the DO genotype diagnostics tools available at https://kbroman.org/qtl2/ [121]. A total of 11 observations were excluded due to excessive missing marker data, which we defined as no marker information at >7.5% of markers. Three probable duplicate sets of genetic data (>99% of markers identical) were identified. Because the correct genotype-phenotype matches could not be established by other evidence, all six animals were removed from the sample. Finally, we excluded seven female samples with low X chromosome array intensity values. These samples are potentially XO [46], which can affect facial form and body size [122,123]. The exclusions described above resulted in a final, retained sample of n = 1147 animals.

Genome scan and association mapping
We mapped QTL for facial shape and facial shape allometry. To identify QTL, we implemented the linear mixed effects regression model in qtl2. The qtl2 genome scan is designed to estimate marker effects on one or more univariate observations, but not on multivariate observations. We therefore applied principal components analysis (PCA) to the superimposed coordinates, treating the specimen scores on each PC as dependent observations for analysis. Though PC variables are synthetic not biological entities [124], for high dimensional shape data, they provide an efficient means to ordinate observations on axes that concentrate variance into a relatively small number of independent dimensions [114]. For allometry, we treated common allometric component scores [63] as measured traits. Prior to deriving PCs and allometry vectors, we centered each observation on its generation mean in order to focus on shared, within-generation covariance patterns [125, p. 295].
At each marker, we fit a mixed model with fixed effect coefficients for sex, age-at-sacrifice, and the additive effect of each CC founder haplotype, a random effect of kinship, and an error term [51]. Evidence for a QTL was evaluated based on marker LOD scores, likelihood ratios comparing the residuals of the marker model to a null model excluding the marker's effect [53]. We consider the evidence for a marker-phenotype association to be statistically significant where the LOD score exceeds the 95% quantile of genome-wide maximum LOD scores computed from 1000 random permutations of genotype-phenotype associations [126]. We use a 1.5 LOD-drop rule to define the boundaries of QTL support intervals [127] and allow for multiple QTL peaks on a single chromosome.
The random effect of kinship depends upon pairwise proportions of shared alleles among observations, encoded in a kinship matrix. Loss of power to detect QTL can arise when the marker of interest and linked genes are included in the kinship matrix of the null model. To avoid this problem, we compute kinship matrices using a leave-one-chromosome-out (LOCO) approach [54][55][56].
Genome scans, QTL support intervals, and proportions of shape variation attributable to founder-level kinship (heritability) were estimated for all PCs explaining more than 1% of symmetric shape variation. However, we limit more in-depth analysis to the five PCs with reasonably large signal-to-noise ratios (S2 Fig). This was assessed using the modified "noise floor" detection strategy developed by Marroig and colleagues in [58].
In addition, within the PC 1-5 QTL intervals, we used a form of merge analysis to perform genome-wide association mapping [51, 128,129]. For each CC founder variant located between a flanking pair of genotyped markers, the vector of founder allele probabilities is assumed to be the average of the founder probabilities at those flanking markers. The bi-allelic SNP probabilities were computed from these founder probabilities and the major/minor allele strain distribution pattern in the CC. Analysis was again via mixed effects regression model, with the imputed SNP data as genotype predictors.

Enrichment analysis and candidate genes
Modern GWAS and QTL studies often attempt to contextualize QTL and candidate gene results post hoc via GO enrichment analysis. Given a list of genes near statistically significant markers, a GO analysis calculates the probability that a random set of genes of the same size would contain as many genes under a common annotation term, typically after a family-wise error rate or FDR adjustment for multiple testing. The outcome of enrichment analysis is a list of GO terms (or in the absence of any enrichment, no list), which provides a means to consider whether certain biological processes, cellular components, or molecular functions are overrepresented in the list of genes [130].
For protein coding genes in the PC 1-5 QTL support intervals and for allometry QTL, we used the Molecular Signatures Database [89] online tool and Entrez Gene library [131] to identify biological process GO term enrichment [132]. The database application implements a hypergeometric test of statistical significance, quantifying the probability of sampling, without replacement, as many (k) or more genes annotated to a GO term, given the total number of genes annotated to that term (K) and the number of genes in the QTL and Entrez gene sets (n and N, respectively). For each ontology term, after adjustment for multiple testing, we required a FDR q-value < 0.05 to conclude that there was substantial evidence for GO biological process enrichment. We also inspected the association mapping results in the +/-0.25 Mb region centered at each LOD peak for genes with convincing evidence of a role in skeletal, craniofacial, or dental development or disease. For most QTL, LOD peaks for the genome scan and association mapping results are virtually coincident. Where they differ substantially, we examine the region around both peaks.

Marker effects on shape
We considered a significant association between a marker and facial shape along a PC axis to be evidence of some effect on the multivariate (undecomposed) shape coordinate data. In order to obtain an estimate of the full shape effect, we fit an additional mixed model for the maximum LOD score marker. Fitting a mixed model for highly multivariate data presents a significant computational challenge because the number of parameters that must be estimated for the random effects covariance matrix scales quadratically with the number of traits (here, 465 parameters for 30 free variables). To address this challenge, we took the novel step of estimating the mixed model covariance matrices using a Bayesian sparse factorization approach [52, 88,133]. After a 1,000,000 iteration burn-in, we generated 1,000,000 realizations from a single Markov chain, thinning at a rate of 1,000 to store 1,000 posterior samples for inference. Though the time needed to fit such a model makes it impractical for a genome scan on a dense array, it is a useful strategy for disentangling kinship from the effect on multivariate outcomes at a small number of peak markers. For these regressions, the marker effect predictor was twice the major SNP probability. Thus, the multivariate mixed model we implemented estimates an average effect of allele substitution, a standard quantity in quantitative genetics [134]. In addition, for each regression, we estimated the proportion of shape variation explained using a version of the coefficient of determination (R 2 ) that has been adapted for mixed model parameters [135].
When plotting shape effects, we magnify the allele substitution coefficients by the scaling factor needed to cause the QTL to explain 5% of phenotypic variance. This magnification is applied both to the mean effect, in order to depict the mean displacement vector, and to the posterior distribution of effects, in order to depict confidence ellipsoids. Ellipsoids were estimated with the 3D ellipse function in the R package rgl [136]. While it is more typical to use a 95% support boundary to evaluate effect uncertainty, with shape data, morphological contrasts are best understood over the configuration as a whole [137]. The 80% ellipsoids are meant to encourage this perspective.