Inter-Ethnic/Racial Facial Variations: A Systematic Review and Bayesian Meta-Analysis of Photogrammetric Studies

Background
Numerous facial photogrammetric studies have been published around the world. We aimed to critically review these studies to establish population norms for various angular and linear facial measurements and to determine inter-ethnic/racial facial variations.
Methods and Findings
A comprehensive and systematic search of PubMed, ISI Web of Science, Embase and Scopus was conducted to identify facial photogrammetric studies published before December 2014. Subjects of eligible studies were Africans, Asians or Caucasians. A Bayesian hierarchical random effects model was developed to estimate posterior means and 95% credible intervals (CrIs) for each measurement by ethnicity/race. Linear contrasts were constructed to explore inter-ethnic/racial facial variations. We identified 38 eligible studies reporting 11 angular and 18 linear facial measurements. Risk-of-bias scores of the studies ranged from 0.06 to 0.66. At the 0.05 significance level, African males had a smaller nasofrontal angle than Caucasian males (posterior mean difference: 8.1°, 95% CrI: 2.2°–13.5°) and a larger nasofacial angle than Asian males (7.4°, 0.1°–13.2°). The nasolabial angle was more obtuse in Caucasian females than in African (17.4°, 0.2°–35.3°) and Asian (9.1°, 0.4°–17.3°) females. Additional inter-ethnic/racial variations were revealed when the significance level was set at 0.10.
Conclusions
A comprehensive database of angular and linear facial measurements was established from existing studies using the statistical model, and inter-ethnic/racial variations in facial features were observed. The results have implications for clinical practice and highlight the need for, and value of, high-quality photogrammetric studies.


Introduction
International migration is occurring at an unprecedented pace in the contemporary world [1]. Over the past 50 years, the number of international migrants has increased rapidly, from 92 million in 1960 to 165 million in 2000 [1] and to 214 million in 2010 [2], and it is projected to reach 405 million by 2050 [2]. It is therefore increasingly important for professionals in the medical and dental specialties whose work involves correcting facial anomalies and achieving facial aesthetics to be aware of the differences in facial characteristics among ethnic/racial groups.
While inter-ethnic/racial facial variations have long been of interest to the general public, anthropologists, and medical and dental practitioners, studies providing solid evidence on this issue are surprisingly sparse. In one of the most comprehensive studies, Farkas and colleagues [3] compared normative facial measurements of a North American white population with data from other regions of the world; however, the generalizability of that study is limited by its small sample size (only 30 males and 30 females per participating country). Moreover, the facial features investigated were limited to linear measurements, and all comparisons were made against the North American white population.
Apart from the direct anthropometric method used by Farkas and colleagues [3], several indirect anthropometric methods exist, e.g. cephalometry, photogrammetry, three-dimensional stereophotogrammetry and surface laser scanning [4,5]. Of these methods, photogrammetry offers several distinct advantages [4,5]. First, the measurements are not affected by tissue sensitivity and compressibility, which makes the method ideal for soft tissue analysis. Second, the examination procedure is less uncomfortable for both subjects and examiners, and subjects are not exposed to radiation. Third, permanent photographic archives allow flexibility in the selection of facial measurements and objectivity in their assessment. Furthermore, photogrammetric equipment is portable, the examination procedure is time-saving and the cost is relatively low [6]. In addition, the reliability of photogrammetry has proved to be excellent [6]. Therefore, despite the availability of more advanced anthropometric methods such as three-dimensional stereophotogrammetry, photogrammetry remains the optimal choice for large epidemiological studies aiming to establish population norms [6], especially in developing countries where sophisticated equipment is not available.
Results from different anthropometric methods are not directly comparable [7,8], and to date no meta-analysis of photogrammetric studies has been performed. To fill this gap, we conducted a systematic review and applied a statistical model to establish a database of population norms for various angular and linear facial measurements in Africans, Asians and Caucasians, and to determine inter-ethnic/racial facial variations.

Methods
This review was conducted according to a predetermined protocol (S1 Text) and was reported in line with recommendations from the MOOSE (Meta-analysis Of Observational Studies in Epidemiology) guidelines (S2 Text) [9].

Data sources and search strategies
We comprehensively searched the electronic databases PubMed (1997 onward), ISI Web of Science (1956 onward), EMBASE (1947 onward) and Scopus (1995 onward) with no restrictions on language, date or publication status. The initial search was updated to 1 December 2014 using automatic e-mail alerts. One reviewer (YFW) developed the search strategy and conducted the initial search using controlled vocabularies and keywords. The search strategy for all four databases is available in S3 Text. Reference lists of articles identified in the screening process were also searched manually.

Study selection
Two trained and calibrated reviewers (YFW and HMW) independently screened the titles and abstracts of the identified records in the first round of screening. In the second round, the full texts of records judged to be potentially eligible were retrieved and assessed for eligibility. Inter-reviewer agreement was assessed using Cohen's κ. Disagreements between the reviewers were resolved by discussion at the end of each round, and a senior author (CM) was consulted if consensus could not be reached.
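Cohen's κ compares the observed proportion of agreement p_o with the agreement p_e expected by chance from each reviewer's marginal label frequencies: κ = (p_o − p_e) / (1 − p_e). The Python sketch below is illustrative only; it assumes screening decisions are stored as two parallel lists of labels, and the function name `cohens_kappa` and the labels "include"/"exclude" are hypothetical rather than taken from the review's workflow.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_expected == 1:
        # Both raters used one and the same label for every item; kappa is
        # undefined here, and this sketch returns 1.0 by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)
```

For instance, with decisions ["include", "include", "exclude", "exclude"] and ["include", "exclude", "exclude", "exclude"], p_o = 0.75 and p_e = 0.5, giving κ = 0.5.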
This review sought to identify all facial photogrammetric studies regardless of study design. We considered studies for inclusion if they recruited African, Asian or Caucasian subjects aged 18 to 45 years; adopted the well-established definitions of facial landmarks and measurements (S1 and S2 Tables) [10][11][12]; and if the standard error (SE) could be extracted or estimated from the report. Studies were excluded if they exclusively recruited any of the following: attractive/beautiful subjects; subjects with severe malocclusion, developmental craniofacial disfigurement, or a history of facial trauma/fracture or cosmetic surgery; or patients with systemic disorders known to affect craniofacial development. Furthermore, we required the reported measurements to be accurate to one decimal place (millimeters for linear measurements and degrees for angular measurements). We attempted to obtain missing information by e-mailing the corresponding author of the study whenever needed.

Data extraction
Study characteristics and demographics such as name of the first author, year of publication, study location, origin of the subjects, sample source, sample size, age range, and gender were extracted. We also extracted details of the photographic process including the subjects' body position, head posture, occlusal position, lip/chin posture and the camera-subject distance.
We intended to extract 11 angular and 18 linear facial measurements with the greatest clinical implications (S3 and S4 Tables). Measurements were recorded as means and standard deviations (SDs); conversions were made when a confidence interval or SE was reported. Articles reporting on more than one population group were treated as separate studies, one for each distinct population they contained. Different articles investigating the same group of subjects were considered as one study.
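Standard conversions for a sample mean are SD = SE × √n and, for a 95% confidence interval, SE = (upper − lower) / (2 × 1.96). The review does not state which formulas it used, so the following is a minimal Python sketch under that assumption. It uses the normal approximation; for small samples a t quantile with n − 1 degrees of freedom would be more accurate. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def sd_from_se(se, n):
    """SD of the measurement from the standard error of the mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, level=0.95):
    """SD from a confidence interval for the mean (normal approximation).

    The interval half-width equals z * SE, so SE = (upper - lower) / (2 * z),
    where z is the standard normal quantile for the given confidence level.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)
```

For example, an SE of 0.5 mm reported for 100 subjects corresponds to an SD of 5.0 mm.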
Data extraction was performed by one reviewer (YFW) using a predefined piloted spreadsheet in Microsoft Excel 2013 and the results of extraction were then verified by a second reviewer (HMW). Discrepancies were resolved by consensus or further consultation of a third investigator (CM).

Assessment of risk of bias
To ascertain the validity of each eligible study, risk of bias was assessed based on an instrument [13] that has been used in systematic reviews on craniofacial anthropometrics [4,5]. Further modifications of the instrument were made in view of potential sources of bias unique to photogrammetric studies [14]. We included 17 items assessing four domains of the eligible studies: study design, photo taking process, facial measurements and the appropriateness of statistical analysis (S5 Table).
Our criteria for risk-of-bias assessment are detailed in S6 Table. Each item was scored 0, 0.5 or 1, indicating free of bias, partially free of bias and subject to bias, respectively. Inapplicable items were not scored. A score was calculated for each study by dividing the sum of item scores by the number of applicable items. Studies with scores below 0.40 were considered to be at low risk of bias. Two trained and calibrated reviewers (YFW and HMW) assessed the studies, and a third reviewer (CM) resolved discrepancies.
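The scoring rule amounts to averaging item scores over applicable items. This is a minimal Python sketch, assuming inapplicable items are encoded as `None`; the function and constant names are illustrative, while the 0/0.5/1 item scores and the 0.40 cut-off follow the text.

```python
from typing import Optional, Sequence

LOW_RISK_THRESHOLD = 0.40  # scores below this were classed as low risk of bias


def risk_of_bias_score(item_scores: Sequence[Optional[float]]) -> float:
    """Mean item score over applicable items.

    Each item is 0 (free of bias), 0.5 (partially free) or 1 (subject to bias);
    None marks an inapplicable item, excluded from numerator and denominator.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("no applicable items")
    if any(s not in (0, 0.5, 1) for s in applicable):
        raise ValueError("item scores must be 0, 0.5 or 1")
    return sum(applicable) / len(applicable)


def is_low_risk(score: float) -> bool:
    return score < LOW_RISK_THRESHOLD
```

For a hypothetical study with ten items scored 0, four scored 0.5, one scored 1 and two inapplicable, the score is 3 / 15 = 0.2, i.e. low risk of bias.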

Statistical analysis
Despite our extensive literature search, data for several facial measurements were still sparse, especially when analyses were stratified by gender. In addition, while we rigorously followed the predefined inclusion and exclusion criteria during article screening, there were still varying degrees of risk of bias among the eligible studies. To fully utilize our extracted data, a Bayesian hierarchical random effects model was constructed, with contrasts established for pairwise comparisons among the ethnic/racial groups.
The multilevel modelling approach naturally imposes a hierarchical structure on the extracted data, in which individual studies are nested within ethnicities/races, which in turn are nested within the total population. The Bayesian approach to multilevel modelling offers further advantages: greater flexibility in modelling variability at different levels, and the ability to make direct probability statements [15,16]. In the Bayesian hierarchical model, ethnicity/race-specific estimates of a facial measurement were more model-driven (pulled toward the grand mean) when there was substantial uncertainty because only a few studies were available, whereas for ethnicities/races with less uncertainty the estimates were more data-driven [17].
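The shift between model-driven and data-driven estimates is the usual partial-pooling behaviour of normal hierarchical models. In the simplest normal-normal case, with the grand mean and the between-group variance treated as known (a simplification; the review estimates all parameters jointly by MCMC), a group's posterior mean is a precision-weighted average of its own observed mean and the grand mean. The sketch uses hypothetical numbers and a hypothetical function name.

```python
def partial_pooling_mean(group_mean, group_var, grand_mean, tau2):
    """Posterior mean of a group's true value under a normal-normal model.

    Assumes the group's observed mean ~ N(theta_j, group_var) and
    theta_j ~ N(grand_mean, tau2), with grand_mean and tau2 treated as known.
    Returns the precision-weighted estimate and the weight given to the data.
    """
    # Equivalent to (group_mean/group_var + grand_mean/tau2) / (1/group_var + 1/tau2).
    weight = tau2 / (tau2 + group_var)
    return weight * group_mean + (1 - weight) * grand_mean, weight
```

With a grand mean of 120, τ² = 25 and an observed group mean of 130, a precisely estimated group (variance 1) keeps about 96% of its own signal (≈129.6), whereas a group informed by few studies (variance 100) is pulled to 122.0.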
S4 Text and S1 and S2 Figs detail the statistical models for each level of the hierarchy. In single-level notation, the overall model for a facial measurement $y_{ij}$ reported by the $i$th study of the $j$th ethnicity/race is
$$y_{ij} = \mu_{00} + \eta_{0j} + z_{ij} + \varepsilon_{ij},$$
where $\mu_{00}$ is the grand mean of the facial measurement across ethnicities/races; $\eta_{0j}$ and $z_{ij}$ are ethnicity/race-specific and study-specific random effects, normally distributed with mean 0 and with between-ethnicity/race variance $\tau^2$ and between-study variance $\sigma^2$, respectively; and $\varepsilon_{ij}$ denotes the sampling error of each individual study.
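To make the three levels concrete, the following Python sketch simulates reported study means from the additive structure described above: a grand mean μ00, an ethnicity/race effect η0j with variance τ², a study effect zij with variance σ², and a sampling error εij. It is an illustrative sketch rather than the authors' code; it assumes a common sampling standard error for all studies, and the function name and arguments are hypothetical.

```python
import numpy as np


def simulate_study_means(mu00, tau, sigma, studies_per_group, se, rng):
    """Draw study-level means y_ij = mu00 + eta_0j + z_ij + eps_ij.

    studies_per_group: number of studies in each ethnic/racial group j.
    se: sampling standard error, assumed here to be common to all studies.
    Returns a list with one array of simulated study means per group.
    """
    out = []
    for n_studies in studies_per_group:
        eta = rng.normal(0.0, tau)                  # group-level effect, variance tau^2
        z = rng.normal(0.0, sigma, size=n_studies)  # study-level effects, variance sigma^2
        eps = rng.normal(0.0, se, size=n_studies)   # within-study sampling error
        out.append(mu00 + eta + z + eps)
    return out
```

Setting τ, σ and the standard error to zero returns the grand mean for every study, which is a quick sanity check of the structure.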
Non-informative priors were specified for $\tau$ and $\sigma$ using a half-Cauchy distribution with scale 25. The grand mean $\mu_{00}$ was assigned a non-informative normal prior, $\mu_{00} \sim N(0, 10^4)$. Linear contrasts were constructed to explore inter-ethnic/racial variations of the measurements [18].
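Combining the likelihood with these priors gives the unnormalised log posterior that an MCMC sampler explores. The Python sketch below assumes that each study's sampling error is normal with variance equal to its squared SE, and that 10^4 in N(0, 10^4) is a variance (JAGS's `dnorm` is parameterised by precision, so the equivalent JAGS prior would use precision 10^-4). The function names are hypothetical, and this is not the authors' JAGS model.

```python
import math


def log_normal(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_half_cauchy(x, scale=25.0):
    """Log density of a half-Cauchy(0, scale) distribution, supported on x >= 0."""
    if x < 0:
        return -math.inf
    return math.log(2.0 / (math.pi * scale)) - math.log1p((x / scale) ** 2)


def log_posterior(y, se, group, mu00, eta, z, tau, sigma):
    """Unnormalised log posterior of the three-level random effects model.

    y[i], se[i]: reported mean and standard error of study i
    group[i]:    index j of the ethnic/racial group of study i
    eta[j]:      group-level random effect; z[i]: study-level random effect
    """
    # Zero standard deviations would make the normal densities degenerate.
    if tau <= 0 or sigma <= 0:
        return -math.inf
    # Priors: mu00 ~ N(0, 10^4) (10^4 read as a variance); tau, sigma ~ half-Cauchy(0, 25).
    lp = log_normal(mu00, 0.0, 1e4)
    lp += log_half_cauchy(tau) + log_half_cauchy(sigma)
    # Random effects: eta_j ~ N(0, tau^2) and z_ij ~ N(0, sigma^2).
    lp += sum(log_normal(e, 0.0, tau ** 2) for e in eta)
    for i in range(len(y)):
        lp += log_normal(z[i], 0.0, sigma ** 2)
        # Likelihood: y_ij ~ N(mu00 + eta_j + z_ij, se_ij^2).
        lp += log_normal(y[i], mu00 + eta[group[i]] + z[i], se[i] ** 2)
    return lp
```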
We fitted the Bayesian hierarchical model using Markov chain Monte Carlo (MCMC) sampling to generate samples from the posterior distributions of all model parameters, including the ethnicity/race-specific estimates of facial measurements and the linear contrasts. Analyses were performed separately for males and females. A facial measurement was meta-analysed only if data were available from two or three ethnicities/races, at least one of which was represented by two or more eligible studies. Facial measurements were estimated by the posterior means and 95% credible intervals (CrIs) of the posterior distributions. Inter-ethnic/racial variations were explored at significance levels of 0.05 and 0.10 by examining whether 0 was included in the 95% and 90% CrIs of the linear contrasts, respectively. The 95% (90%) CrI was obtained from the 2.5th (5th) and 97.5th (95th) percentiles of the posterior distribution. MCMC sampling was performed using JAGS (version 3.4.0) [19] run from R version 3.1.1 (R Development Core Team, 2014) [20].
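Once posterior draws are available, the summaries and decision rule described above reduce to a few array operations. In this minimal Python sketch, the contrast between two groups is computed draw by draw, which assumes both arrays come from the same MCMC iterations; any draws passed in are stand-ins, not output from the review. The eligibility check encodes the stated rule that a measurement was analysed only with data from at least two ethnicities/races, one of which had at least two studies. All function names are hypothetical.

```python
import numpy as np


def summarise(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from MCMC draws."""
    alpha = 1.0 - level
    # level=0.95 -> 2.5th and 97.5th percentiles; level=0.90 -> 5th and 95th
    lower, upper = np.percentile(draws, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.mean(draws)), float(lower), float(upper)


def contrast(draws_a, draws_b, level=0.95):
    """Summarise the linear contrast a - b, computed draw by draw.

    Assumes draws_a[k] and draws_b[k] come from the same MCMC iteration.
    The last returned value is True when the credible interval excludes 0.
    """
    diff = np.asarray(draws_a) - np.asarray(draws_b)
    mean, lower, upper = summarise(diff, level)
    return mean, lower, upper, not (lower <= 0.0 <= upper)


def eligible_for_meta_analysis(studies_per_group):
    """True if >= 2 groups have data and at least one group has >= 2 studies."""
    present = [n for n in studies_per_group.values() if n > 0]
    return len(present) >= 2 and max(present) >= 2
```

Calling `contrast` with `level=0.90` corresponds to the 0.10 significance level used in the review.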

Literature search
Fig 1 summarises the process of study identification and selection. We retrieved 3769 published original articles, abstracts, letters and reviews from the electronic database search and additional hand searching. After the first round of selection based on titles and abstracts (κ = 0.97), the full texts of 308 potentially eligible articles were retrieved for the second round of selection. Of these, 36 eligible articles reporting 38 studies were identified (κ = 0.95).

Study characteristics
Characteristics of the eligible studies are detailed in Table 1. All studies had a cross-sectional design. Years of publication ranged from 1989 to 2014. One study was published in Chinese, one in Korean, and the remaining 36 in English. The studies involved 6686 subjects (2944 males and 3742 females). Following Risch and colleagues' ethnicity/race classification scheme, which is based on numerous population genetic surveys [57], subjects were considered Africans if they were African Americans or Afro-Caribbeans originating from sub-Saharan Africa; Asians if they were from China, Indochina (e.g. Cambodia, Malaysia and Vietnam), Japan, Korea, the Philippines or Siberia in eastern Asia; and Caucasians if they were from the Indian subcontinent, the Middle East or North Africa, with ancestry in Europe and West Asia. As a result,

Risk of bias
Detailed risk-of-bias ratings are available in S5 Table. Of the 38 studies included in the analysis, 23 (60.5%) were deemed to be at low risk of bias and the remaining 15 (39.5%) at high risk of bias. Scores ranged from 0.06 to 0.66. Over 70% of the studies on Asians and 66.7% of the studies on Caucasians were at low risk of bias, whereas 58.3% of the African studies were at high risk of bias. When individual items of the instrument were examined (Fig 2), sampling methods were underreported in most studies (57.9%). Regarding the photo-taking process, most studies failed to adequately address the subjects' body posture (63.2%), head position (55.3%) and lip posture (63.2%). Only three studies (7.9%) described the subjects' occlusal position. Photographic parameters were reported in seven studies (18.4%). As to facial measurements, most studies defined facial landmarks by photo illustration (65.8%), and only eight studies provided written definitions. Measurement reliability was addressed in 16 studies (42.1%), and ten of them (26.3%) reported the method error as the reliability measure.

Discussion
This systematic review and meta-analysis is the first to collate all available photogrammetric studies to establish a comprehensive database of ethnicity/race-specific population norms for a variety of angular and linear facial measurements. Furthermore, this study is the first to comprehensively explore inter-ethnic/racial facial variations among the three major ethnic/racial groups. Our study provides strong evidence of inter-ethnic/racial variations in the nasofrontal, nasolabial and nasofacial angles. In addition, we observed substantial inter-ethnic/racial differences in linear measurements, including the width of the face, height of the forehead II and physiognomical height of the face.
Our meta-analysis updates the results of an international anthropometric study [3] and a systematic review [58]. Compared with these studies, it adds to the literature by including both angular and linear facial measurements rather than being restricted to linear measurements related to the neoclassical canons [59]. Moreover, our approach to investigating inter-ethnic/racial facial variations is more intuitive than relying on frequency distributions of arbitrarily defined categories [3] or focusing on the variance component of the measurements [58].
Ethnic/racial categorization in medical research is an issue of ongoing debate [57,60,61]. Although some medical journals have claimed that ethnic/racial categorization is biologically meaningless [62,63], this claim has been challenged as lacking a solid scientific basis [57]. Until the genetic and environmental determinants of facial characteristics are fully identified, ethnicity/race remains a useful, if crude, surrogate for investigating facial variations [57].
Our analysis of angular measurements revealed significant inter-ethnic/racial variations in the nasofrontal, nasolabial and nasofacial angles. The nasofrontal and nasofacial angles are both affected by the position of nasion and by nasal tip protrusion [12,64]. The smaller nasofrontal angle in African males compared with Caucasian males (posterior mean difference: 8.1°, 95% CrI: 2.2°–13.5°) and the larger nasofacial angle in African males compared with Asian males (7.4°, 0.1°–13.2°) may reflect a more inclined nasal bridge in Africans. The nasolabial angle is a critical determinant of nasal tip aesthetics [65]. The larger estimated nasolabial angle in Caucasian females is consistent with the more prognathic profile of Africans and Asians [66]. For linear facial measurements, our results suggest that the width of the face and the height of forehead II are significantly larger in Caucasian females than in Asian females, which is consistent with a previous preliminary study [46] and a systematic review [58]. While previous studies reported moderate inter-ethnic/racial variations in the nose [3,58], the present study did not identify such differences.
The database established in this study provides normative ranges of facial measurements. Compared with the existing database [3], our database is more comprehensive in both the number of subjects used to derive the normative values and the range of facial features covered. Equipped with knowledge of these normal ranges, plastic and craniofacial surgeons are better informed when determining the amount of surgical correction needed for a particular patient, taking his/her ethnicity/race into consideration. This brings us closer to the ultimate goal of individualized treatment in plastic surgery. The database also provides critical parameters for the manufacture of respirators and oxygen masks, whose design requires taking consumers' ethnicities/races into consideration. In addition, the results provide a platform for future genetic, nutritional and environmental studies to identify factors influencing facial morphology.
This study has several strengths. First, the well-established definitions of landmarks and measurements [10,12] were followed, which ensured homogeneity of the measurements. Second, risk of bias was assessed according to a priori defined criteria (S6 Table) to enhance objectivity. Third, the Bayesian hierarchical model provides statistical advantages over traditional subgroup analysis in meta-analysis. The frequentist approach to meta-analysis yields 95% confidence intervals that are in fact narrower than the range of values they are intended to cover [67]; moreover, the no-pooling nature of subgroup analysis tends to overestimate the variation among ethnicities/races [68]. Therefore, frequentist subgroup analysis tends to have an inflated type I error rate compared with Bayesian hierarchical modelling, and the type I error rate can increase further when subgroups are compared pairwise post hoc. In contrast, pairwise comparisons in the Bayesian approach do not affect the type I error rate, since there is only one posterior distribution regardless of how the comparisons are made [18].
The current study has several limitations. First, our meta-analysis inherits the limitations of the original research. Since not all of the eligible studies were designed as ethnicity/race-specific studies, subjects' ethnicity/race had to be classified according to an external classification scheme. While the scheme proposed by Risch and colleagues [57] is well established, the possibility of ethnicity/race misclassification cannot be completely ruled out. The issue could be further complicated by the increasing prevalence of mixed ethnicity/race. We recommend that future photogrammetric studies define subjects' ethnicity/race more rigorously, using methods such as ancestral mapping, to facilitate inter-ethnic/racial comparisons. Second, we did not adjust our analyses for age or anthropometric indices such as body weight, height or body mass index, since none of the eligible studies reported them. Residual confounding therefore cannot be excluded from our estimates. Third, the eligible studies were heterogeneous in terms of the subjects' posture, camera-subject distance and photographic parameters. Risk of bias also differed across studies, and a notably high percentage of African studies (58.3%) were at high risk of bias. We accounted for such heterogeneity by using a random effects model. However, the use of statistical modelling in our analyses should not overshadow the importance of a universally adopted photographic set-up. The most detailed description of a photogrammetric set-up comes from Fernández-Riveiro and colleagues [33,34], and their method has been used in other studies [42]. We recommend its universal use in future photogrammetric studies. Finally, despite the extensive literature search, data remain scarce for several facial measurements. Estimates derived from a small amount of data may be biased when applied to the population at large. Scarcity of data also results in substantial uncertainty in the ethnicity/race-specific estimates, as revealed by the wide Bayesian credible intervals. In addition, six measurements were excluded from analysis because of insufficient data. To address the challenge of sparse data, we used the Bayesian approach to account for uncertainty in the hierarchical modelling, which has been shown to be more accurate than the frequentist approach, especially for small sample sizes [68,69]. The generalizability of our findings could be improved by including more high-quality photogrammetric studies.
Our study provides a comprehensive database of various angular and linear facial measurements based on the best available photogrammetric studies. Significant inter-ethnic/racial variations were found for both angular and linear measurements. The results can serve as a useful resource to guide research and clinical practice. This study also highlights the need for more high-quality photogrammetric studies employing standardized photographic techniques, preferably with large random samples comprising different ethnic/racial groups.
Supplementary figure caption (beginning truncated in source): … [70]. The box represents the dependent variable. Triangles with the number "1" inside define the intercept term, and the subscript to "1" indicates the specific level of the hierarchical structure. Circles represent unobserved random coefficients. Solid arrows represent regression parameters. Purple, blue and green represent the first, second and third levels of the hierarchy, respectively. Distributions of the random error terms at each level are shown with dash-dot arrows. Unknown parameters and their prior distributions are shown in red with dotted arrows.