Joint Testing of Genotypic and Gene-Environment Interaction Identified Novel Association for BMP4 with Non-Syndromic CL/P in an Asian Population Using Data from an International Cleft Consortium

Background Non-syndromic cleft lip with or without cleft palate (NSCL/P) is a common disorder with complex etiology. The Bone Morphogenetic Protein 4 gene (BMP4) has been considered a prime candidate gene on the basis of evidence from animal experiments, human linkage studies and candidate gene association studies. The aim of the current study was to test for linkage and association between BMP4 and NSCL/P that could be missed in genome-wide association studies (GWAS) when only genotypic (G) main effects are considered. Methodology/Principal Findings We performed the analysis considering G and interactions with multiple maternal environmental exposures using additive conditional logistic regression models in 895 Asian and 681 European complete NSCL/P trios. Single nucleotide polymorphisms (SNPs) that passed quality control among 122 genotyped and 25 imputed single nucleotide variants (SNVs) in and around the gene were used in the analysis. Selected maternal environmental exposures during the 3 months prior to and through the first trimester of pregnancy included any personal tobacco smoking, any environmental tobacco smoke at home, at the workplace or in any nearby places, any alcohol consumption and any use of multivitamin supplements. A novel association with rs7156227 remained significant after Bonferroni correction among Asian NSCL/P and non-syndromic cleft lip and palate (NSCLP) trios. This association was not seen when G main effects alone were considered in either allelic or genotypic transmission disequilibrium tests. Odds ratios for carrying one copy of the minor allele without maternal exposure to any of the four environmental exposures were 0.58 (95%CI = 0.44, 0.75) and 0.54 (95%CI = 0.40, 0.73) for Asian NSCL/P and NSCLP trios, respectively. The corresponding Bonferroni P values, corrected for the 117 SNPs tested, were 0.0051 (asymptotic P = 4.39×10⁻⁵) and 0.0065 (asymptotic P = 5.54×10⁻⁵), respectively.
In European trios, no significant association was seen for any SNP after Bonferroni correction for the 120 SNPs tested. Conclusions/Significance Our findings add evidence from GWAS to support the role of BMP4 in susceptibility to NSCL/P originally identified in linkage and candidate gene association studies.


Introduction
Non-syndromic cleft lip with or without cleft palate (NSCL/P) is one of the most common human congenital malformations [1]. Multiple genes and several common environmental risk factors have been implicated in its etiology. There is also suggestive evidence of gene-environment interaction (GxE) for common maternal exposures, as well as possible biological interactions between different genes, although the latter remains tentative [2][3][4][5]. Recent genome-wide association studies (GWAS) have greatly improved understanding of the genetic components of this common and complex birth defect. They confirmed the association between IRF6 and NSCL/P first shown in linkage and candidate gene studies, and identified associations with markers in a novel region on 8q24 in Europeans [6,7] and with two novel genes, ABCA4 and MAFB, in Asian populations [4,8]. However, most of the susceptibility genes identified in previous linkage and candidate gene association studies have not been replicated in GWAS, and confirmation studies of GWAS findings have not always yielded supportive results [9][10][11][12][13]. Recent studies that jointly considered genotypic (G) effects and GxE interaction have shown the potential to identify associations missed by conventional GWAS analyses of G main effects alone [14][15][16]. This approach may help clarify some of the inconsistencies among existing findings.
The aim of the current study was to test for linkage and association with NSCL/P for markers in and around BMP4 that could be missed in GWAS considering genotypic (G) main effects alone, in which risk is estimated for exposed and unexposed carriers combined. We used 895 Asian and 681 European complete case-parent trios originally ascertained through an international cleft consortium [8,31]. In our joint analysis of G and interactions with any maternal environmental tobacco smoke (ETS) and any multivitamin supplement use (VIT) under an additive conditional logistic regression model, rs7156227 showed significant novel associations with NSCL/P and non-syndromic cleft lip and palate (NSCLP) in Asian and Chinese trios after Bonferroni correction. This significance held after trios with maternal exposure to any tobacco smoking (SMK) or any alcohol consumption (ALCOHOL) were dropped. Our findings provide supportive evidence for linkage and association between the BMP4 gene and risk of NSCL/P.

Ethics statement
Study protocols were reviewed and approved by the institutional review boards (IRB) or ethical committees at all participating institutions. Adult participants (including biological parents of all probands and probands old enough to give their own consent/assent) provided written informed consent for themselves. Parents or guardians of the minor participants provided written informed consent for the child's participation.
Specifically, the approval IRBs or ethical committees include IRBs of local ethical committee in Philippines, KK Women's and

Study design
The current analysis used a case-parent trio design in which three matched "pseudo-controls" were generated for each observed case from the parents' genotypes under the assumption of Mendelian inheritance.
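As a minimal sketch of how pseudo-controls arise for a single biallelic SNP, assuming additive 0/1/2 genotype coding: each parent transmits one of its two alleles, giving four equally likely offspring genotypes. The case takes one of them, and the remaining three are the pseudo-controls. The function name is illustrative.

```python
from itertools import product


def pseudo_controls(mother, father, case):
    """Return the minor-allele counts of the three pseudo-controls for one trio.

    Genotypes are coded as 0, 1 or 2 copies of the minor allele. Each parent
    transmits one of its two alleles, so there are four equally likely offspring
    genotypes; the observed case takes one and the other three are pseudo-controls.
    """
    def alleles(g):
        # Expand a genotype count into its two alleles (1 = minor, 0 = major).
        return [1] * g + [0] * (2 - g)

    possible = [m + f for m, f in product(alleles(mother), alleles(father))]
    if case not in possible:
        raise ValueError("case genotype is not Mendelian-consistent with the parents")
    possible.remove(case)  # drop one copy matching the observed case
    return possible


print(pseudo_controls(mother=1, father=1, case=1))  # → [2, 1, 0]
```

Trios in which all four possible genotypes are identical, for example two homozygous parents, carry no information about transmission. This is one reason the number of contributing trios varies across SNPs.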

Case-parent trios
Samples for the current study were drawn from an international cleft consortium originally recruited from 13 sites (7 Asian and 6 European/US) with the goal of identifying genes influencing oral clefts in a case-parent trio GWAS. A total of 895 trios of Asian ancestry were recruited, mainly from countries/regions in Asia (the Philippines, Singapore, Taiwan, Shandong, Wuhan, Chengdu and Korea), although a few trios of Asian ancestry were ascertained in Norway (n = 3) and Maryland (n = 1). Trios of European origin (n = 681) were recruited mainly from European countries (Denmark and Norway) and the US (Iowa, Maryland, Pittsburgh and Utah). There were also 3 trios of European ancestry recruited from Singapore. Racial group was finally determined by principal components analysis (PCA) of 33,078 randomly selected, independent and highly polymorphic single nucleotide polymorphisms (SNPs) in all parents from the 13 sites [8].
Clinical assessment was carried out to check for other birth defects and developmental delays to confirm the diagnosis of NSCL/P status of the cases. Population-based ascertainment was used in Norway and cases from other sites were mainly ascertained through surgical treatment centers.

Maternal exposure assessment
Maternal exposures during the critical peri-conceptional period (3 months prior to and through the first trimester) of pregnancy were obtained from direct interviews of the mothers using similar structured interview questions at each site (the same questionnaires were used in 8 sites: Singapore, Taiwan, Shandong, Wuhan, Chengdu, Korea, Maryland and Utah).
Four maternal exposures were assessed as simple yes/no responses: any personal tobacco smoking (SMK), any environmental tobacco smoke in home, workplace or any other nearby places (ETS), any alcohol consumption (ALCOHOL) and any use of multivitamin supplements (VIT, not limited to folate). Exposure rates varied considerably between Asian and European trios (Table 1).
Because of the low exposure rates for maternal ALCOHOL (19/883 = 2.2%) and SMK (26/895 = 2.9%) (Table 1), and the high overlap between maternal SMK and ETS exposure among Asian trios (21 of 22 mothers who smoked were also exposed to ETS; see Tables S1 and S2), our linkage and association analyses for Asian trios were limited to those with information on maternal ETS (n = 786), VIT (n = 854) and both ETS and VIT (n = 746; Table 1 and Table S1). For European trios, we tested interactions with each of the four investigated maternal exposures and with combinations of them. The numbers of complete NSCL/P trios with information on maternal SMK, ALCOHOL, ETS and VIT are 679, 678, 460 and 589, respectively (Table 1 and Table S3). The number of trios with information on all four exposures is 374 (Table S4).
Table 1. Number of complete NSCL/P trios by race, recruitment site, and maternal exposure to tobacco smoking, environmental tobacco smoke, alcohol consumption and multivitamin supplements during the critical peri-conceptional period of pregnancy from an international cleft consortium.

DNA, genotyping and genotypic imputation
DNA for cases and their biological parents were obtained from a variety of types of biological specimens: whole blood, buccal brush/swab, saliva, mouthwash or dried blood spots.
In addition, 25 imputed SNVs located between the two genotyped SNPs flanking rs7156227 (rs7152946 and rs1001161) were also used in the current study. Imputation was carried out by the GENEVA coordinating center [33] using IMPUTE2 [32] with the 1000 Genomes reference population.
For Asian trios, 117 SNPs (110 genotyped plus 7 imputed) were used in analyses of trios informative for ETS or for both ETS and VIT, and 118 SNPs (111 genotyped plus 7 imputed) in analyses of trios informative for VIT. For European trios, 120 SNPs (113 genotyped plus 7 imputed) were used in all analyses. Specifically, 10 and 3 genotyped SNVs were dropped because of MAF < 1% in Asian and European parents, respectively. The other excluded genotyped SNVs were dropped because they were in complete linkage disequilibrium (LD, measured as r² = 1) with another SNP. All 18 excluded imputed SNVs were dropped because of MAF < 1%. The lowest genotyping call rates for Asian and European trios were 99.2% and 98.7%, respectively (data not shown).
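The MAF < 1% exclusion rule can be expressed as a short filter. This sketch assumes parental genotypes coded 0/1/2 per SNP, with `None` for a missing call. The SNP names and values are made up, and the r² = 1 LD-pruning step is not shown.

```python
def minor_allele_frequency(parent_genotypes):
    """MAF from additively coded parental genotypes (0/1/2); None marks a missing call."""
    called = [g for g in parent_genotypes if g is not None]
    freq = sum(called) / (2 * len(called))  # frequency of the allele counted by the coding
    return min(freq, 1 - freq)              # fold so the result is always the minor allele


def maf_filter(snps, threshold=0.01):
    """Keep SNP names whose parental MAF is at least `threshold` (i.e. drop MAF < 1%)."""
    return [name for name, genos in snps.items() if minor_allele_frequency(genos) >= threshold]


# Hypothetical parental genotypes for two SNPs (names and values are illustrative).
parents = {
    "snp_common": [0, 1, 1, None, 2, 0, 1, 0],
    "snp_rare": [0] * 199 + [1],
}
print(maf_filter(parents))  # → ['snp_common']
```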
(2) Test of linkage and association considering genotypic main effect alone. The genotypic TDT (gTDT) has higher power to test for linkage and association between an observed marker allele and an unobserved causal gene when only the G main effect term is included in a conditional logistic regression model, i.e. $\operatorname{logit} P(\text{case}) = \beta_g G$ [35]. Under an additive model, genotype G is coded as 0, 1 or 2 to represent the number of minor alleles carried by cases and pseudo-controls [36,37]. The direction and magnitude of association are reported as an estimated odds ratio, $\mathrm{OR} = \exp(\beta_g)$, for carrying one copy of the minor allele (compared with non-carriers), ignoring interaction with any environmental exposure. The corresponding OR for carrying two copies of the minor allele, compared with non-carriers, is $\exp(2\beta_g)$. A 1 degree of freedom (df) Wald test is used to test G for statistical significance in this model.
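The one-parameter gTDT fit can be sketched in pure Python. The sketch assumes each trio is supplied as the case genotype followed by its three pseudo-controls. It is an illustrative Newton-Raphson maximisation of the conditional likelihood, not the TRIO package code used in the study. The 1.96 multiplier gives approximate 95% Wald intervals.

```python
import math


def fit_gtdt(strata, tol=1e-10, max_iter=50):
    """Additive gTDT: maximise the conditional likelihood of logit P(case) = beta_g * G.

    Each stratum lists four minor-allele counts: the case first, then its three
    pseudo-controls. Strata in which all four genotypes are equal carry no
    information; at least one informative stratum is assumed.
    Returns (beta_g, standard error).
    """
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for g in strata:
            w = [math.exp(beta * x) for x in g]
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, g)) / total
            mean_sq = sum(wi * x * x for wi, x in zip(w, g)) / total
            score += g[0] - mean         # observed minus expected case genotype
            info += mean_sq - mean ** 2  # conditional variance of G in the stratum
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


def wald_summary(beta, se):
    """ORs for one and two minor-allele copies, 95% CI and 1-df Wald P value."""
    z = beta / se
    return {
        "OR_one_copy": math.exp(beta),
        "OR_two_copies": math.exp(2 * beta),
        "CI95_one_copy": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "P_wald": math.erfc(abs(z) / math.sqrt(2)),
    }


# Made-up data: 30 heterozygous and 20 homozygous-common cases whose parents
# are 1 x 0, so each trio's four possible offspring genotypes are [1, 1, 0, 0].
strata = [[1, 1, 0, 0]] * 30 + [[0, 1, 1, 0]] * 20
beta, se = fit_gtdt(strata)
print(wald_summary(beta, se))
```

For 1 × 0 parental matings the maximum-likelihood OR reduces to the ratio of heterozygous to homozygous-common cases, so this example should give an OR of 30/20 = 1.5.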
Findings for all 895 Asian and 681 European complete NSCL/P trios using the allelic TDT for genotyped markers were published in our previous GWAS report [8], in which no significant association was detected in or near BMP4. In the current analysis, the gTDT was performed among subsets of NSCL/P trios with data on selected maternal environmental factors.
The number of trios contributing to test statistics varied across SNPs and maternal environmental exposures because of differences in genotyping call rates, exposure rates as well as the informativeness of the markers.
(3) Test of linkage and association jointly considering G and GxE interaction. To capture evidence of linkage and association between BMP4 and NSCL/P that may be missed by the allelic or genotypic TDT, the additive conditional logistic regression model for the gTDT was extended with GxE terms, $\operatorname{logit} P(\text{case}) = \beta_g G + \sum_i \beta_{g \times e_i}\, G \times E_i$, to jointly consider the effects of a genetic marker and its interaction(s) with maternal environmental exposure(s). In this model, $E_i$ is a dummy variable denoting status for the $i$-th selected maternal environmental exposure [35,38]. Linkage and association signals can be captured by a likelihood ratio test (LRT) comparing the full model containing both G and GxE terms with a null model containing no terms. The degrees of freedom of the LRT equal the number of parameters in the larger model minus that in the smaller, nested model. When $E_i$ is coded as 0 for unexposed and 1 for exposed, $\exp(\beta_g)$ is the OR for being a case when carrying one copy of the minor allele (compared with non-carriers) in trios unexposed to all environmental factors included in the GxE terms. In contrast, $\exp\left(\beta_g + \sum_{i \in S} \beta_{g \times e_i}\right)$ is the corresponding OR for trios exposed to the exposure or combination of exposures $S$ among those included in the GxE terms. A 1df Wald test can be used to test each term in the model for statistical significance. The corresponding ORs for carrying two copies of the minor allele are $\exp(2\beta_g)$ and $\exp\left[2\left(\beta_g + \sum_{i \in S} \beta_{g \times e_i}\right)\right]$, respectively. The 95% confidence intervals (CIs) for these ORs can be estimated from the associated standard errors.
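A minimal pure-Python sketch of the joint model logit P(case) = β_g G + Σ_i β_{g×e_i} G×E_i follows. It assumes each trio is stored as its four genotypes (case first) together with a tuple of 0/1 maternal exposure indicators. The data layout and function names are illustrative and are not those of the TRIO package or SPSS. The exposures enter only through the G×E products. The LRT compares the fit with the null model in which every coefficient is zero, so each trio contributes log(1/4) to the null log-likelihood and the test has 1 + k df for k exposures.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def covariates(g, exposures):
    # [G, G*E_1, ..., G*E_k]; E is shared by the case and its pseudo-controls,
    # so it appears only through the interaction terms.
    return [g] + [g * e for e in exposures]


def loglik_score_info(beta, strata):
    """Conditional log-likelihood, score vector and observed information matrix."""
    p = len(beta)
    ll, score = 0.0, [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for genotypes, exposures in strata:
        xs = [covariates(g, exposures) for g in genotypes]  # row 0 is the case
        etas = [sum(bk * xk for bk, xk in zip(beta, row)) for row in xs]
        w = [math.exp(eta) for eta in etas]
        total = sum(w)
        ll += etas[0] - math.log(total)
        mean = [sum(wj * row[a] for wj, row in zip(w, xs)) / total for a in range(p)]
        for a in range(p):
            score[a] += xs[0][a] - mean[a]
            for c in range(p):
                second = sum(wj * row[a] * row[c] for wj, row in zip(w, xs)) / total
                info[a][c] += second - mean[a] * mean[c]
    return ll, score, info


def fit_clogit(strata, n_params, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of the conditional likelihood."""
    beta = [0.0] * n_params
    for _ in range(max_iter):
        _, score, info = loglik_score_info(beta, strata)
        step = solve(info, score)
        beta = [bk + sk for bk, sk in zip(beta, step)]
        if max(abs(sk) for sk in step) < tol:
            break
    ll, _, _ = loglik_score_info(beta, strata)
    return beta, ll


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    half = x / 2.0
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    extra = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra


def lrt_vs_null(strata, n_exposures):
    """Joint G + GxE test: full model versus the null model with no terms."""
    n_params = 1 + n_exposures
    beta, ll_full = fit_clogit(strata, n_params)
    ll_null = -len(strata) * math.log(4)  # all coefficients 0: 4 members equally likely
    stat = 2.0 * (ll_full - ll_null)
    return beta, stat, chi2_sf(stat, n_params)


# Made-up data: trios with one heterozygous and one homozygous-common parent.
unexposed = [([1, 1, 0, 0], (0,))] * 30 + [([0, 1, 1, 0], (0,))] * 50
exposed = [([1, 1, 0, 0], (1,))] * 40 + [([0, 1, 1, 0], (1,))] * 25
beta, stat, p = lrt_vs_null(unexposed + exposed, n_exposures=1)
print(math.exp(beta[0]), math.exp(beta[0] + beta[1]), stat, p)
```

With these made-up counts, exp(β_g) ≈ 0.6 for unexposed trios and exp(β_g + β_{g×e}) ≈ 1.6 for exposed trios. Plain Newton-Raphson without step-halving is adequate for small, well-conditioned examples like this one. A real analysis should use an established conditional logistic regression routine, which also handles exposure combinations with no informative trios (these make the information matrix singular).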
Because maternal exposure status is the same between the observed case and all three pseudo-controls, tests for the independent effects of any maternal environmental exposure are not possible. Risk for exposed carriers can only be assessed by comparison to exposed non-carriers, while estimation of OR for exposed carriers by comparison to either unexposed carriers or unexposed non-carriers is not possible under a case-parent trio design.
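The reason can be made explicit with a short derivation. Suppose, hypothetically, that a maternal main-effect term $\beta_e E$ were added to the linear predictor (it is not part of the models fitted in this study). Write $j = 0$ for the case and $j = 1, 2, 3$ for its pseudo-controls, all of whom share the mother's exposure $E$. The trio's contribution to the conditional likelihood is then

```latex
\frac{\exp(\beta_g G_0 + \beta_e E)}{\sum_{j=0}^{3} \exp(\beta_g G_j + \beta_e E)}
  = \frac{e^{\beta_e E}\,\exp(\beta_g G_0)}{e^{\beta_e E} \sum_{j=0}^{3} \exp(\beta_g G_j)}
  = \frac{\exp(\beta_g G_0)}{\sum_{j=0}^{3} \exp(\beta_g G_j)}
```

so $\beta_e$ drops out and cannot be estimated. A $G \times E$ term does not cancel in this way, because $G_j$ varies across the four members of the trio. Only contrasts between carriers and non-carriers within the same exposure group remain estimable.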
In the Asian trios, interactions with maternal ETS and VIT (in addition to G) were first tested individually and then considered simultaneously using additive conditional logistic regression models. Significant linkage and association were identified with 2 and 3 df LRTs, respectively. ORs for being a case when carrying one copy of the minor allele were estimated for trios with each exposure status for ETS and VIT (exposed to both ETS and VIT, ETS only, VIT only, or neither). To avoid potential influence of ALCOHOL and SMK exposures on these tests, the analyses simultaneously considering interactions with ETS and VIT were repeated for the most significant SNP (rs7156227) using trios with no exposure to either ALCOHOL or SMK. These repeated analyses were performed among all Asian NSCL/P trios combined (n = 704), Chinese-only NSCL/P trios (n = 609), Asian and Chinese non-syndromic cleft lip only (NSCLO) and NSCLP trios, and NSCL/P trios from the two largest groups (Taiwan and Shandong). For European trios, interactions with each of the four investigated maternal environmental exposures were considered individually and simultaneously (a 5df LRT). ORs were estimated for the two largest groups of European trios: trios without exposure to any of the four environmental factors (n = 64) and trios with exposure to VIT only (n = 98).
Table 2. Nominally significant associations for NSCL/P with SNPs in and around BMP4 jointly considering G and interaction with ETS using conditional logistic regression models in 786 complete Asian trios with data on ETS.
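The next sketch covers two steps of the Asian-trio analyses: restricting to trios without maternal SMK or ALCOHOL exposure, and computing the OR for each of the four ETS/VIT exposure strata. The record layout and coefficient values are hypothetical. They show only how each stratum-specific OR combines β_g with the interaction coefficients of the exposures present.

```python
import math
from itertools import product


def stratum_or(beta_g, beta_gxe, exposure_status, copies=1):
    """OR for carrying `copies` minor alleles versus non-carriers with the same exposures.

    Computes exp(copies * (beta_g + sum_i beta_gxe[i] * E_i)) under the additive model.
    """
    lin = beta_g + sum(b * e for b, e in zip(beta_gxe, exposure_status))
    return math.exp(copies * lin)


# Hypothetical trio records; keep those with no maternal SMK or ALCOHOL exposure.
trios = [
    {"id": "t1", "SMK": 0, "ALCOHOL": 0, "ETS": 1, "VIT": 0},
    {"id": "t2", "SMK": 1, "ALCOHOL": 0, "ETS": 1, "VIT": 0},
    {"id": "t3", "SMK": 0, "ALCOHOL": 1, "ETS": 0, "VIT": 1},
    {"id": "t4", "SMK": 0, "ALCOHOL": 0, "ETS": 0, "VIT": 0},
]
kept = [t["id"] for t in trios if not t["SMK"] and not t["ALCOHOL"]]
print(kept)  # → ['t1', 't4']

# Illustrative coefficients (not estimates from this study) for G, GxETS and GxVIT.
beta_g, beta_gxe = -0.5, (0.4, 0.3)
for ets, vit in product((0, 1), repeat=2):
    print(f"ETS={ets} VIT={vit}: OR (1 copy) = {stratum_or(beta_g, beta_gxe, (ets, vit)):.3f}")
```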
Statistical analyses considering G and one GxE term were performed using the TRIO package in R (version 3.0.0) [35], available at http://www.bioconductor.org. Analyses considering G and more than one GxE term were carried out using SPSS version 20.0 (IBM SPSS Statistics).
Statistical power for relevant tests was estimated using QUANTO software [39] (http://hydra.usc.edu/gxe/) at the Bonferroni corrected significance level. For Asian trios, this level was α = 0.05/117 for trios informative for ETS or for both ETS and VIT, and α = 0.05/118 for trios informative for VIT. For European trios, it was α = 0.05/120. Other parameters required for power estimation were obtained either from the parents' genotype data (number of trios, MAF and maternal environmental exposure rates) or from the fitted conditional logistic regression models (ORs for the corresponding G and GxE terms).
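The Bonferroni rule used throughout can be written as a pair of helpers with illustrative names. As a check, multiplying the asymptotic P values reported in this paper by the number of SNPs tested, and capping the product at 1, reproduces the corrected values: 117 × 4.39×10⁻⁵ ≈ 0.0051, and 120 × 1.08×10⁻² exceeds 1, so that corrected P is 1.

```python
ALPHA = 0.05


def bonferroni_level(n_tests, alpha=ALPHA):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def bonferroni_p(p, n_tests):
    """Bonferroni corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


print(bonferroni_level(117))
print(round(bonferroni_p(4.39e-5, 117), 4))  # → 0.0051
print(bonferroni_p(1.08e-2, 120))           # → 1.0
```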

Test of linkage and association considering genotypic main effect alone
The tests of linkage and association using the gTDT considering G main effects alone were performed among trios with information on selected maternal environmental exposures. Nominal significance (P < 0.05) was seen for 8 SNPs among Asian trios (Table 2 and Table S5) and for another 2 SNPs among European trios (Tables S6 to S9). The most significant SNPs identified in Asian (rs7156227) and European (rs11157980) trios were located 3′ of BMP4. Among Asian trios with information on ETS, rs7156227 had OR = 0.74 (95%CI = 0.62, 0.88) (Table 2). Among European trios with information on VIT, rs11157980 had OR = 1.32 (95%CI = 1.07, 1.63) (Table S7). The corresponding Bonferroni corrected P values were 0.083 (asymptotic P = 7.1×10⁻⁴) (Table 2) and 1 (asymptotic P = 1.08×10⁻²) (Table S7), respectively.

Test of linkage and association jointly considering G and GxE interaction
(1) Asian trios. When G and interaction with ETS or with VIT were considered separately, 20 of 110 genotyped SNPs around BMP4 showed nominally significant linkage and association with NSCL/P (Table 2 and Table S5). Significance held after Bonferroni correction for one SNP (rs7156227) when G and interaction with ETS were considered, with a Bonferroni corrected P = 0.021 (asymptotic P = 1.78×10⁻⁴) for the 2df LRT. The estimated OR for being a case carrying one copy of the minor allele (MAF = 0.20) at rs7156227 without maternal exposure to ETS was 0.61 (95%CI = 0.49, 0.78) compared with unexposed non-carriers. The Bonferroni corrected P value from the Wald test was 0.0053 (asymptotic P = 4.52×10⁻⁵) (Table 2).
When G and interactions with ETS and VIT were considered simultaneously in an additive conditional logistic regression model, significant linkage and association with NSCL/P remained for rs7156227 after Bonferroni correction, and another 14 SNPs showed nominal significance. For rs7156227, the Bonferroni corrected P value for the 3df LRT comparing the full model with the null was 0.048 (asymptotic P = 4.13×10⁻⁴). The OR for NSCL/P for rs7156227 was 0.58 (95%CI = 0.45, 0.76) for Asian trios without maternal exposure to either ETS or VIT, compared with unexposed non-carriers, with a corrected P = 0.0052 (asymptotic P = 4.47×10⁻⁵) (Table S10 and Table 3). For trios with other exposure statuses (exposed to both ETS and VIT, to ETS only, or to VIT only), no significance was observed for rs7156227. Statistical power to detect linkage and association in these groups was limited (0.1%, 0.1% and 1.0%, respectively) at the Bonferroni corrected significance level of 0.05/117 (Table 3).
Significant linkage and association with NSCL/P and NSCLP held for rs7156227 among Asian and Chinese-only trios after trios with maternal exposure to ALCOHOL or SMK were dropped (Table 4). The Bonferroni corrected P value for the 3df LRT was 0.041 (asymptotic P = 3.5×10⁻⁴) for Chinese-only NSCLP trios. ORs for carrying one copy of the minor allele without maternal exposure to any of the four environmental factors were 0.58 (95%CI = 0.44, 0.75) and 0.57 (95%CI = 0.43, 0.75) for Asian and Chinese-only NSCL/P trios, respectively. The corresponding ORs for Asian and Chinese-only NSCLP trios were 0.54 (95%CI = 0.40, 0.73) and 0.52 (95%CI = 0.37, 0.71), respectively. The corresponding Bonferroni corrected P values ranged from 0.0051 to 0.0087 (asymptotic P values from 4.39×10⁻⁵ to 7.42×10⁻⁵). After Bonferroni correction, no significant linkage or association was seen for rs7156227 among all Asian or Chinese-only NSCLO trios. Nor was any seen among NSCL/P trios originally ascertained from Taiwan and Shandong without exposure to any of the four environmental factors. The corresponding statistical power was 0.9%, 0.4%, 4.8% and 28.0%, respectively, at the Bonferroni corrected significance level of 0.05/117 (Table 4).
Using imputed data, one SNP (rs6572915) 898 bp upstream of rs7156227 also showed significant association with NSCL/P and NSCLP after Bonferroni correction. However, rs6572915 is in almost complete LD (r² = 0.996) with rs7156227.
Table 3. Linkage and associations with NSCL/P for rs7156227 near BMP4 jointly considering G and interactions with ETS and VIT using conditional logistic regression models in 745 complete Asian trios with data on both ETS and VIT.
Table 4. Linkage and associations with NSCLO, NSCLP and NSCL/P for rs7156227 near BMP4 jointly considering G and interactions with ETS and VIT in Asian trios after those exposed to SMK and ALCOHOL were dropped.
(2) European trios. In the analysis of European trios considering G and interaction with each of the four maternal environmental exposures individually using conditional logistic regression, 19 SNPs showed nominally significant associations with NSCL/P. For the most significant SNP (rs210359), the OR for carrying one copy of the minor allele without exposure to VIT was 0.42 (95%CI = 0.23, 0.79) (asymptotic P = 7.18×10⁻³, Bonferroni corrected P = 0.86) (Table S7).
When interactions with all four maternal exposures were considered simultaneously in a single model, no association was seen for rs7156227 (MAF = 28.9%; OR = 1.34, 95%CI = 0.86, 2.06; asymptotic P = 0.19) among trios without exposure to any of the four exposures. Two other SNPs (rs210327 and rs1380131) showed nominal significance in the 5df LRT. Another SNP (rs210361) showed nominally significant evidence among the 64 trios without maternal exposure to any of the four environmental exposures. The 5df LRT gave P = 0.019 for the most significant SNP (rs1380131), but this was not significant after Bonferroni correction (Table S11). Among the 98 trios exposed to VIT only, no SNP showed significant association with NSCL/P after Bonferroni correction when exposed carriers were compared with exposed non-carriers (data not shown).
No nominally significant association was seen for any of the 7 imputed SNPs around rs7156227 in joint analysis of G and interactions with all four environmental exposures in one model for European trios (data not shown).
In the current study, we tested linkage and association using a strategy that jointly considers G and GxE interaction in an additive conditional logistic regression model. Using this approach, we identified a novel significant association with NSCL/P (and NSCLP) for SNP rs7156227 near BMP4 among Asian trios whose mothers reported no exposure to any of the four environmental factors (ETS, VIT, ALCOHOL and SMK) during the critical peri-conceptional period of pregnancy. This association was not seen in a previous GWAS considering genetic main effects alone, in which genotypic risks were estimated for exposed and unexposed carriers combined [8]. Our finding adds evidence from GWAS to support the role of BMP4 in susceptibility to NSCL/P originally identified in candidate gene association studies. This is also the first report of association between a marker near BMP4 and the NSCLP subgroup. As with the majority of findings from genetic association studies, the most significantly associated common SNP, rs7156227, is located in a noncoding region, 3′ of BMP4. Different SNPs have been associated with NSCL/P in previous studies. This signal could reflect linkage disequilibrium with one or more nearby, as yet unidentified, causal variants. As in some previously published studies, no significant linkage and association was identified in European trios after Bonferroni correction [25,40]. This may indicate insufficient power to detect association and/or etiological heterogeneity between Asian and European populations.
Given their potential for public health intervention, modifiable environmental risk factors have received particular attention, especially SMK and ETS, which may interact with susceptibility genes for complex diseases [5,41,42]. In the current study, there was no evidence of linkage and association reflecting gene-environment interaction between BMP4 and NSCL/P among either Asian or European trios. In these comparisons, carriers exposed to the selected environmental factors were compared with the corresponding exposed non-carriers under an additive conditional logistic regression model.
Because the analytical approach adopted in the current study allows separate estimation of risks for exposed and unexposed carriers, it can identify genetic and GxE interaction effects that could be missed by analyses considering genotypic main effects alone. This can happen, for example, when risks for exposed and unexposed carriers are in opposite directions, or when an effect exists in only one exposure group.
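A small numerical illustration of this point uses made-up counts. For trios whose parents are heterozygous × homozygous common (1 × 0), two of the four possible offspring are heterozygous. The conditional maximum-likelihood per-allele OR therefore reduces to the ratio of heterozygous to homozygous-common cases. Suppose carriers are protected in unexposed trios but at increased risk in exposed trios. Pooling the two groups, as an analysis of G main effects alone does, then pulls the estimate toward 1.

```python
def transmission_or(n_het_cases, n_hom_common_cases):
    """Conditional ML per-allele OR for 1 x 0 parental matings (ratio of case counts)."""
    return n_het_cases / n_hom_common_cases


# Hypothetical case counts (heterozygous, homozygous common) by exposure group.
groups = {"unexposed": (30, 50), "exposed": (40, 25)}
groups["pooled"] = tuple(sum(c) for c in zip(*groups.values()))

for label, counts in groups.items():
    print(label, round(transmission_or(*counts), 3))
```

Here the group-specific ORs are 0.6 and 1.6, while the pooled estimate is about 0.93.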
Also, because these analyses were performed in smaller subgroups than analyses considering genotypic main effects alone, statistical power became a more important issue, especially at the Bonferroni corrected significance level. For example, statistical power to test for linkage and association was as low as 0.9% and 0.4% for Asian and Chinese-only NSCLO trios without maternal exposure to any of the four factors. Power was even lower for some other exposure groups because of their smaller size and lower exposure rates. Larger numbers of case-parent trios will be needed to better address the effects of genotype on risk of NSCL/P, and especially the effects of gene-environment interactions. In addition, the case-parent trio design is naturally matched for parental exposures. The independent effects of maternal environmental exposures therefore cannot be estimated. Nor can the interaction effects for exposed carriers relative to unexposed carriers or unexposed non-carriers. Other study designs, such as case-control studies, may be better suited to answer specific questions about environmental factors alone [43] and gene-environment interaction [44].
Our study showed analyses jointly considering G and multiple GxE interactions can identify important genes through linkage and association tests that analyses considering G effects alone would miss. This analytical approach has provided supportive evidence from GWAS for BMP4 as an important gene for NSCL/P and NSCLP in Asian trios.

Supporting Information
Table S1 Maternal exposure to tobacco smoking, environmental tobacco smoke, multivitamin supplements and alcohol consumption in NSCL/P probands from 895 complete Asian trios. (DOC)
Table S10 Nominally significant associations with NSCL/P for SNPs in and around BMP4 jointly considering G and interactions with maternal ETS and VIT using conditional logistic regression models in 746 complete Asian trios informative for ETS and VIT. (DOC)
Table S11 Nominally significant associations for NSCL/P with SNPs in and around BMP4 jointly considering G and interactions with maternal SMK, ETS, ALCOHOL and VIT using conditional logistic regression models in 374 complete European trios informative for all four exposures. (DOC)