Gene-based genome-wide association studies and meta-analyses of conotruncal heart defects

Conotruncal heart defects (CTDs) are among the most common and severe groups of congenital heart defects. Despite evidence of an inherited genetic contribution to CTDs, little is known about the specific genes that contribute to the development of CTDs. We performed gene-based genome-wide analyses using microarray-genotyped and imputed common and rare variant data from two large studies of CTDs in the United States. We performed two case-parent trio analyses (N = 640 and 317 trios), using an extension of the family-based multi-marker association test, and two case-control analyses (N = 482 and 406 patients and comparable numbers of controls), using a sequence kernel association test. We also undertook two meta-analyses to combine the results from the analyses that used the same approach (i.e. family-based or case-control). To our knowledge, these analyses are the first reported gene-based, genome-wide association studies of CTDs. Based on our findings, we propose eight CTD candidate genes (ARF5, EIF4E, KPNA1, MAP4K3, MBNL1, NCAPG, NDUFS1 and PSMG3). Four of these genes (ARF5, KPNA1, NDUFS1 and PSMG3) have not been previously associated with normal or abnormal heart development. Our analyses also provide additional evidence that genes involved in chromatin modification and in ribonucleic acid splicing are associated with congenital heart defects.


Introduction
Congenital heart defects (CHDs) are the most common group of birth defects, with a prevalence of approximately 1% in live births [1]. CHDs are also the leading cause of birth defect related mortality [2] and account for the largest percentage of birth defect associated hospitalizations and hospitalization-associated costs [3]. In the United States, it is estimated that there are approximately 2.4 million CHD survivors (1.4 million adults, 1 million children) [4], the majority of whom will require lifelong cardiac care. Despite the impact on affected patients, their families, and the healthcare system, the causes of CHDs are not well defined [5].
There are many different CHD phenotypes, of which approximately one-third involve the cardiac outflow tracts and great arteries [6], structures that develop from the cardiac neural crest and secondary heart field [7]. This subgroup of CHDs, collectively referred to as conotruncal heart defects (CTDs), includes some of the most severe and costly birth defects [3,8]. In addition to their shared embryologic and anatomic basis, there is substantial evidence that the various CTD phenotypes (e.g. tetralogy of Fallot (TOF), truncus arteriosus) share common genetic underpinnings. For example, nationwide, population-based studies conducted in Norway and Denmark indicate that CTDs aggregate within families (recurrence risk ratios for CTDs in first-degree relatives: 9-12) [9,10] and that affected relatives of patients with a CTD are at a higher relative risk for CTDs (sibling CTD recurrence risk ratio: 9.0, 95% confidence interval (CI) 4.0-20.0) than for other types of CHDs (sibling non-CTD, CHD recurrence risk ratio: 3.6, 95% CI 2.4-5.5) [9]. Further, there is evidence that, within affected relative-pairs, the specific type of CTD can differ. For example, among 28 CTD-affected siblings of patients with TOF, 17 also had TOF whereas 11 had a different CTD phenotype [11]. Additional evidence that the various CTD phenotypes share a common genetic basis is provided by the phenotypic characteristics of defined genetic syndromes. For example, in patients with the 22q11.2 deletion syndrome, the most common cardiac defects are CTDs, but the specific CTD phenotype (e.g. TOF, interrupted aortic arch) varies across patients with this deletion [12].
Studies of syndromes that include CTDs, such as the 22q11.2 deletion syndrome, have provided some clues regarding the specific genes that may be involved in determining the risk of CTDs (e.g. TBX1 [13]). In addition, studies of rare, presumably pathogenic, copy number variants [14-16], and inherited [17] and de novo [17,18] single nucleotide variants have identified genes that may contribute to the risk of CTDs [18,19]. Yet, most affected patients do not carry a confirmed or suspected rare, causative variant. Moreover, rare variants, in particular rare de novo variants, do not account for the observed increase in risk of CTDs among the relatives of affected patients.
Since rare, pathogenic variants are unlikely to fully account for the population prevalence or familial recurrence of CTDs, additional genetic mechanisms must also contribute to disease risk. While the involvement of more common variants that have more moderate impacts on CTD risk seems likely, genome-wide association studies (GWAS) [20-23] of common single nucleotide polymorphisms (SNPs) have identified only two genome-wide significant associations for CTDs (rs11065987, p = 7.7E-11 and rs7982677, p = 3.03E-11) [22]. However, given the huge number of variants evaluated in GWAS, the threshold for statistical significance is quite stringent (i.e. p<5E-08). Consequently, the lack of significant findings for CTDs may well reflect low study power rather than the lack of common, CTD-related genetic variants.
Gene-based GWAS provide an additional strategy for identifying disease-related genes, but, to our knowledge, there are no published gene-based GWAS for CTDs. Compared to SNP-based GWAS, gene-based studies have the advantage of a less stringent threshold for statistical significance (e.g. Bonferroni corrected p-value for 20,000 genes, 2.5E-06). In addition, SNP-based analyses generally exclude rare variants, due to low statistical power [24], whereas gene-PCGC (including the PCGC samples collected at CHOP) were genotyped on the Illumina HumanOmni-1 or HumanOmni-2.5 platforms. Additional details regarding the CHOP and PCGC samples are provided elsewhere [18,21,27,28].

Imputation and quality control (QC) procedures
The microarray genotyped data from CHOP and PCGC were imputed using Impute2 v2.3.0 and pre-phased haplotype data obtained from the 1000 Genomes Project (Phase-I integrated v3 variants set) as the reference [30]. Due to differences in the genotyping platforms, the CHOP and PCGC cohorts were imputed separately.
Standard QC procedures were performed for each dataset using PLINK v1.07 before and after imputation [31]. Before imputation, the array data were checked for strand and coding errors. Trios were removed if more than 1% of genotyped SNPs had Mendelian errors. Suspected duplicate samples were identified using pairwise identity-by-descent estimation and samples with pi-hat greater than 0.6 were removed. Samples with genotyping rates less than 95% were also removed. In addition, variants with minor allele frequency (MAF) less than 1%, genotyping rates less than 90%, or deviation from Hardy-Weinberg Equilibrium (HWE) in controls (p<1E-05) were excluded, as were all non-autosomal variants.
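The pre-imputation variant exclusions described above can be sketched as a simple predicate. This is only an illustration of the stated thresholds, not the PLINK workflow itself; the `VariantStats` record and the function name are hypothetical, and the summary statistics are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant summary; field names are illustrative only."""
    maf: float        # minor allele frequency
    call_rate: float  # fraction of samples with a called genotype
    hwe_p: float      # HWE test p-value in controls
    autosomal: bool   # True if the variant lies on chromosomes 1-22


def passes_pre_imputation_qc(v: VariantStats) -> bool:
    """Apply the variant-level thresholds stated in the text."""
    return (
        v.autosomal
        and v.maf >= 0.01        # exclude MAF < 1%
        and v.call_rate >= 0.90  # exclude genotyping rate < 90%
        and v.hwe_p >= 1e-5      # exclude HWE deviation with p < 1E-05
    )
```

For example, a common autosomal variant with a 99% call rate and an HWE p-value of 0.4 is retained, whereas the same variant with a MAF of 0.5% is dropped.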
After the pre-imputation exclusions, the CHOP data from different platforms (HumanOmni-2.5, HumanHap550K v2, 550K v3 and 610K) were combined and only those variants present on all platforms (N = 283,977 SNPs) were used for imputation. Similarly, the PCGC data from different Illumina platforms (HumanOmni-1 and HumanOmni-2.5) were combined and only those SNPs present on both platforms (N = 624,419 SNPs) were used for imputation. For each dataset, haplotypes were pre-phased using SHAPEIT2 v2.727 [32] and imputation was performed using Impute2 v2.3.0 [30]. A genotype was imputed only if the posterior probability value exceeded 0.9, the default calling threshold for Impute2. After imputation, we excluded variants with poor imputation quality (Impute2 information metric score <0.8), or genotyping rates less than 90%. Samples with genotyping rates less than 95% and all insertions or deletions were removed. For all case-control comparisons, variants were evaluated for deviation from HWE in the pediatric control group using the exact test [33] implemented in PLINK, and variants with p<1E-05 were excluded. Because we were interested in assessing both rare and common variants, the post-imputation QC procedures did not include restrictions based on MAFs.
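The genotype-calling rule (a call is made only when the posterior probability exceeds 0.9) can be illustrated with a short sketch. It assumes the three posterior probabilities are ordered as (AA, AB, BB); the function name and the use of `None` for an uncalled genotype are illustrative choices, not part of Impute2.

```python
def call_genotype(posteriors, threshold=0.9):
    """Return the number of B alleles (0, 1 or 2) for the most probable
    genotype, or None if no genotype exceeds the calling threshold.

    `posteriors` is assumed to be ordered as (P(AA), P(AB), P(BB)).
    """
    best = max(range(3), key=lambda i: posteriors[i])
    return best if posteriors[best] > threshold else None
```

Uncalled genotypes then count against the per-variant and per-sample genotyping rates used in the post-imputation exclusions.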

Statistical analysis
Genome-wide gene-based analyses were conducted, as described below. Because the various gene-based approaches have different underlying assumptions, strengths and limitations, we used two different gene-based approaches, eFBAT-MM and SKAT-C, to optimize the probability of identifying CTD-related genes. All analyses included all autosomal RefSeq genes, defined by the transcription start-stop coordinates (Genome Reference Consortium Human genome build 37 or hg19 reference assembly) in the RefSeq gene records and we included variants that were 1kb upstream or downstream of each gene.
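As a minimal sketch of how variants might be assigned to a gene under this definition (transcription start-stop coordinates plus 1 kb on either side), consider the following. The tuple layouts and the function name are assumptions made for illustration; they do not come from the analysis software.

```python
def variants_in_gene(gene, variants, flank=1000):
    """Select variants that fall within a gene region extended by `flank` bp.

    gene:     (chromosome, start, end) in hg19 coordinates (assumed layout)
    variants: iterable of (chromosome, position, variant_id) tuples
    """
    chrom, start, end = gene
    return [
        v for v in variants
        if v[0] == chrom and start - flank <= v[1] <= end + flank
    ]
```

A variant 500 bp upstream of the transcription start is therefore included in the gene's variant set, whereas one 1,500 bp upstream is not.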
Family-based analyses. Data for case-parent trios ascertained through CHOP (CHOP-Trios) and PCGC (PCGC-Trios) were analyzed separately using an extension of the family-based multi-marker association test (eFBAT-MM) [34]. This test (i.e. eFBAT-MM) is a burden-type approach that collapses variant-level statistics over a gene or region to obtain a single p-value and makes the assumption that all associated variants in the gene or genetic region affect the phenotype in the same direction. The variants were weighted by the inverse of the MAF estimated from the parental genotypes. Meta-analysis of the gene p-values from the CHOP and PCGC trios was performed using Fisher's combination of probability method [35].
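Fisher's combination of probability method sums -2·ln(p) over k independent p-values and refers the result to a chi-square distribution with 2k degrees of freedom. The sketch below uses the closed-form chi-square tail for even degrees of freedom so that only the standard library is needed; the function name is illustrative.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For two studies this reduces to p1·p2·(1 - ln(p1·p2)), so two gene-level p-values of 0.01 combine to roughly 1.0E-03. The combined value falls below both inputs only when each study contributes some evidence.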

Case-control analyses. For the present study, we formed two independent, case-control (CC) datasets using the microarray genotyped and imputed data from CHOP. The first dataset included the Caucasian subset of patients from the CHOP trios and an equal number of Caucasian pediatric controls (CHOP-CC1). The second dataset included a second set of Caucasian patients with a CTD and an equal number of Caucasian pediatric controls (CHOP-CC2). There was no overlap in the cases or the controls included in CHOP-CC1 and CHOP-CC2.
The two CHOP case-control datasets (CHOP-CC1 and CHOP-CC2) were analyzed separately using the sequence kernel association test for the combined effect of common and rare variants (SKAT-C) [36]. Using this approach, separate scores were calculated for rare and common SNPs and these scores were combined as a weighted sum to calculate the gene p-value. The SKAT-C recommended default parameters were used for variant weighting and analysis. To control for population stratification bias, only non-Hispanic Caucasian cases (based on self- or parental report) were included in the analyses. Since race and ethnicity were based on self-report (rather than ancestry informative genetic markers), each analysis was also adjusted for the first genotypic principal component. Genotypic principal component analyses were conducted in Golden Helix SVS8.1, using the default parameter settings (MAF-based allele classification, additive genetic model and data for each marker were normalized by its theoretical standard deviation under HWE) (Golden Helix, Inc., Bozeman, MT, www.goldenhelix.com). Meta-analysis of the gene p-values from the two case-control series was performed using Fisher's combination of probability method [35]. Meta-analyses combining results from the eFBAT-MM and SKAT-C analyses were not performed, given the overlap in patients (i.e. the non-Hispanic Caucasian cases in the CHOP-Trios are the case group for CHOP-CC1) and the differences in the assumptions underlying the two analytic approaches.
For each of the family-based and case-control analyses, the genomic inflation factor (λ) was calculated (for the case-control analyses, λ was calculated using values that adjusted for the first genotypic principal component) and a quantile-quantile (Q-Q) plot was constructed to check for deviation of the genome-wide observed distribution of the test statistic from the expected null distribution. Genes with association p-values less than the Bonferroni-corrected p-value (based on the number of genes in each analysis) were considered genome-wide significant. Genes with p-values greater than the Bonferroni-corrected p-values but less than 1E-03 were considered to be suggestive of an association.
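The two checks described here can be sketched as follows. The inflation factor is computed in one common way, by converting each p-value to a 1-degree-of-freedom chi-square quantile and dividing the observed median by the null median (about 0.455); whether the authors used exactly this convention for gene-level p-values is an assumption. The family-wise alpha of 0.05 is likewise assumed, and the function names are illustrative.

```python
from statistics import NormalDist, median

_ND = NormalDist()
# Median of a 1-df chi-square distribution (about 0.455).
_NULL_MEDIAN_CHI2 = _ND.inv_cdf(0.25) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor from p-values in (0, 1]."""
    chi2 = [_ND.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / _NULL_MEDIAN_CHI2


def classify_gene(p, n_genes, alpha=0.05, suggestive=1e-3):
    """Label a gene-level p-value using the thresholds in the text."""
    if p < alpha / n_genes:  # Bonferroni-corrected threshold
        return "genome-wide significant"
    if p < suggestive:
        return "suggestive"
    return "not significant"
```

A value of λ near 1 indicates that the observed p-values track the null expectation; values below 1 suggest a conservative test and values above 1 suggest inflation.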
Gene-set enrichment analysis. Genes with p<0.01 in the eFBAT-MM or SKAT-C meta-analyses were evaluated together for gene-annotation enrichment using MetaCore™ (Thomson Reuters, Life Science Research, https://portal.genego.com/metacore). A false-discovery rate (FDR) corrected p-value less than 0.05 was used to identify significant pathway maps and Gene Ontology (GO) processes. REVIGO was used for clustering GO terms based on p-values and semantic similarity score (simRel) [37]. The simRel scores range from 0 to 1 and we used a score threshold of 0.4 for filtering GO terms.
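For readers unfamiliar with FDR correction, the sketch below shows the Benjamini-Hochberg adjustment, one widely used FDR procedure. The text does not state which FDR procedure the enrichment tool applies, so treating it as Benjamini-Hochberg is an assumption; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Pathways or GO processes whose adjusted value is below 0.05 would then be reported as significantly enriched.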
Gene annotation and prioritization. To prioritize genes with at least suggestive evidence of association with CTD (p<1E-3, in either the family-based or case-control meta-analysis), for future investigations, we considered: (1) whether the meta-analysis p-value for the gene was lower than the p-values in contributing datasets i.e. the evidence for association was stronger in the combined data than in either of the individual datasets; and, (2) gene expression levels, based on heart expression data from E9.5 and E14.5 mouse embryos [18]. For each gene with a meta-analysis p-value lower than the p-values for the contributing datasets, we annotated the variants that were included in our analyses, for location, function, MAF in the genome aggregation database [38], Combined Annotation Dependent Depletion (CADD) phred-scaled scores [39], Genome-Wide Annotation of VAriants (GWAVA) [40] and Genomic Evolutionary Rate Profiling scores [41]. Genes with meta-analysis p-values lower than the p-values from the contributing datasets, and with heart expression data in the top quartile at E9.5 or E14.5 were considered strong candidates for future investigations.
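The prioritization rule can be written as a short predicate. This is a sketch under stated assumptions: expression is represented as a percentile rank (0-100) at each time point, with `None` when no expression data are available, and the top quartile is taken as a percentile of 75 or higher. The function and parameter names are hypothetical.

```python
def is_strong_candidate(meta_p, study_ps, e95_pct=None, e145_pct=None,
                        suggestive=1e-3):
    """Apply the two prioritization criteria to one gene.

    meta_p:    meta-analysis p-value for the gene
    study_ps:  p-values from the contributing datasets
    e95_pct, e145_pct: heart expression percentile at E9.5 and E14.5
                       (assumed representation; None if unavailable)
    """
    has_suggestive_evidence = meta_p < suggestive
    stronger_when_combined = meta_p < min(study_ps)
    in_top_quartile = any(
        pct is not None and pct >= 75 for pct in (e95_pct, e145_pct)
    )
    return has_suggestive_evidence and stronger_when_combined and in_top_quartile
```

A gene with no available expression data cannot satisfy the second criterion under this sketch, which matches the restriction of the candidate list to genes with expression data.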

Results
After QC exclusions, there were 640 CHOP trios and 317 PCGC trios for family-based analyses (Fig 1). In addition, there were 482 patients with CTD and 483 controls for CHOP-CC1, and 406 patients with CTD and 406 controls for CHOP-CC2. In both sets of trios, patients were predominantly Caucasian (Table 1). The two case-control datasets were restricted to Caucasian participants. In all groups, the most common heart defect was TOF.

Gene-based GWAS of individual datasets
The numbers of variants and genes included in each analysis are summarized in Table 2. The genotype concordance for the imputation was >90%. The Q-Q plots (S1-S4 Figs) and genomic inflation factors (Table 2) provided little evidence for systematic bias in the observed p-values. No genome-wide significant associations were identified in the analyses of the individual datasets. The number of genes with suggestive evidence of association (p<1E-03) ranged from 13 to 27 (Table 2). There was no overlap across datasets or analyses with respect to the genes with suggestive evidence of association. Detailed genome-wide results for each analysis are included in Tables B-E in S1 File.

Meta-analysis
Separate meta-analyses were conducted using eFBAT-MM p-values for 21,170 genes that were analyzed in both the CHOP and PCGC trios, and SKAT-C p-values for 21,077 genes that were analyzed in both the CHOP-CC1 and CHOP-CC2 case-control studies. The Q-Q plots (Figs 2 and 3) and genomic inflation factors provided little evidence for systematic bias in the observed p-values. No gene achieved genome-wide significance in either meta-analysis (Table F in S1 File provides p-values for all genes assessed in each meta-analysis). Suggestive evidence of association (p<1E-03) was obtained for 11 genes (8 protein coding, 2 pseudogenes, 1 RNA gene) in the trio-based meta-analysis (Table 3) and for 27 genes (23 protein coding, 4 RNA genes) in the case-control based meta-analysis (Table 4).

Gene-set enrichment analysis
MetaCore™ gene-set enrichment analysis was performed using genes with p<0.01 in the trio (195 genes) or case-control (246 genes) based meta-analyses (Table F in S1 File). We identified two significantly enriched pathways (FDR-corrected p<0.05): dynein-dynactin motor complex in axonal transport in neurons (FDR-corrected p = 0.02), and chromosome condensation in prometaphase (FDR-corrected p = 0.02) (Table G in S1 File). In addition, we identified 111 significantly enriched (FDR-corrected p<0.05) non-redundant (REVIGO-clustered) GO processes (Table H in S1 File). GO term clusters included processes relevant to heart defects including cellular response to hormone stimulus, angiogenesis and biological adhesion.

Gene annotations and prioritization
Of the 38 genes with suggestive evidence of association (p<1E-3) in either the family-based or case-control meta-analysis, 19 protein-coding and 2 RNA genes had a lower meta-analysis p-value than the p-values in contributing datasets, i.e. the evidence for association was stronger in the combined data than in either individual dataset (Table 5). For the 21 genes with meta-analysis p-value less than the individual study p-values, the majority (95%) of variants included in the analyses were intronic (Table I in S1 File). Heart expression data from E9.5 and E14.5 mouse embryos [18] were available for 15 of the 19 protein-coding genes, of which eight (53%) were in the top quartile of expression at one or both time points (Table 5). We propose these eight genes (ARF5, EIF4E, KPNA1, MAP4K3, MBNL1, NCAPG, NDUFS1, PSMG3) as CTD candidate genes.

Discussion
Our comprehensive genome-wide, gene-based analysis of common and rare variants identified 38 genes with suggestive evidence of association (meta-p<1E-3) with CTDs, as well as relevant biological pathways and processes that were significantly enriched (FDR-corrected p<0.05) among the genes with the most significant p-values in gene-based analyses. Based on both statistical evidence (i.e. the evidence for association was stronger in the meta-analysis than in any of the contributing studies) and gene expression data (top quartile of expression in mouse heart at E9.5 or E14.5) we propose eight genes (ARF5, EIF4E, KPNA1, MAP4K3, MBNL1, NCAPG, NDUFS1, PSMG3) as CTD candidate genes. Four of the CTD candidate genes suggested by our work have not been associated with normal or abnormal heart development. These four genes are: ADP ribosylation factor 5 (ARF5), which encodes a GTP-binding protein involved in protein trafficking; karyopherin subunit alpha 1 (KPNA1), which functions in nuclear protein import; NADH:ubiquinone oxidoreductase core subunit S1 (NDUFS1), which encodes the core subunit of the mitochondrial membrane respiratory chain NADH dehydrogenase; and proteasome assembly chaperone 3 (PSMG3), which encodes a chaperone protein.
The known functions of the remaining four candidate genes suggest that their altered expression could cause CHDs. Of these genes, the most significant association was observed for muscleblind-like splicing regulator 1 (MBNL1, eFBAT-MM meta-p = 1.5E-04). This gene encodes a C3H-type zinc finger protein (MBNL1) that is a key regulator of pre-RNA alternative splicing. Evidence that splicing regulators contribute to the etiology of CHDs is provided by the identification of a genome-wide significant excess of damaging de novo and loss-of-function heterozygous mutations in another key splicing regulator, RBFOX2, in patients with CHDs [17].
Several additional lines of evidence also support a role for MBNL1 in cell differentiation and heart development. For example, MBNL1 and RBFOX2 appear to co-regulate the splicing changes that lead to the differentiation of pluripotent stem cells [42]. In addition, in the nucleotide repeat expansion disorder, myotonic dystrophy, reduced MBNL1 splicing activity (due to binding of MBNL1 protein to the expansion RNA) is thought to play a major role in determining the disease phenotype, which includes several cardiovascular abnormalities (conduction defects, arrhythmias, mitral valve prolapse) [43,44]. There is also evidence that MBNL1 is involved in the fetal to adult transition in alternative splicing patterns in the heart [45], and that MBNL1 negatively regulates TGF-β signaling and the epithelial-mesenchymal transition in the endocardial cushions by restricting the timing and amount of TGF-β production in the atrioventricular canal and outflow tract endocardium [46,47]. Mice null for MBNL1 protein present with abnormal heart valve development, regurgitation across both the in- and outflow valves, and ostium secundum septal defects [47].
Further evidence that genes involved in RNA splicing may be associated with CTDs is provided by our gene-set enrichment analyses. Specifically, genes mapping to the GO process 'regulation of RNA splicing' (GO:0043484) were significantly enriched (FDR-adjusted p = 0.03) among genes with association p<0.01 in our meta-analyses. In addition to MBNL1, seven genes mapping to this process (CLK3, DDX5, JMJD6, SRSF2, SRSF9 and TMBIM6) had meta-analysis p<0.01 (meta-analysis p-value range: 2E-03 to 8E-03) in either the family-based (i.e. eFBAT-MM) or case-control (i.e. SKAT-C) meta-analysis.
The second most significant of our proposed CTD candidate genes was eukaryotic translation initiation factor 4E (EIF4E, SKAT meta-p = 3.6E-04). The encoded protein, eIF4E, directs ribosomes to the mRNA 5'-cap and is a key factor in initiation of translation of many mRNAs [48]. Zhang et al. have presented evidence that eIF4E is involved in heart development via the p53-Rbm24 loop [49]. Specifically, they demonstrated that the multifunctional RNA-binding protein, Rbm24, prevents binding of eIF4E to p53 RNA, thereby repressing p53 translation and p53-dependent apoptosis. Further, they showed that mice deficient for Rbm24 develop endocardial cushion defects as a result of aberrant binding of eIF4E to p53 RNA resulting in overexpression of p53. Mutations in EIF4E have also been implicated as a cause of autism in humans [50], and enhanced eIF4E activity has been associated with autism-like phenotypes in animal models [51]. Hence, our finding adds EIF4E to the growing list of genes that may be related to both CHDs and neurodevelopmental disabilities such as autism [17,52].
Table 3. Genes with suggestive evidence of association (p<1E-03) in the trio-based meta-analysis.
Our study also identified the mitogen-activated protein kinase kinase kinase kinase 3 (MAP4K3) as a CTD candidate gene. The product of this gene is an upstream activator of the c-Jun-N-terminal kinase (JNK) signal transduction pathway, which is involved in several processes relevant to heart development (e.g. cell growth, differentiation and survival, apoptosis) [53]. Downstream effectors of JNK signaling relevant to heart development include the tumor suppressor/apoptosis gene, p53 (discussed above), and SMAD4. In a mouse model, disruption of Smad4 in neural crest cells resulted in multiple malformations including defects of the outflow tracts and ventricles [54]. Further, in humans, SMAD4 gain of function mutations cause Myhre syndrome, which includes CHD as a common (~2/3rds of patients) phenotypic finding [55]. There is also evidence that MAP4K3 is a central regulator of autophagy, a process that is critical for maintaining the supply of free amino acids for protein synthesis [56] that is required for embryonic growth and development.
Table 4. Genes with suggestive evidence of association (p<1E-03) in the case-control based meta-analysis.
Finally, our analyses identified non-SMC condensin I complex subunit G (NCAPG, SKAT meta-p = 4.2E-04) as a CTD candidate gene. The protein encoded by this gene forms part of the condensin complex, which is involved in mitotic chromatin condensation [57]. Further, our analyses indicated that genes in the MetaCore pathway map, "Chromosome condensation in prometaphase", were also significantly enriched (FDR-adjusted p = 0.02) among genes with association p<0.01 in our meta-analyses. In addition to NCAPG, a second member of the condensin complex, NCAPH (FBAT meta-p = 9.13E-03), and BAZ1B (FBAT meta-p = 5.62E-03), which is part of the WICH chromatin remodeling complex, had meta-analysis p<0.01. The involvement of chromatin-related genes, particularly H3K4me-H3K27me pathway genes, in CHD etiology has been suggested by studies of de novo mutations [18]. Our findings suggest that other classes of chromatin-modifiers may also contribute to CHDs.

Meta-analysis
We have previously conducted SNP-level (MAF≥0.05) GWAS using the same datasets as in the current gene-based analyses [20]. In our meta-analysis of the SNP-level results, we identified 36 variants with suggestive evidence of association (P≤1E-5). However, no association was genome-wide significant (P<5E-8). Further, none of the SNPs with suggestive evidence of association were located in, or within 1kb up or downstream of, the genes with suggestive evidence of association (Table 5) in the current, gene-based analyses. However, it should be noted that the SNP-level analyses were restricted to include only common variants and used slightly different configurations of the data. Specifically, in the SNP-level analyses we compared the cases used in CHOP-CC2 to all available CHOP pediatric controls (N = 2,976 controls), and the SNP-level meta-analysis was based on the combined results from CHOP-Trios, PCGC-Trios and the case-control analyses.
To our knowledge, this is the first gene-based genome-wide analysis of CTDs that is based on data for both common and rare variants. Because the various gene-based approaches have different underlying assumptions, strengths and limitations, we used two different gene-based approaches, eFBAT-MM and SKAT-C, to optimize the probability of identifying CTD-related genes. The family-based approach, eFBAT-MM, is robust to population stratification bias, but assumes that all variants within a gene have effects in the same direction and that the effect size is inversely proportional to the MAF. In contrast, the case-control approach, SKAT-C, is subject to stratification bias, but does not make assumptions about the direction of association. Therefore, SKAT-C is more powerful than eFBAT-MM when a large proportion of protective and neutral variants are present in a gene, and the converse is true when this proportion is small. Given the differences between the two methods, the lack of overlap in the genes identified by the two approaches is not particularly surprising.
Although the gene-based approaches used in our analyses had a lower multiple-testing burden than SNP-based GWAS, the criterion for achieving statistical significance (corrected p~2.5E-06) remained quite stringent. This, in combination with our relatively small sample sizes, suggests that associations with true CTD-related genes may have been missed in our analyses due to low study power. Further, the Q-Q plots for the eFBAT-MM analysis of individual datasets (Fig 2) indicate that this test may be too conservative, which would have also negatively impacted our power to detect a true association. Given these considerations, genes with suggestive evidence of association (meta-p<1E-03) and pathways and processes with FDR p<0.05 appear to be strong targets for further investigations of the genetic basis of CTDs.
In our analyses, we combined data across different CTD phenotypes, which could have obscured associations if the etiology of the individual phenotypes is distinct. For example, mutations in laterality genes (e.g. CFC1, FOXH1) have been observed in association with transposition of the great arteries (TGA) [58,59], suggesting that at least some cases of TGA might be more appropriately classified as laterality defects rather than CTDs. However, the preponderance of evidence suggests that the various CTD phenotypes share common genetic underpinnings. Studies of familial recurrence patterns, phenotypes of patients with known genetic syndromes (e.g. 22q11.2 deletion syndrome) and studies in animal models all indicate that the various CTD phenotypes share genetic risk factors. Moreover, studies of rare de novo and inherited variants in humans provide evidence that the genes involved in CHDs may be shared across even broader categories of defects. For example, Jin et al. reported genome-wide significant excess of damaging de novo and loss-of-function heterozygous mutations in seven genes among 2,871 patients with CHD. Of these seven genes, mutations in six were observed across broad CHD categories (i.e. CTDs, left-sided lesions and/or other CHDs) [17]. Hence, while studies of CTDs as a group might miss phenotype-specific associations, such studies appear to be appropriate for genes that contribute broadly to CHD risk and for genes that influence the spectrum of CTDs.
In summary, our genome-wide, gene-based analyses of common and rare variants identified enriched pathways and biological processes and candidate genes for CTDs. Our findings provide evidence for new CTD-related candidate genes, as well as support and expand on prior evidence implicating chromatin-related genes and splicing-regulators as determinants of CHD risk.

Conclusions
To our knowledge, this is the first study reporting the results of gene-based, genome-wide association studies for CTDs. The results of our study provide evidence for eight CTD candidate genes, of which four have previously been implicated in heart development and four are novel candidates. Thus, these findings add to our understanding of the complex, genetic etiology of CTDs, which may, in turn, enhance our ability to understand, predict and ultimately improve clinical outcomes for this patient population.