Genetic Variants in Isolated Ebstein Anomaly Implicated in Myocardial Development Pathways

Ebstein anomaly (EA) is a rare heart defect in which the tricuspid valve is malformed and displaced. The tricuspid valve abnormalities can lead to backflow of blood from the right ventricle to the right atrium, preventing proper circulation of blood to the lungs. Although the etiology of EA is largely unresolved, increased prevalence of EA in those with a family history of congenital heart disease suggests EA has a genetic component. Copy number variants (CNVs) are a major source of genetic variation and have been implicated in a range of congenital heart defect phenotypes. We performed a systematic, genome-wide search for CNVs in 47 isolated EA cases using genotyping microarrays. In addition, we used a custom HaloPlex panel to sequence three known EA genes and 47 candidate EA genes. We identified 35 candidate CNVs in 24 (51%) EA cases. Rare sequence variants in genes associated with cardiomyopathy were identified in 11 (23%) EA cases. Two CNVs near the transcriptional repressor HEY1, a member of the NOTCH signaling pathway, were identified in three unrelated cases. All other candidate CNVs were each identified in a single case. At least 11 of 35 candidate CNVs include genes involved in myocardial development or function, including multiple genes in the BMP signaling pathway. We identified enrichment of gene sets involved in histone modification and cardiomyocyte differentiation, supporting the involvement of the developing myocardium in the etiology of EA. Gene set enrichment analysis also identified ribosomal RNA processing, a potentially novel pathway of altered cardiac development in EA. Our results suggest an altered myocardial program may contribute to abnormal tricuspid valve development in EA. Future studies should investigate abnormal differentiation of cardiomyocytes as a potential etiological factor in EA.

Ebstein anomaly (EA) is a rare heart defect in which the tricuspid valve is malformed and displaced. The tricuspid valve abnormalities can lead to backflow of blood from the right ventricle to the right atrium, preventing proper circulation of blood to the lungs. Although the etiology of EA is largely unresolved, increased prevalence of EA in those with a family history of congenital heart disease suggests EA has a genetic component. Copy number variants (CNVs) are a major source of genetic variation and have been implicated in a range of congenital heart defect phenotypes. We performed a systematic, genome-wide search for CNVs in 47 isolated EA cases using genotyping microarrays. In addition, we used a custom HaloPlex panel to sequence three known EA genes and 47 candidate EA genes. We identified 35 candidate CNVs in 24 (51%) EA cases. Rare sequence variants in genes associated with cardiomyopathy were identified in 11 (23%) EA cases. Two CNVs near the transcriptional repressor HEY1, a member of the NOTCH signaling pathway, were identified in three unrelated cases. All other candidate CNVs were each identified in a single case. At least 11 of 35 candidate CNVs include genes involved in myocardial development or function, including multiple genes in the BMP signaling pathway. We identified enrichment of gene sets involved in histone modification and cardiomyocyte differentiation, supporting the involvement of the developing myocardium in the etiology of EA. Gene set enrichment analysis also identified ribosomal RNA processing, a potentially novel pathway of altered cardiac development in EA. Our results suggest an altered myocardial program may contribute to abnormal tricuspid valve development in EA. Future

Introduction
Ebstein anomaly (EA) is a rare but serious congenital heart defect (CHD) that was first described in 1866 by Wilhelm Ebstein. EA is characterized by downward displacement of the tricuspid valve resulting from incomplete separation of the tricuspid valve from the underlying myocardium. Right ventricular myocardium abnormalities along with patent foramen ovale, atrial septal defect or right ventricular dilation are present in the majority of EA cases [1]. Although the prevalence of EA was reported in 1958 as 1-5 per 200,000 live births [2], more recent population studies have reported a prevalence of 1 in 13,800-25,600 live births [3][4][5]. In addition to high intrauterine mortality, 15-50% of neonates born with EA do not survive to age 10 years [1,6].

Ethics Statement
The New York State Department of Health Institutional Review Board (NYS DOH IRB) and the National Institutes of Health Office of Human Subjects Research approved this study. Prior to genotyping and analysis, specimens were assigned a random ID number and all personally identifying data were removed. Raw data or genotypes are available upon request with proper IRB approval.

Case Identification
Cases with isolated EA (no other major birth defects) were identified from among all 2,023,083 resident live births in NYS from 1998 through 2005 reported to the NYS Congenital Malformations Registry (CMR). Reporting of selected major birth defects, including EA, identified during the first two years of life is mandated by NYS law. Methods for case ascertainment by the NYS CMR have been described previously [27]. In a query of the NYS CMR database, 117 EA cases were identified using the expanded British Paediatric Association (BPA) code 746.200 and by searching for the text description "Ebstein". EA cases known to have syndromes/chromosomal abnormalities (N = 14), other CHDs (N = 37), other major defects (N = 14) or resulting from plural births (N = 1) were excluded from analysis, leaving 51 eligible EA cases.
Among these 51 cases, those with common minor CHDs including patent ductus arteriosus (N = 11), patent foramen ovale (N = 5), bicuspid aortic valve (N = 1) or other minor birth defects such as syndactyly of the toes (N = 1) were retained for analysis. We attempted to locate archived newborn screening dried blood spots (DBS) for the 51 eligible EA cases along with three unaffected live births to serve as controls. Sufficient material was available for 47 EA cases, and none had been marked that the parents had refused use of the specimens for research. Demographic data for all NYS live births delivered from 1998 through 2005 were collected from birth records and compared with demographic data for the 47 EA cases using the Pearson Chi-square test in SAS v9.2.
Genotyping DNA was extracted from two 3-mm punches of the DBS [28]. A total of 47 EA cases, three unaffected controls, 123 cases with other unrelated phenotypes [29], and one HapMap specimen were genotyped as a single batch. Specimens were genotyped at the Biomedical Genomics Center Core Facility at the University of Minnesota using Illumina HumanOmni2.5-8_v1 bead arrays and the Infinium HD assay protocol. Data were analyzed using Illumina GenomeStudio v2011.1 with a genotype no-call threshold < 0.15.

Autosomal SNPs
Genotype clusters were defined based on the data generated in this project. Genotypes and clusters were manually reviewed and cleaned by re-clustering, editing, and excluding where appropriate [30]. A total of 2,278,660 autosomal single-nucleotide polymorphisms (SNPs) and 57,201 sex chromosome SNPs passed quality control and were included in CNV analysis (total autosomal SNPs on the array = 2,314,174; total sex chromosome SNPs on the array = 58,187). Among the 2,278,660 autosomal SNPs, the mean specimen call rate among the 47 EA cases and three controls was (± standard deviation, (range)) 99.7% ± 0.3 (98.3%-99.9%) and the mean log R ratio deviation was 0.133 ± 0.030 (0.099-0.231). After cleaning, SNP genotype reproducibility (based on two duplicates included among the 173 samples genotyped) was 100%.

Sex Chromosome SNPs
SNPs on the X chromosome were clustered using female subjects only and SNPs on the Y chromosome were clustered using male subjects only. SNPs in pseudoautosomal region (PAR) 1 or 2 or in the X-chromosome-transposed region (XTR) were clustered using both males and females. Clusters were reviewed and edited, as described above for autosomal SNPs.
identify intersecting transcripts and genes. Transcripts included full-length coding transcripts and full-length non-coding transcripts with a well characterized biotype downloaded from GENCODE (version 19, accessed via UCSC genome browser May 2014) [35]. Genes were defined as those included in the Consensus CDS project (CCDS; release 15, accessed via UCSC genome browser June 2014). CNVs were checked for overlap with previously reported EA genes, NKX2-5, GATA4, and MYH7 [10,13]. Sex chromosome CNVs were called with cnvPartition and were manually annotated.

Candidate Gene Identification
Additional EA candidate genes were identified by searching the disease term 'Ebstein Anomaly' in the PhenoDigm database (original query October 2013, same gene list retrieved March 2016), which identifies potential gene-disease associations using phenotype information from a variety of model organisms [36]. Coordinates (hg19) for human orthologs of mouse and zebrafish genes reported by PhenoDigm as associated with EA were obtained using the Ensembl Bio-Mart tool [37] and checked for overlap with all CNVs, as described above.

CNV Selection and Prioritization
Candidate autosomal CNVs were selected if they met the following criteria: at least 10 probes, at least 25 Kb in length and less than 35% overlap with: common HapMap3 copy number polymorphisms, common CNV blocks identified in the CHOP CNV database, controls in this project (of the same CNV type), other cases with unrelated birth defects and controls that we previously genotyped. The UCSC genome browser [38] was used to filter out CNVs that had substantial overlap with multiple Database of Genomic Variants (DGV) [39] entries (of the same CNV type, release date-2014/10/16). Sex chromosome CNVs that were at least 25 Kb in length were manually reviewed for overlap with control subjects, other birth defect cases and CNVs catalogued in DGV.
Log-R ratio (LRR) and B-allele frequency (BAF) plots across each candidate CNV region were manually reviewed to subjectively assess the quality and validity of generated CNV calls using the Illumina Genome Viewer in Genome Studio.

CNV Validation
A subset of the most biologically interesting candidate CNVs were validated in the laboratory using two to three quantitative real-time PCR (qPCR) TaqMan assays (Applied Biosystems, Carlsbad, CA) per region. For detailed information see CNV Validation in S1 Methods.
Sequencing. Mutations in 50 genes were screened using a custom Agilent HaloPlex panel. Genes sequenced include: three genes previously associated with EA (NKX2-5, GATA4, and MYH7), six sarcomere genes, seven genes from candidate EA-associated CNVs identified in this study, 14 highly ranked PhenoDigm [36] genes from regions where one or more EA cases had loss of heterozygosity, nine genes in a critical region for EA identified via linkage study in a canine model [40], six cardiomyopathy genes that are associated with sarcomeric function or LVNC and five genes from a literature review (panel designed using SureDesign, see S6 Table  for complete gene list). Sanger sequencing was used to validate potentially pathogenic variants called in HaloPlex data. For detailed information see Sequencing in S1 Methods and S4 Table. PhenogramViz CNV Ranking To validate our CNV selection criteria, PhenogramViz (v0.1.2), a tool for clinical interpretation of CNVs [41], was used to rank the CNVs of all subjects. PhenogramViz utilizes phenotypes and leverages integrated cross-species phenotype ontology to rank the potential pathogenicity of CNVs an individual carries. 'Ebstein's anomaly of the tricuspid valve' was used as the phenotype term. All PennCNV calls that contained at least 10 probes and were at least 25 Kb in length, all cnvPartition autosomal calls at least 25 Kb in length and all cnvPartition sex chromosome calls at least 25 Kb in length that were not copy-number 2 in PAR or XTR regions were ranked using default parameters.

Gene Set Enrichment Analysis
Three gene lists were generated from GENCODE [35] genes (version 19, accessed via UCSC table browser October 2015) in candidate CNV regions: (1) using all candidate EA-associated CNVs identified in this study; (2) using only candidate EA-associated deletions identified in this study; and (3) using only candidate EA-associated duplications identified in this study. Three known EA genes: MYH7, GATA4 and NKX2-5 were added to each list. Lists were then separately used as input into Enrichr [42] and results were exported. Tables were filtered to exclude: gene sets with adjusted P ! 0.05, gene sets that were enriched with only the three known disease genes, gene sets that were enriched with gene(s) from a single CNV or gene sets that were generated from cell lines not applicable to EA (for example gene sets generated in cancer cell lines). To avoid redundant results, gene sets that were enriched with genes from lists (2) or (3) were excluded if they were also enriched with genes from list (1). After filtering/ excluding enriched gene sets, similarities were calculated and visualized with the Enrichment Map Cytoscape app [43] using an overlap coefficient cutoff of 0.5. Connected subclusters were arranged using a degree-sorted-circle layout, scaled to allow label visualization and manually annotated.

Case Identification
Using BPA code 746.200 and text field descriptions, 117 EA cases were identified in the NYS CMR database from among 2,023,083 NYS live-births, resulting in a birth prevalence of 1 in 17,300. Of the 117 EA cases, 51 were isolated (see methods), resulting in a birth prevalence of 1 in 39,700 live births with isolated EA.
As shown in Table 1, none of the characteristics examined (maternal age, infant sex, maternal race/ethnicity, maternal education, parity, smoking during pregnancy, prenatal maternal body mass index) were statistically different between EA cases and controls.

CNV Identification
Among the 47 EA cases genotyped, cnvPartition called 1,772 CNVs, and PennCNV called 3,982 CNVs. After applying selection filters, 120 autosomal CNVs and 87 sex-chromosome CNVs were called by one or both algorithms. After manual review and exclusion for substantial overlap with DGV entries, 46 CNV calls in 27 EA cases remained. Upon manual review of LRR and BAF plots, six CNVs in one subject were excluded (duplications/deletions not supported by plot inspection). Two CNVs on the Y chromosome of one male EA case were excluded because they were in genomic regions with sparse representation on the array. After all exclusions, 38 candidate CNVs in 26 EA cases fulfilled our selection criteria. Three of the 38 CNVs were recurrent and 35 were each found in a single EA case. Twenty-six EA cases (55%) carried at least one candidate CNV ( Table 2). Seventeen of 20 CNVs selected for validation were confirmed by qPCR, showing the expected copy number for all qPCR assays spanning each region. Two recurrent and one CNV found in a single EA case failed to validate and were considered false positive CNV calls. None of the 17 CNVs that validated were found in 190 control specimens (S3 Table).

PhenogramViz CNV Ranking
PhenogramViz was used to rank genes included in all CNVs carried by each individual, based on phenotype, predicted haploinsufficiency and overlap with known benign or pathogenic CNVs. This analysis was carried out post-hoc, and was used to assess our candidate CNV filtering and prioritization methods. Our selection criteria yielded candidate CNVs that were also highly ranked by PhenogramViz (i.e., they included genes likely to be relevant to EA). Of the 38 candidate CNVs identified, 32 were ranked in the top five most likely pathogenic for EA in that individual (S1 Table). Furthermore, 19 of 38 CNVs had a haploinsufficiency score greater than zero, an indication that the CNV overlaps dosage sensitive genes [44]. Overall, the Pheno-gramViz results suggest that our selection criteria successfully limited our analysis to CNVs predicted to be pathogenic.  Table 3.
Filtering variants in the remaining 47 genes resulted in 36 potentially pathogenic variants in 18 EA cases. Twelve variants in 11 EA cases were validated by Sanger sequencing and are shown in Table 4. Of the 36 variants tested, 24 did not validate by Sanger sequencing; again, the variants that did not validate had low allelic balance (average 0.27). All variants shown in Tables 3 and 4 are rare or absent from the Exome Aggregation Consortium (ExAC) database, which contains variants from more than: 30,000 European individuals, 8,000 South Asian individuals, 5,000 African American individuals, 5,000 Latino individuals and 4,000 East Asian  individuals [45]. In addition, all variants in Table 4 are predicted loss of function, high impact or predicted pathogenic by multiple algorithms (details shown in S3 Fig).
As shown in Table 5, three cases without a candidate CNV carried heterozygous missense MYH7 mutations that could be pathogenic. Case C19 carries a MYH7 variant (p.Arg243His) that is considered pathogenic. This variant was originally reported in an individual with hypertrophic cardiomyopathy (HCM), has subsequently been reported in multiple individuals with left ventricular non-compaction (LVNC) and the arginine 243 position is highly conserved across species (ClinVar RCV000158756.1). Case C37 carries a variant of uncertain significance (VOUS) in MYH7 (p.Glu1455Lys). This variant has not been reported in any databases searched and the glutamic acid residue at 1455 is conserved across multiple species, including primates, chicken and zebrafish. Case C41 carries a MYH7 VOUS (p.Gln163Pro); this variant is absent from control databases and has been reported in an individual with LVNC [46]. Three EA cases were found to harbor both a sequence variant in a known EA gene and a candidate CNV (Table 5). EA case C26 carries a MYH7 variant in the splice region (c.3853+7C>T) and CNV #21. However, considering the low prevalence of EA and the minor allele frequency in ExAC (0.002), it is unlikely that MYH7 c.3853+7C>T alone causes EA. EA case C30 also carries a MYH7 variant (c.732+1G>A) in a canonical splice donor position that has not been reported in the normal population, was reported in an individual with LVNC and in a separate family cosegregated with LVNC in four individuals (ClinVar SCV000054831.2 & SCV000208693.1). It is more likely that the true cause of EA in case C30 is the MYH7 splice variant, and that CNV #29 Rare Genetic Variants in Ebstein Anomaly is a rare CNV not related to EA. Alternatively, both variants may be required for an EA phenotype. Finally, in EA case C46 we identified a heterozygous missense mutation (p.Ala42Pro) in NKX2-5. This variant has previously been reported in an individual with EA, however the variant was also present in the individual's unaffected father, which was attributed to incomplete penetrance [12]. It is possible NKX2-5 p.Ala42Pro does not contribute to EA in case C46, or that both NKX2-5 p.Ala42Pro and CNV #26 are required for the EA phenotype.
In addition to the five EA cases with a MYH7 variant, other cases carry sequence variants in genes associated with autosomal dominant cardiomyopathy. Two EA cases carry missense VOUS in myosin heavy chain cardiac muscle alpha (MYH6). EA case C14 carries MYH6 p. Arg1502Gln and EA case C31 carries MYH6 p.Gln1065His. MYH6 p.Arg1502Gln has been reported in multiple individuals with dilated cardiomyopathy (DCM), however it was also identified in healthy family members of one proband [47,48]. MYH6 p.Gln1065His has been reported in an individual with hypertrophic cardiomyopathy (HCM) and was absent from two unaffected family members [48]. EA case C2 carries a single amino acid deletion (p.Glu13del) in the titin-cap (TCAP) gene that has previously been reported in two DCM probands [49]. EA case C36 carries a VOUS (p.Arg367Leu) in junction plakoglobin (JUP) that has not been reported in the normal population databases searched. Dominant mutations in JUP have previously been reported to cause arrhythmogenic right ventricular cardiomyopathy [50]. In addition to being associated with cardiomyopathy in humans, both TCAP and JUP are in a critical region for EA identified via linkage study in a canine model [40]. Finally, three VOUS in titin (TTN) were found in two EA cases. Missense variants in TTN are challenging to interpret due to a high prevalence of missense variants in the general population and TTNs potential role as a DCM modifier gene [51]. However, p.Arg22231His and p.Asn9377Lys are both in the Czone of the A-band in the TTN protein (Uniprot accession: Q8WZ42), a region that has been shown to be enriched for missense variants in DCM cases [51].
Two EA cases have variants in genes that are unique to cushion formation in the atrioventricular (AV) canal (mitral and tricuspid valves), but not in the outflow tracts (OFT) (aortic and pulmonic valves) [52]. EA case C45 carries a missense T-box 5 (TBX5) variant (p.Ser498-Tyr). No anomalies other than EA were reported to the NYS Congential Malformations Registry for case C45.EA case C1 carries a missense SWI/SNF related matrix associated actin dependent regulator of chromatin subfamily a member 4 (SMARCA4) variant (p.Thr1458Ile). SMARCA4 regulates cardiac growth and differentiation via cardiomyocyte proliferation [53].
Two sequence variants were detected in genes that were overlapped by CNVs in other EA cases. One case (C11) carries a missense zinc finger protein 534 (ZNF534) variant (p. His385Asp) and another case (C18) carries a missense bone morphogenetic protein receptor type II (BMPR2) variant (p.Tyr743Cys).

CNV Analysis
One EA case (C26), carries a heterozygous deletion at 8p23.1 (CNV #21; Table 2). Deletions at 8p23.1 have previously been reported in EA [13] and other CHDs [54]. Haploinsuffiency of the cardiac transcription factor GATA4 is thought to be responsible for CHD seen in individuals with deletions at 8p23.1; however, the CNV identified in EA case C26 is 2.2 Mb from GATA4. Individuals with other CHDs have been reported with 8p23.1 deletions that do not overlap GATA4 [55]. As previously suggested, altered expression of GATA4 via positional effects may underlie this association [13,55]. A heterozygous MYH7 variant (c.3853+7C>T) was also detected in EA case C26; however, as discussed, it is unlikely that this variant is pathogenic.
Two EA cases (C12 and C47) had a duplication at 8q21.13 (CNV #2) that overlaps two large intergenic non-coding RNAs (lincRNAs). Though not well studied, lincRNAs have been shown, in mice, to coordinate cell-type specific gene expression [56], including activation of a gene regulatory network that includes cardiac transcription factors such as Nkx2-5, Gata4 and Tbx5 [57]. The CNV #2 duplications are also 3 Mb upstream of an important cardiac transcription factor HEY1 [58]. One additional 263 Kb heterozygous deletion at 8q21.12 (CNV #23 in EA case C3) is 650 Kb downstream from HEY1. In mice, Hey gene expression is required for formation of the boundary between the chamber myocardium and the AV canal myocardium [59]. Embryonic myocardial cells can differentiate into AV canal cardiomyoctes or chamber cardiomyocytes depending on the network of transcription factors expressed in each region. HEY1 represses transcription by cardiac transcription factors GATA4 and GATA6 [58] and represses AV canal gene expression in the chamber myocardium [60]. Although CNVs #2 and #23 are distant from HEY1, it is possible the CNVs affect expression. For example, significant expression level changes have been observed for genes located up to 6.5 Mb from CNVs that cause Williams-Beuren syndrome, for example [61]. No expression quantitative trait loci (eQTLs) for HEY1 have been reported in the Genotype-Tissue Expression (GTEx) browser (accessed March 2016) in the regions of the CNV #2 duplications or the CNV #23 deletion.
EA case C47 had a 133 Kb duplication at 4q26 (CNV #13) in addition to the 8q duplication. CNV #13 completely duplicates MYOZ2, a sarcomeric protein that is known to cause cardiomyopathy familial hypertrophic type 16 (MIM #613838). In mice, Myoz2 overexpression protects against cardiac hypertrophy [62]; however, it is unknown whether overexpression is protective in humans or whether the duplication would cause overexpression.
CNV #33 fully overlaps C16orf72, a gene that is highly conserved across species and is not fully duplicated by any CNV catalogued in DGV [39], which is comprised primarily of 'healthy' individuals. DECIPHER [63] and ISCA [64] contain CNVs that fully duplicate C16orf72; relevant phenotypes from overlapping DECIPHER cases 251553 and 259192 include "Defect in the atrial septum" and "Abnormality of the heart" in ISCA case nssv578696 (S2 Table). Since CNV #33 is smaller than the CNVs in DECIPHER (254 Kb vs. >11 Mb), our data narrow the critical region for genes effecting cardiac development in this region.
CNV #37 is a 491 Kb duplication at Xp11.3 that partially overlaps KDM6A, a lysine specific demethylase. Haploinsufficiency of KDM6A is associated with Kabuki syndrome (MIM #300867). EA has previously been reported in Kabuki syndrome cases [59]. To our knowledge, there have been no reports of KDM6A duplications in Kabuki syndrome. However, depending on the breakpoints, it is possible that the partial duplication causes haploinsufficiency of KDM6A by altering the coding frame. A large 7.3 Mb heterozygous deletion (CNV #8) at 2q23.1-24.1 was detected in a single EA case (C22). In addition to EA, case C22 also had syndactyly of toes and bicuspid aortic valve. There has been at least one previous report of an individual with CHD and a 2q23.1 microdeletion [65]. More than 50 transcripts are overlapped by the 2q23.1-24 deletion in EA case C22. The highest scoring PhenoDigm gene in the region, Kcnj3, causes abnormal myocardial function in mice when knocked out [66]. TNFAIP6 and GALNT13, were also identified as potential EA genes by PhenoDigm. TNFAIP6, is a member of the hyaluronan-binding protein family. TNFAIP6 is expressed in both embryonic and adult cardiac fibroblasts (myocardium) [67], and hyaluronic acid is a major component of the extra cellular matrix in endocardial cushions, the precursors to AV valves [68]. GALNT13 is a glycosyltransferase enzyme responsible for the Olinked glycosylation of mucins. SNPs in GALNT13 have been associated with increased tricuspid regurgitation jet velocity [69].
A novel 662 Kb duplication at 2q33.1-33.2 (CNV #9) was detected in a single EA case (C20). CNV #9 fully duplicates multiple genes, including BMPR2. In heart development, BMPR2 acts as a receptor for BMP2, BMP4 and BMP7, and has different spatial roles [52]. Deletion of Bmpr2 specifically in the endocardium of mice has been shown to result in abnormal tricuspid and mitral valve formation [73]. Furthermore, the gene set 'BMPR2-Kinase Perturbations from GEO' was significantly enriched for genes overlapped by candidate CNVs (P = 0.01). This finding indicates that a significant number of the genes overlapped by candidate EA CNVs show altered expression in GEO in experiments that perturb BMPR2 function. As shown in S5 Table, 14 candidate CNVs contain genes whose expression is at least partially controlled by BMPR2. Further evidence for the involvement of BMP signaling in the etiology of EA comes from a study showing deletion of Bmpr1a in murine cardiac myocytes leads to AV valve defects reminiscent of EA [74]. BMPR1A is a type I BMP receptor that forms a complex consisting of two type II and two type I receptors which together, signal downstream to activate SMAD transcriptional regulators. CNV #11 also implicates BMP/SMAD signaling in EA. CNV #11 is a small 84.7 Kb heterozygous deletion at 3p21.1 between the DCP1A and CAC-NA1D genes. DCP1A, is a SMAD4 interacting protein; SMAD4 is a downstream target of the BMP signaling pathway involved in cardiac development [75]. In mice, myocardial deletion of Smad4 has been shown to result in impaired trabeculation and thinning of ventricular myocardium [76].

Gene Set Enrichment Analysis
We utilized gene set enrichment analysis to discover shared pathways among all genes overlapped by the 36 candidate CNVs in Table 2. In total, 48 gene sets were significantly enriched for genes overlapped by EA CNVs (corrected P < 0.05; S5 Table). Of the 48 enriched gene sets, 10 were enriched for genes overlapped by deletions, nine for genes overlapped by duplications and 29 for genes from a combination of deletions and duplications.
One main connected cluster is formed from 31 of the enriched gene sets (Fig 1), each with overlap with the genes in the 'BMPR2-kinase perturbations from GEO' gene set. In this main cluster, three subclusters of substantially overlapping gene sets are formed. The abundance of connections to the BMPR2 gene set provides additional support for the involvement of BMP signaling in EA. Furthermore, a 'cardiac morphology' subcluster formed by five mammalian phenotype gene sets (Fig 1) supports the involvement of the genes from candidate CNVs in the development of EA. The most statistically significant mammalian phenotype, 'abnormal cardiac muscle tissue morphology' , specifically supports genetic involvement in the myocardial pathology in EA. Indeed, altered Z-disks in the cardiomyocytes of individuals with EA led Egorova et al. to suggest the presence of Z-disk gene mutations [77].
A 'histone modification' subcluster (Fig 1) is formed by two H3K27me3 gene sets, two SUZ12 gene sets and the MTF2 gene set. The polycomb repressor complex 2 (PRC2) is a regulator of transcriptional silencing during embryonic development. PRC2 modifies chromatin structure by trimethylating lysine 27 on histone H3 (H3K27me3). SUZ12 is a component of the PRC2, while MTF2 recruits PRC2. In mice, overexpression of Suz12 leads to apoptosis of cardiomyocytes and ventricular non-compaction [78]. The enrichment of gene sets related to histone modification by genes in candidate CNVs suggests histone modification and thus chromatin structure, could play a role in the development of EA. An excess of de novo mutations in genes involved in histone modification has been seen in severe CHD cases [79]. Histone modification is also involved in myocardial development and BMP signaling. In the AV canal, Gata4 together with Bmp2/Smad signaling leads to H3K27 acetylation and thus AV canal-specific gene expression. However, in the chamber myocardium, H3K27 deacetylation is necessary to suppress the AV canal-specific gene expression [60].
A 'rRNA processing' subcluster (Fig 1), consists of two GO ribosomal RNA (rRNA) processing gene sets, two MYC gene sets, the CHD1 gene set, and the KLF4 gene set. MYC regulates the efficiency of rRNA processing [80] and CHD1 is required for polymerase I transcription termination [81], which is the polymerase that transcribes rRNA. It is possible these genes alter cardiac protein synthesis via rRNA processing. Although no definitive link between rRNA processing and heart development has been reported, increased MYC and rRNA have been shown in human hearts after mechanical unloading [82]. The enrichment of genes related to rRNA processing in candidate CNVs warrants further investigation.
The gene sets 'POU5F1-CHEA (HESC)' and 'SOX2-CHEA (HESC)' form a 'differentiation' cluster (Fig 1). Both POU5F1 and SOX2 have been shown to be involved in cardiac development via differentiation of stem cells to a cardiac cell lineage [83]. In embryonic stem cells, POU5F1 inhibition prevents differentiation into cardiomyocytes [84]. In addition to the CNV candidate genes identified in the POU5F1 and SOX2 gene sets, CNV #24 has a potential role in cardiomyocyte differentiation (see above).

Strengths and Limitations
To our knowledge, our study is the first genome-wide study of CNVs in isolated EA cases. Our study is population-based and derived from the large and diverse NYS population (>2 million births in the study period). The NYS CMR has been shown to identify a large percentage of diagnosed cases with reportable birth defects [27]. There are limitations as well. Cases are reported by law, but detailed medical information is not always available in the reports. Moreover, only live births are included and parental DNA was not available to determine if variants identified occurred de novo. In addition, our sequence panel did not provide 100% coverage of all targeted genes (S2 Fig); it is possible that regions missed on our panel contain sequence variants relevant to EA. Finally, our study cannot prove pathogenicity of observed CNVs/sequence variants; follow-up studies using animal models or in vitro studies will be necessary to confirm that the variants detected in cases contribute to EA.

Conclusion
The cause of the tricuspid valve abnormality in EA is uncertain. Both failure of the valve to delaminate from the myocardium and abnormalities in myocardial development leading to valve anomalies have been suggested [1,[85][86][87]. Further, it has been suggested that EA may be a cardiomyopathy with valvular involvement rather than a primary valvular disorder [88]. Numerous candidate CNVs identified in this study contain genes linked to myocardial abnormalities or development. Our results suggest that unlike some other birth defects [89], EA is not caused by large recurrent CNVs. However, we found rare, potentially pathogenic CNVs carried by more than one-half of NYS EA cases. Our results support previous findings that genetic factors related to EA are likely heterogeneous. Additionally, we have underscored the potential role of histone-modifying genes in CHD and uncovered rRNA processing as a potential novel pathway involved in the etiology of EA. At least 11 of 35 candidate CNVs identified contain genes related to myocardial abnormalities or development, which may contribute to abnormal tricuspid valve development. This suggests that myocardial development may play an important role in EA. Our results specifically highlight the importance of the BMP pathway in altered myocardial development. Additionally, multiple genes within candidate CNVs and gene set enrichment analysis suggest abnormal differentiation of cardiomyocytes should be investigated as a potential etiological factor. Finally, 11 of 47 EA cases (23%) carry sequence variants in genes associated with autosomal dominant cardiomyopathy; these results further support the role of the myocardium in the pathogenesis of EA.   Table. Genes targeted by a custom HaloPlex gene panel.

Supporting Information
(XLSX) son at the Congenital Malformations Registry, New York State Department of Health, for data management; Matthew Shudt and Zhen Zhang at the Wadsworth Center Applied Genomic Technologies Core for next-generation sequencing; and Nathan Pankratz, University of Minnesota, and Karl G. Hill, Social Development Research Group, University of Washington, for generously sharing population B-allele frequency and GC content files for PennCNV software. This study makes use of data generated by the DECIPHER Consortium. A full list of centers that contributed to the generation of the data is available from http://decipher.sanger.ac.uk and via email from decipher@sanger.ac.uk. Those who carried out the original analysis and collection of the data bear no responsibility for the further analysis or interpretation. Some data used for comparison in this article were obtained from the International Standards for Cytogenomic Arrays Consortium database (http://www.iscaconsortium.org), which generates this information using the National Center for Biotechnology Information's database of genomic structural variation (dbVar, http://www.ncbi.nlm.nih.gov/dbvar/), study nstd37. Specimens and associated phenotype data were provided by International Standards for Cytogenomic Arrays Consortium member laboratories.