Genome-Wide DNA Methylation Analysis and Epigenetic Variations Associated with Congenital Aortic Valve Stenosis (AVS)

Congenital heart defect (CHD) is the most common cause of death from congenital anomaly. Among several candidate epigenetic mechanisms, DNA methylation may play an important role in the etiology of CHDs. We conducted a genome-wide DNA methylation analysis using an Illumina Infinium 450k human methylation assay in a cohort of 24 newborns who had aortic valve stenosis (AVS), with gestational-age matched controls. The study identified significantly-altered CpG methylation at 59 sites in 52 genes in AVS subjects as compared to controls (either hypermethylated or demethylated). Gene Ontology analysis identified biological processes and functions for these genes including positive regulation of receptor-mediated endocytosis. Consistent with prior clinical data, the molecular function categories as determined using DAVID identified low-density lipoprotein receptor binding, lipoprotein receptor binding and identical protein binding to be over-represented in the AVS group. A significant epigenetic change in the APOA5 and PCSK9 genes known to be involved in AVS was also observed. A large number CpG methylation sites individually demonstrated good to excellent diagnostic accuracy for the prediction of AVS status, thus raising possibility of molecular screening markers for this disorder. Using epigenetic analysis we were able to identify genes significantly involved in the pathogenesis of AVS.


Introduction
Congenital heart defect (CHD) is the most common type of birth defect, and affects close to 1% of all live births [1,2]. Over 50% of children born with CHD will have at least one invasive surgery in their lifetime, while many will have multiple surgeries. In the US, approximately 20-30% of all infants with CHD have valve defects [3,4].
The etiology of CHD is multifactorial and complex. Approximately 30% of children with a chromosomal abnormality have associated CHD [5]; however, a majority of CHD cases have no identifiable genetic abnormality. Families with autosomal dominant [6], recessive [7] and X-linked [8] CHD have been reported; however, in the majority of cases genetic mechanisms have not heretofore been identified. It is postulated that sporadic CHD, which accounts for around 80% of all cases, arises from an interaction of genetic and environmental factors [9]. Gestational diabetes, dietary deficiency, medication use, maternal age, cigarette smoking, obesity, febrile illnesses in pregnancy, alcohol intake and viral infections have been reported as maternal-environmental risk factors for the development of CHD [10]. The exact mechanisms by which these factors cause CHD remain unknown.
At least 18 distinct types of congenital heart defects are recognized; among these aortic valve stenosis (AVS) is one of the most common and most serious valve disease that has strong genetic basis [11,12]. Congenital AVS, defined as incomplete obstruction of the valve orifice, is an important category of structural heart defect, and occurs in 3-6% of such cases [13]. There is variability in both the site of obstruction and severity of the obstruction. Sites of obstruction are sub-classified as valvular, subvalvular and supravalvular [14]. About half of infants with severe AVS require surgery [15]. Mild aortic stenosis is difficult to detect prenatally; however, critical aortic stenosis can lead to left ventricular myocardial dysfunction with endocardial fibroelastosis, left atrial dilation and narrowing of the aortic root [16]. These changes can be a prelude to the development of hypoplastic left heart syndrome. The present standard for prenatal screening is mostly based the use of ultrasonography and there is no reliable biologic screening marker for any type of CHD [17].
Recently, some studies have identified an association between dietary folate supplementation and decreased CHD risk [18]. Deficiency in folate is known to result in hyperhomocysteinemia, another risk factor for CHD development [19]. Folate is a B-vitamin that supplies the methyl group for DNA methylation and for other methylation reactions in the cell. Such findings are consistent with a role for epigenetic DNA methylation in the development of CHD. In a recently conducted genome-wide DNA-methylation analysis, we reported evidence of significantly altered CpG methylation levels genome-wide in multiple categories of CHD compared to controls. The study included cases of hypoplastic left heart syndrome, Teratology of Fallot, ventricular septal defect, atrial septal defect and coarctation of the aorta [20].
The prenatal and newborn detection of CHD remains a significant challenge [21,22]. We undertook a study to examine genome-wide DNA methylation patterns in newborns with AVS to identify genomic regions containing disease-related genes and epigenetic changes that may contribute to CHD pathophysiology. An important objective of the study was to identify DNA methylation biomarkers, serum molecules that could potentially be used in the future for risk estimation and detection of AVS.

Materials and Methods
Genomic DNA was obtained from neonatal dried blood spots using commercial DNA extraction kits (Qiagen QIAamp 1 ) according to manufacturer's protocol. Many studies have reported genome-wide DNA methylation profiles from archived dried blood spots using the Infinium HumanMethylation450 BeadChip as a suitable template [23,24]. Blood spot specimens were collected previously for the mandated newborn screening and treatment program run by the Michigan Department of Community Health in the State of Michigan (MDCH). All specimens were collected between 24 and 79 hours after birth. Parents/legal guardians were aware at the time of blood collection that residual blood spots after clinical testing may be utilized for research pending review of such study requests by the MDCH. This study was approved by both the institutional review boards from William Beaumont Hospital and the MDCH. Limited demographic information was available for each subject including date of sample collection, maternal age and race, gestational age at delivery and newborn sex along with the type of CHD anomaly. Suspected or diagnosis-unknown AVS cases were excluded. Unaffected controls had no reported medical disorder and were matched for birth weight, gestational age at delivery, ethnicity, year of birth, and interval from specimen collection to testing. Our cohort included 24 AVS subjects and 24 controls. All specimens were de-identified by removal of further protected health information and researchers were masked to subject identity. Details of the case control cohort are available in S1 Table. Genome-wide methylation analysis using the HumanMethylation450 Genome-wide methylation analysis was performed for 48 individuals (24 AVS subjects and 24 controls) using the HumanMethylation450, Illumina's newest Infinium1 HD BeadChip assay for methylation (Illumina, Inc., California, USA), which contains 485,577 methylation sites and requires only 500 ng of genomic DNA. These sites are equally distributed in the genome and represent 96% of RefSeq genes, 95% of CpG islands and an average of 17 CpG sites per covered gene region including the promoter, 5'UTR, coding, and 3'UTR regions. DNA methylation profiling using Illumina Infinium technology with peripheral blood lymphocytes has been used to identify CpG sites associated with disease states [25,26].
The DNA samples were bisulfite converted using the EZ DNA Methylation-Direct Kit (Zymo Research, Orange, CA) according to the manufacturer´s protocol. The fluorescently stained BeadChips were imaged by the Illumina iScan. Prior to detailed bioinformatic and statistical analysis, data preprocessing and quality control was performed including examination of the background signal intensity of both affected and negative controls, the methylated and unmethylated signals, and ratio of the methylated and unmethylated signal intensities. The processing is done fully according to manufacturer's protocol and 99% of the CpG loci are determined unequivocally.

Statistical and Bioinformatic analysis
Genome-wide, gene-specific DNA methylation was measured using the Genome Studio methylation analysis package (Illumina). Following the pre-processing described above, a DNA methylation ß-value was assigned to each CpG site. Differential methylation was assessed by comparing the ß-values per individual nucleotide at each CpG site between AVS subjects and controls. In order to avoid potential confounding factors, probes associated with sex chromosomes and/or containing SNPs in the probe sequence (listing dbSNP entries near or within the probe sequence, i.e., within 10 bp of the CpG site) were excluded from further analysis [27][28][29]. Probes targeting CpG loci associated with SNPs near or within the probe sequence may influence corresponding methylated probes [30]. The remaining CpG sites were taken forward for analysis.
The most discriminating 59 CpG sites were selected based on the pre-set cutoff criteria of >1.5-fold increase and/or >1.5-fold decrease with p < 0.05. These sites include the highest 46 hypermethylated and top 13 demethylated CpG sites for further analysis. The identified genes are listed in Table 1. The 59 CpG sites, corresponding to 52 genes, were differentially methylated either in the coding and/or promoter regions and were subsequently used to generate a heatmap using the ComplexHeatmap (v1.6.0) R package (v3.2.2). We have used ward distance for the hierarchical clustering of samples [31] (Fig 1). Subsequently these 59 CpG sites were used to calculate receiver operating characteristic (ROC) curves and the area under the ROC curves (ROC AUC).

Validation of methylation status by bisulfite sequencing
To confirm the results obtained from the Illumina HumanMethylation450 arrays, we performed pyrosequencing analysis of bisulphite-converted DNA with appropriate oligos using the PyroMark Q24 System and advanced CpG Reagents (Qiagen 1 ). The p-value for methylation differences between case and normal groups at each locus was calculated as previously described [32]. Filtering criteria for p-values were set at <0.05 and also <0.01 in order to identify the most differentiating cytosines. P-values were calculated with and without Benjamini and Hochberg False Discovery Rate (FDR) correction for multiple testing [33]. The Benjamini and Hochberg correction tolerates more false positive genes than the Bonferroni correction.
Further analysis of the differentially methylated genes was conducted for potential biological significance. ROC curves and ROC AUC were calculated to determine the diagnostic accuracy of specific cytosine loci to differentiate AVS from control groups. Data were normalized using the Controls Normalization Method.

Gene ontology analysis and functional enrichment
The genes found to be differentially methylated (at FDR p-value < 0.01) were uploaded to the web-based functional annotation tool DAVID V67 (DAVID/EASE, WebGestalt) for Gene Ontology analysis [34,35] including gene ID conversion, bio-pathways analysis, and the molecular functions of methylated and unmethylated regions. Literature data mining for cooccurrence of gene names and keywords of interest was performed using Chilibot. Only genes for which Entrez identifiers were available were further analyzed. Pathway analysis was carried out using Ingenuity pathway analysis (Ingenuity Systems). Over-represented canonical pathways, biological processes and molecular processes were identified.

Web Resources
The URLs for data presented herein are as follows:    29.87 (4.56) years in controls (p-value 1.00). Finally, maternal race and newborn gender were matched for analysis. Of the 52 genes identified during genome-wide methylation analysis, hierarchical clustering analysis demonstrated~10 as novel principal candidate genes that are commonly methylated and whose methylation was associated with altered gene expression in AVS individuals. Table 1 includes the most significantly differentially methylated CpG sites based on FDR-corrected p-values. The methylation status is represented as percentage methylation for a given probe in the sample. A positive '% m Change' value indicates an average increase in methylation status in AVS subjects compared to control samples. Similarly, a negative '% m Change' value indicates a decrease in methylation status in AVS subjects compared to controls. The p-value indicates the significance of the differential methylation levels. The University of California Santa Cruz (UCSC) gene name and genomic location of the C in the CG dinucleotide and the chromosome on which it is located as provided by Illumina are also shown in Table 1. The results obtained from the DAVID Pathway and Gene Ontology overrepresentation analysis for canonical pathways and for biological processes are presented in Table 2. Gene Set Enrichment analysis using multiple computational tools showed no significant functional enrichment due to the relatively small size of the gene list. Therefore Gene Ontology information for all genes given in the list was obtained and classified. DAVID pathway analysis software was used to identify molecular pathways associated with genes having differentially methylated CpG sites between AVS subjects and controls. Analysis was done on genes with at least one differentially methylated CpG site based on the uncorrected p-values.
In combination with the FDR p-value indicating methylation status, the area under the ROC curves can be used to distinguish AVS subjects from controls. All data cleaning and analysis was performed using R (version 3.2.3) and RStudio (version 0.99.489). The CpG sites corresponding to the 52 differentially methylated genes have a ROC AUC 0.50 including six CpG sites with ROC AUC 0.75: cg16464924 (AUC 075; 95% CI, 0.62 to 0.89), cg05890887 (AUC 0.80; 95% CI, 0.68 to 0.93), cg26196700 (AUC 0.75; 95% CI, 061 to 081), cg22469274 (AUC 0.76; 95% CI, 0.63 to 0.90), cg00994804 (AUC 0.76; 0.62 to 0.89), cg04836786 (AUC 0.76; 0.63 to 0.90). At each locus, the FDR p-value for methylation difference between AVS subjects and controls was highly significantly different. ROC curve analysis (Fig 2) narrowed down the number of markers commonly methylated in AVS for the development and further consideration of their role and validation and also possible use as biomarkers.
Biological processes and molecular function determination for these genes are shown in Table 2. Genes were further grouped according to their Gene Ontology-characterized function. Two genes were identified whose biological processes (APOA5 PCSK9) have known molecular function (AIMP1, PCSK9, TXNRD2, CLDN11, RUNX1, MCM6, APOA5 and PCSK9).

Discussion
We have presented genome-wide differential methylation analysis of AVS subjects. To our knowledge, this is the first study that has performed an array-based genome-wide high resolution DNA methylation in a cohort of AVS newborns. We found that epigenetic alteration of CpG sites exhibits a relationship with AVS phenotype, not only of those located within the CpG islands in the promoter region of genes, but also of those distributed throughout the gene including 5' UTR, coding, and 3'UTR regions. Recent work by our group [20] found a strong association between CpG methylation changes and the presence of multiple different types of CHD. Another recently published study reported DNA methylation changes in patients with Tetralogy of Fallot (TOF) using MassARRAY-based quantitative methylation analysis [36]. That study was, however, limited to targeted promoter regions. Taken together, these studies provide evidence that DNA methylation may be critical in the genetic and environmental interactions underlying cardiac morphogenesis. We have identified 59 CpG sites based on either at least 1.5 fold demethylation or 1.5 fold hypermethylation in disease samples. We hypothesize from the present study that altered expression of one or a combination of the 52 genes corresponding to the 59 CpG sites identified by differential methylation analysis is likely to be responsible for AVS.
There is a well-established and known association between defects in receptor-mediated endocytosis and many diseases such as hypercholesterolemia, a condition characterized by very high levels of cholesterol in the blood. Higher levels of total cholesterol increase the risk of cardiovascular disease [37,38] including aortic valve stenosis [39] and a high risk for heart disease even in childhood. The report by Chui et al [40] also showed significantly higher serum total cholesterol concentrations in patients with aortic stenosis than in controls. However, molecular mechanisms responsible for the association of hypercholesterolemia with CHD in offspring remain unknown. In the present study, the identification of multiple genes modified in AVS indicates that the phenotype is a complex trait.
DUSP27 (Dual Specificity Phosphatase 27 (Putative)) is a protein coding gene that belongs to the dual-specificity phosphatase (DUSPs) family of proteins. These proteins play important roles in a number of cell types and events, but are preferentially expressed in the heart and skeletal muscles. Li [41] conducted a Dusp27 knockdown experiment using siRNA to study its effect on myogenic differentiation. These experiments identified this gene's important role during muscle and heart development. Together, these findings indicate that Dusp27 may have a more important function in these two tissues. Arrington et al. [42] conducted exome sequencing of DNA samples from multiple affected family members with diverse CHDs and identified gene variations in DUSP27 among the 18 variations they noted. APOA5 (OMIM 606368) or apolipoprotein A5 is associated with plasma triglyceride levels, a major risk factor for coronary heart disease. Earlier association studies were conducted to examine the links between high-density lipoprotein genetics of various genes, including the APOA5 gene, and aortic valve stenosis risk [43]. However due to limited sample size, their study did not yield strong associations. A significant correlation between the APOA5 gene polymorphism and the levels of plasma high-density lipoprotein-cholosteal and increased risk for cardiovascular disease was previously identified [44,45]. Receiver operating characteristic (ROC) curve analysis of methylation profiles for four specific markers associated with aortic valve stenosis. AUC: Area Under Curve; 95% CI: 95% Confidence Interval. Lower and upper confidence interval was given in parentheses. We have identified six CpG sites (cg16464924, cg05890887, cg26196700, cg22469274 cg00994804 cg04836786) among 52 differentially methylated genes that have ROC AUC 0.75. At each locus, the FDR p-value for methylation difference between AVS subjects and controls was highly significantly different. Due to figure resolution, we have included only for four markers. Another important gene identified by Gene Ontology analysis is proprotein convertase, subtilisin/Kexin-Type 9 (PCSK9, OMIM 607786). PCSK9 is a serine protease that reduces both hepatic and extrahepatic low-density lipoprotein (LDL) receptor levels and increases plasma LDL cholesterol. Mutation in this gene leads to an increase in the LDL cholesterol level. Cohen et al [46] studied a large population and identified PCSK9 gene variations associated with differing plasma levels of LDL cholesterol and the risk of coronary heart disease, including myocardial infarction and mortality from coronary disease.
Two other genes displaying altered methylation are Runt-related transcription factor 1 (RUNX1; OMIM 151385) on chromosome 21q22.12, and Thioredoxin reductase 2 (TXNRD2; OMIM 606448) on chromosome 22q11.21. Deletion of chromosome 21q21.1-22.12 including the RUNX1 gene has been associated with multiple congenital anomalies and congenital heart defects including aortic stenosis [47,48]. Thioredoxin reductase is directly involved in reducing proteins such as insulin. Experiments in mice demonstrated TrxR2 is required for proper heart development [49]. Chromosome micro-deletions (del22q11) involving TXNRD2 have also been associated with various congenital heart defects including aortic valve stenosis [50]. Helicase-Like transcription factor (HLTF; OMIM 603257), a gene that maps to chromosome 3, also plays an important role in the mouse heart development and function [51].
A number of other unrecognized genes identified in the genome-wide analysis are also potentially implicated in the pathogenesis of AVS, however the association between these genes and heart defects and putative mechanisms of action remain to be further explored.
In summary, we have identified alterations in DNA methylation in a number of genes involved in biological processes important to cardiac development in newborns with AVS, supporting the hypothesis that epigenetic modification may play a crucial role in the development of congenital heart defects such as AVS.
In the present study we have identified novel DNA-methylated genes and critical biological pathways that correlate with human aortic valve stenosis. We have demonstrated profound methylation differences in multiple CpG sites in these genes in AVS subjects. The methylation levels of individual CpG sites were used to calculate the area under the ROC curves as a measure of the accuracy of a putative diagnostic test with 59 CpG sites. Many of the identified CpG sites have a higher ROC AUC than 0.5; among them, the six CpGs with the highest AUC of more than 0.75 display strong diagnostic potential. Our overall results raise the possibility of using a large number of different marker combinations for effective detection of AVS. The results suggest that DNA methylation analysis is a molecular technique that might have the potential to be developed into a test for screening and diagnosis of AVS.
Finally the present study also determined that among 52 genes identified, many have not previously been reported in association with AVS. These results also suggest that the selected genes may be involved in the development of AVS by DNA methylation. The functional role of these genes or their potential as novel DNA methylation markers remains to be examined. An important limitation of the present study is that the findings are based on a relatively small number of subjects; however, these results can form the basis for specific hypotheses regarding AVS pathophysiology. Future larger studies based on this preliminary data may contain further clues for new approaches to non-invasive prenatal diagnosis and even prevention of this serious congenital heart malformation.
Supporting Information S1 Table. Details of the AVS subject cohort and controls used in the present analysis (online only). (DOCX)