The Congenital Heart Disease Genetic Network Study: Cohort description

The Pediatric Cardiac Genomics Consortium (PCGC) designed the Congenital Heart Disease Genetic Network Study to provide phenotype and genotype data for a large congenital heart defects (CHDs) cohort. This article describes the PCGC cohort, overall and by major types of CHDs (e.g., conotruncal defects) and subtypes of conotruncal heart defects (e.g., tetralogy of Fallot) and left ventricular outflow tract obstructions (e.g., hypoplastic left heart syndrome). Cases with CHDs were recruited through ten sites, 2010–2014. Information on cases (N = 9,727) and their parents was collected through interviews and medical record abstraction. Four case characteristics, eleven parental characteristics, and thirteen parent-reported neurodevelopmental outcomes were summarized using counts and frequencies and compared across CHD types and subtypes. Eleven percent of cases had a genetic diagnosis. Among cases without a genetic diagnosis, the majority had conotruncal heart defects (40%) or left ventricular outflow tract obstruction (21%). Across CHD types, there were significant differences (p<0.05) in the distribution of all four case characteristics (e.g., sex), four parental characteristics (e.g., maternal pregestational diabetes), and five neurodevelopmental outcomes (e.g., learning disabilities). Several characteristics (e.g., sex) were also significantly different across CHD subtypes. The PCGC cohort is one of the largest CHD cohorts available for the study of genetic determinants of risk and outcomes. The majority of cases do not have a genetic diagnosis. This description of the PCGC cohort, including differences across CHD types and subtypes, provides a reference work for investigators who are interested in collaborating with or using publicly available resources from the PCGC.


Introduction
Congenital heart defects (CHDs) occur in approximately 1% of births and are among the most common and serious birth defects [1,2]. While advances in treatment have reduced CHD-related mortality, CHDs remain the leading cause of birth defect-related infant deaths [3]. Moreover, the growing numbers of CHD survivors are at risk for a range of disease-related morbidities [4,5] and have reduced life expectancies compared to their unaffected contemporaries [4].
CHDs include a broad spectrum of malformations that differ with respect to morphology, physiology, and clinical outcome. Although CHD risk is thought to be influenced by both environmental and genetic factors, relatively few specific CHD risk factors have been identified and the extent to which the etiology of different CHDs differ or overlap is unknown. Large epidemiological studies, such as the National Birth Defect Prevention Study, have identified a few non-genetic risk factors for CHDs including maternal pre-gestational diabetes, obesity, and smoking [6][7][8][9][10]. To accelerate understanding of the genetic contribution to CHDs, the National Heart, Lung, and Blood Institute formed the Pediatric Cardiac Genomics Consortium (PCGC). The PCGC designed and implemented the Congenital Heart Disease GEnetic NEtwork Study (CHD GENES) to establish the resources required to undertake comprehensive studies of the genetics of CHDs.
The rationale for, design of, and early results from CHD GENES have been described [11][12][13][14]. In addition, genotype array, exome sequence, whole genome sequence, and RNA sequence data from CHD GENES participants have been and will continue to be posted to dbGaP (dbGaP Accession: phs000571.v3.p2, January 2016). In this article, we provide a description of the phenotypes, characteristics, and selected parent-reported neurodevelopmental outcomes of the PCGC cohort, as a resource for the broader CHD research community.
Participants (or their parent/guardian) provided written informed consent. The Institutional Review Board at the University of Texas Health Science Center at Houston approved the study protocol for the data analyzed and presented in this article.
Patients with any diagnosis of CHD (except as noted below), regardless of sex, age, and race/ethnicity were eligible to participate. Patients with a genetic diagnosis were eligible to participate, but preference for enrolling such patients may have varied across study sites. Patients with isolated patent foramen ovale, prematurity-related isolated patent ductus arteriosus, pulmonary stenosis related to a twin-twin transfusion, and cardiomyopathy without a CHD were not eligible. Cardiac diagnoses were confirmed by review of imaging (e.g., echocardiogram) and operative reports. Information on genetic testing, genetic physical exams, and extracardiac malformations was abstracted from medical records. In addition, information on cases and their parents was obtained during subject and family interviews. Cases that did not participate in the interviews were excluded from this report.
Data collected by interview included race/ethnicity, sex, birth weight, and maternal and paternal ages at the time of the case's birth. Data were also collected on maternal characteristics, including pre-pregnancy height and weight (to calculate pre-pregnancy body mass index), pre-gestational diabetes, gestational diabetes, epilepsy or seizure during pregnancy, and education level. For cases who were ≤1 year of age at recruitment, interview data were also collected on maternal smoking and alcohol use during the first trimester, any folic acid supplementation six months before pregnancy, and parity. For cases who were >1 year of age at recruitment, the interview included questions related to neurodevelopmental outcomes (e.g., attention deficit hyperactivity disorder, autism spectrum).
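As an illustration of how pre-pregnancy body mass index can be derived from the interview data, the following minimal Python sketch computes body mass index from weight and height. The function names, the use of SI units, and the cut-points (standard WHO adult categories) are assumptions made for this example; the article does not state the units or cut-points used in the study.

```python
def pre_pregnancy_bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


def bmi_category(bmi: float) -> str:
    """Classify BMI using WHO adult cut-points (assumed, not from the article)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"
```

For example, `bmi_category(pre_pregnancy_bmi(90.0, 1.65))` would classify a mother reporting 90 kg and 1.65 m as obese under these assumed cut-points.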
CHD diagnoses assigned using the International Paediatric and Congenital Cardiac Codes (http://www.ipccc.net/) were manually reviewed by two of the authors (S.E. and E.G.) and cases were assigned to one of seven types of CHDs: laterality disorder (LAT), conotruncal heart defect (CTD), atrioventricular septal defect (AVCD), left ventricular outflow tract obstruction (LVOT), right ventricular outflow tract obstruction (RVOT), atrial septal defect (ASD), and other. These groups are based on subsets of lesions that are thought to share genetic and mechanistic underpinnings and are defined in Table 1. Cases were categorized using a hierarchical approach. First, cases with a laterality disorder, regardless of other findings, were placed in LAT. Next, cases with abnormal conotruncal anatomy (including specific subtypes of isolated ventricular septal defects), regardless of associated left or right sided obstruction or atrioventricular canal anomalies, were placed in CTD. Then, cases with atrioventricular canal abnormalities with normally related great arteries were categorized as AVCD and cases with left or right sided obstructive lesions with normally related great arteries and normal atrioventricular canals were assigned to LVOT or RVOT, respectively. Finally, cases with an isolated secundum or sinus venosus type atrial septal defect were assigned to ASD. Cases with any other CHD diagnosis were assigned to the other group. Based on data from the interviews and medical records, cases were classified as either having 1) an identified genetic diagnosis (i.e., a syndrome or genetic alteration thought to explain the associated CHD), or 2) no genetic diagnosis. For simplicity, we refer to such cases as "syndromic" and "nonsyndromic", respectively. Cases classified as nonsyndromic by this scheme may have had additional non-cardiac anomalies or reported neurodevelopmental deficits.
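The hierarchical assignment described above can be sketched in Python as an ordered series of checks, where the first matching rule determines the CHD type. The `CardiacFindings` fields and the function name are hypothetical and introduced only for illustration; in the study, assignment was based on manual review of diagnostic codes, not on boolean flags. The text does not say how a case with both left and right sided obstruction was handled, so checking LVOT before RVOT is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CardiacFindings:
    # Hypothetical flags summarizing a reviewed diagnosis.
    laterality_disorder: bool = False
    abnormal_conotruncal_anatomy: bool = False
    av_canal_abnormality: bool = False
    left_sided_obstruction: bool = False
    right_sided_obstruction: bool = False
    isolated_secundum_or_sinus_venosus_asd: bool = False


def assign_chd_type(f: CardiacFindings) -> str:
    """Return the first CHD type in the hierarchy whose rule matches."""
    if f.laterality_disorder:
        return "LAT"    # regardless of other findings
    if f.abnormal_conotruncal_anatomy:
        return "CTD"    # regardless of obstruction or AV canal anomalies
    if f.av_canal_abnormality:
        return "AVCD"   # great arteries normally related at this point
    if f.left_sided_obstruction:
        return "LVOT"   # assumed to take precedence over RVOT
    if f.right_sided_obstruction:
        return "RVOT"
    if f.isolated_secundum_or_sinus_venosus_asd:
        return "ASD"
    return "other"
```

Because the checks are ordered, a case with both a laterality disorder and abnormal conotruncal anatomy is placed in LAT, which mirrors the precedence stated in the text.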

Statistical analysis
For syndromic cases, we reported counts and frequencies for each specific diagnosis. Given the clinical heterogeneity within this group, we excluded syndromic cases from subsequent analyses. For nonsyndromic cases, parental characteristics, case characteristics, and parent-reported neurodevelopmental outcomes were described using counts and frequencies for discrete variables, and means and standard deviations or median and range for continuous variables. Due to differences in the education systems in the United States and United Kingdom, we excluded women who were educated in the United Kingdom from our description of maternal education. Further, we restricted our analyses of neurodevelopmental outcomes to cases who were ≥5 years of age at recruitment, since neurodevelopmental deficits may be under-diagnosed in younger children. In addition to assessing each of 13 parent-reported (yes/no) neurodevelopmental outcomes, we created a composite neurodevelopmental outcome variable, indicating a positive parental report for at least one of four conditions: developmental delay, learning disability, mental retardation, or autism spectrum disorder [13].
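The composite neurodevelopmental outcome can be illustrated with a short Python sketch. The dictionary keys and function name are hypothetical. The sketch assumes that the age restriction means 5 years or older at recruitment, and that a condition missing from the report is treated as "no"; the article does not describe how missing responses were handled.

```python
# The four conditions that define the composite outcome (key names are hypothetical).
COMPOSITE_CONDITIONS = (
    "developmental_delay",
    "learning_disability",
    "mental_retardation",
    "autism_spectrum_disorder",
)


def composite_outcome(reports: dict, age_years: float):
    """Return True/False for the composite outcome, or None if the case
    is younger than 5 years and therefore outside the analysis."""
    if age_years < 5:
        return None
    # Positive if the parent reported at least one of the four conditions;
    # missing keys are treated as "no" (an assumption).
    return any(reports.get(condition, False) for condition in COMPOSITE_CONDITIONS)
```

For instance, `composite_outcome({"learning_disability": True}, age_years=8)` is positive, while the same report for a 3-year-old is excluded from the analysis.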
We used the chi-square test (or Fisher's exact test when >20% of cells had an expected cell count <5) to compare the distribution of categorical variables across types of CHDs. For continuous variables, we used ANOVA or the Kruskal-Wallis test to compare the mean or median, respectively, across types of CHDs. For ANOVA analyses, we used Levene's test to check the assumption of homogeneity of variance. If Levene's test was significant (p<0.05), we used Welch's ANOVA. Analyses of all variables, except neurodevelopmental outcomes, were repeated in the subset of cases who were ≤1 year of age at recruitment for the following reasons: 1) inaccurate recall of characteristics or events before or during pregnancy is of greater concern for cases ascertained at older ages than at younger ages; and 2) the distribution of characteristics across types of CHDs may be influenced by survival. Because of the heterogeneity within types of CHDs, analyses were also repeated to compare specific subtypes in the two largest types of CHDs: CTD and LVOT. These analyses were restricted to subtypes that included at least 200 cases. For LVOT, cases with aortic stenosis were combined with cases with bicuspid aortic valve to create a subtype called 'aortic valve disease.' Because differences in the distribution of neurodevelopmental outcomes across types of CHDs may be influenced by factors other than the CHD diagnosis, we used logistic regression to control for potential confounders determined a priori from the literature [15]: maternal education, case race/ethnicity, sex, birth weight (low [<2,500g], normal [2,500-4,000g], high [>4,000g]), and extracardiac malformations (yes/no). Further, as neurodevelopmental deficits may be under-diagnosed in younger cases, we also adjusted for case age at the time of recruitment.
Adjusted analyses were not conducted for the CTD and LVOT subtypes because of the relatively small numbers of cases with specific outcomes (e.g., double outlet right ventricle with autism spectrum, N = 4).
All analyses were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC). P-values <0.05 were considered statistically significant.
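The rule for choosing between the chi-square test and Fisher's exact test can be made concrete with a small sketch. The study analyses were run in SAS; the Python below is only an illustration, with hypothetical function names, of how expected cell counts are derived from a contingency table and how the ">20% of cells with an expected count <5" rule is applied. It assumes every row and column of the table has a nonzero total, and it computes the chi-square statistic only, not the p-value.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_test(table):
    """Apply the rule: Fisher's exact test if >20% of cells have expected count <5."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(1 for e in cells if e < 5) / len(cells)
    return "fisher_exact" if share_small > 0.20 else "chi_square"


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )
```

With a table such as `[[10, 20], [30, 40]]` all expected counts are at least 5, so the rule selects the chi-square test; with `[[1, 2], [3, 4]]` every expected count is below 5 and the rule selects Fisher's exact test.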
The nonsyndromic cases are described in Table 3 and the distributions of CHD subtypes (e.g., tetralogy of Fallot, truncus arteriosus) are provided in the S2 Table. The largest subsets of CHDs were CTD (40%) and LVOT (21%). The majority of cases were non-Hispanic White (59%) and male (55%). In addition, cases were predominantly born in the United States (86%), had normal birth weight (77%), did not have extracardiac malformations (76%), and were >1 year of age at recruitment (69%).
The description of the nonsyndromic cases, by type of CHDs, is provided in Table 4. The distributions of three maternal characteristics, across the six types of CHDs, were significantly different: body mass index (p = 0.002), pre-gestational diabetes (p<0.001), and education (p<0.001). For example, the proportion of cases with an obese mother ranged from 10% (ASD) to 19% (AVCD); the proportion with maternal pre-gestational diabetes ranged from 1% (ASD) to 5% (LAT); and the proportion of cases with a mother with less than a high school education ranged from 4% (AVCD) to 14% (ASD). A significant difference across types of CHDs was also observed for paternal age (p = 0.02) (Table 4). When analyses were restricted to cases ≤1 year of age at recruitment, similar results were obtained for maternal education and paternal age. However, in this subset, differences were not statistically significant across types of CHDs for maternal body mass index or pre-gestational diabetes (S3 Table). Although, overall, CHD cases were significantly (p<0.001) more likely to be male (55%) than female, males were predominant in only three of the types of CHDs (LAT, CTD and LVOT) (Table 4). Significant differences across types of CHDs were also observed for case race/ethnicity, birth weight, and extracardiac malformations (p<0.001). For example, the proportion of cases that were non-Hispanic White ranged from 53% (ASD) to 67% (LVOT); the proportion of cases with a low birth weight ranged from 11% (LVOT) to 19% (CTD, ASD); and the proportion of cases that had extracardiac malformations ranged from 19% (RVOT) to 51% (LAT). Similar results were obtained when analyses were restricted to cases ≤1 year of age at recruitment (S3 Table).

Neurodevelopmental outcomes
In nonsyndromic cases, the description of neurodevelopmental outcomes by type of CHDs and the p-values from the unadjusted analyses are provided in Table 5. Differences across types of CHDs were observed for attention deficit hyperactivity disorder (p = 0.03), depression (p = 0.01), developmental delay (p = 0.003), learning disability (p<0.001), repeated grade (p<0.001), and the composite neurodevelopmental outcome variable (p<0.001). The frequencies of these outcomes were highest for cases with RVOT (attention deficit hyperactivity disorder, 10%; depression, 10%) or AVCD (developmental delay, 17%; learning disability, 21%; repeated grade, 21%; composite measure, 28%) and lowest for cases with ASD (5%, 6%, 8%, 11%, 13%, and 10%, respectively). Results were similar in the adjusted analyses; however, differences across types of CHDs were no longer significant for attention deficit hyperactivity disorder and depression, and the adjusted model did not converge for autism spectrum and other neurodevelopmental outcomes.

CTD and LVOT subtypes
Analyses were repeated to assess differences across subtypes within CTDs and LVOTs (S4 and S5 Tables). Given the relatively small numbers of subtypes of CHDs in these two groups, only unadjusted analyses were conducted. Significant differences across the four subtypes of CTDs were observed for maternal body mass index (p = 0.03), pre-gestational diabetes (p = 0.04), and parity (p = 0.01). Significant differences were also observed for infant sex (p<0.001), race/ethnicity (p<0.001), birth weight (p<0.001), extracardiac malformations (p<0.001), and parent-reported anxiety (p = 0.03). Across the three subtypes of LVOTs, significant differences were observed for maternal education (p = 0.005), infant sex (p<0.001), race/ethnicity (p<0.001), birth weight (p = 0.008), extracardiac malformations (p = 0.04), and several neurodevelopmental outcomes. In general, adverse neurodevelopmental outcomes appeared to be reported more frequently by parents of hypoplastic left heart syndrome (HLHS) cases than by parents of aortic valve disease and coarctation of the aorta cases.

Discussion
Between 2010 and 2014, the PCGC recruited over 9,000 families with a child affected by a CHD. This cohort is one of only a few large contemporary CHD cohorts that can be used to study the genetic basis of the causes and consequences of these common and serious birth defects. The PCGC has established data sharing plans (https://benchtobassinet.com/ForResearchers/B2BDataSharingPlan.aspx), which include data access through dbGaP (dbGaP Accession: phs000571.v3.p2) and has established a process for proposing ancillary studies that make use of biospecimens. Hence, the PCGC cohort provides a valuable resource for the research community. This paper, in conjunction with an earlier report describing the rationale and design of the PCGC [11], provides investigators with details that should help to inform their study design (e.g., phenotype selection), analytic plan (e.g., power, subgroup analyses), and interpretation of study results (e.g., study limitations).
As enrollment for the PCGC cohort was through tertiary/quaternary medical centers, it was skewed toward cases with more severe forms of CHD. However, the cohort includes both cases with complex and cases with simple lesions, so it is broadly representative of the spectrum of clinically significant CHDs. For example, the low frequency of males among cases with ASDs and AVCDs is consistent with previous findings [10,[16][17][18]. As recruitment was center-specific, it is possible that differences in recruitment might have introduced some selection bias. For instance, the proportion of cases in the PCGC cohort with a genetic diagnosis (11%) is low compared to population-based estimates (~20%) [19]. This likely reflects the PCGC recruitment priorities (e.g., nonsyndromic over syndromic) and it is possible that some centers may have recruited a lower proportion of syndromic cases than other centers. Because cases of all ages were eligible and information was collected via subject or family interviews, these data are subject to recall errors. Recall error may account for differences in estimates obtained from the PCGC and from other studies. For example, this issue might explain why the proportion of mothers of PCGC cases who reported that they took folic acid prior to becoming pregnant is relatively high (56%), compared to estimates based on women who were pregnant or of child-bearing age (<45%) [20][21][22]. Additionally, because neurodevelopmental outcomes in cases were reported by parents, the reported frequencies may not reflect the distribution of neurodevelopmental outcomes in the general CHD population [23].
The PCGC did not conduct a case-control study. Since there is no comparable control group, the cohort cannot be used to study non-genetic risk factors for CHDs and CHD outcomes. However, differences in the distribution of known CHD risk factors (e.g., race/ethnicity, maternal pre-gestational diabetes) across types and subtypes of CHDs provide potentially important insights into the data. For example, in the PCGC cohort, the proportion of cases of Hispanic ethnicity differs across types of CHDs. As similar differences have been observed in population-based epidemiologic studies [24,25], this may reflect true underlying differences in the risk factor profiles of the different CHDs. Nonetheless, these differences might also be artificial. For example, these differences may be a result of lesion-specific differences in survival by ethnicity [26] or differences in ascertainment by ethnicity and/or type of CHDs. Either way, investigators need to be aware of these differences, since they may influence the results for studies of genetic variants that differ in frequency across ethnic groups.
In summary, we provide a description of the distribution of key variables in the PCGC cohort and identify differences in the distribution of certain characteristics across types and subtypes of CHDs. This information will help inform future genomic studies on the etiology and neurodevelopmental outcomes across types and subtypes of CHDs in the PCGC cohort.