A Genome-Wide Investigation of Copy Number Variation in Patients with Sporadic Brain Arteriovenous Malformation

Background Brain arteriovenous malformations (BAVM) are clusters of abnormal blood vessels, with shunting of blood from the arterial to venous circulation and a high risk of rupture and intracranial hemorrhage. Most BAVMs are sporadic, but they also occur in patients with Hereditary Hemorrhagic Telangiectasia, a Mendelian disorder caused by mutations in genes in the transforming growth factor beta (TGFβ) signaling pathway. Methods To investigate whether copy number variations (CNVs) contribute to risk of sporadic BAVM, we performed a genome-wide association study in 371 sporadic BAVM cases and 563 healthy controls, all Caucasian. Cases and controls were genotyped using the Affymetrix 6.0 array. CNVs were called using the PennCNV and Birdsuite algorithms and analyzed via segment-based and gene-based approaches. Common and rare CNVs were evaluated for association with BAVM. Results A CNV region on 1p36.13, containing the neuroblastoma breakpoint family, member 1 gene (NBPF1), was significantly enriched with duplications in BAVM cases compared to controls (P = 2.2×10⁻⁹); NBPF1 was also significantly associated with BAVM in gene-based analysis using both PennCNV and Birdsuite. We experimentally validated the 1p36.13 duplication; however, the association did not replicate in an independent cohort of 184 sporadic BAVM cases and 182 controls (OR = 0.81, P = 0.8). Rare CNV analysis did not identify genes significantly associated with BAVM. Conclusion We did not identify common CNVs associated with sporadic BAVM that replicated in an independent cohort. Replication in larger cohorts is required to elucidate the possible role of common or rare CNVs in BAVM pathogenesis.


Introduction
Brain arteriovenous malformations (BAVM) are a tangle of poorly formed blood vessels with abnormal connections between arteries and veins, with direct shunting of blood through a vascular nidus but without an intervening capillary bed. BAVMs are rare, occurring in less than 1% of the general population, but are a leading cause of hemorrhagic stroke in children and young adults. Although the majority of BAVMs arise sporadically, they also occur in patients with Hereditary Hemorrhagic Telangiectasia (HHT), a Mendelian disorder inherited in an autosomal dominant fashion and caused by mutations in one of three genes (ACVRL1, ENG and SMAD4) in the TGFβ signaling pathway [1,2,3,4,5].
Copy number variations (CNVs) represent a significant source of genetic variation. CNVs, defined as deletions or duplications of a segment of DNA sequence ≥1 kb in size compared to a reference genome, affect roughly 12% of the human genome [15]. De novo CNVs can be a potential genetic mechanism in sporadic diseases [15]. Recent studies have demonstrated association of rare and common CNVs with several diseases, including schizophrenia [16,17,18,19], autism [20,21], and amyotrophic lateral sclerosis [22,23]. Mechanisms by which CNVs may influence gene function and thus disease susceptibility include gene dosage imbalances, altered messenger RNA (mRNA) expression levels or expression of truncated proteins with altered function [24].
Modern genome-wide arrays include probes for assessing CNVs, and CNVs can also be called using intensity signals from single nucleotide polymorphism (SNP) probes. However, accuracy of the current CNV calling algorithms varies considerably, yielding substantial false negative and false positive rates [25,26]. A recent study evaluating the performance of five commonly used CNV calling algorithms concluded that PennCNV and Birdsuite are superior to others when considering overall reproducibility of calls and Mendelian consistency [27].
We hypothesized that CNVs (rare or common) may contribute to sporadic BAVM risk. To obtain reliable CNV calls for association analysis, we used two algorithms to call CNVs and focused on CNVs identified by both algorithms significantly associated with BAVM. Here we present the results of the first genome-wide association study (GWAS) of CNVs in patients with sporadic BAVM.

Ethics statement
All participants gave written informed consent, and the study was approved by the Committee on Human Research (CHR) at the University of California, San Francisco; Kaiser Permanente Northern California Institutional Review Board for the Protection of Human Subjects; and the University Medical Center Utrecht Medical Ethics Review Committee, The Netherlands.

Sample recruitment
The initial cohort included sporadic BAVM patients (n = 371) recruited at the University of California, San Francisco (UCSF) or Kaiser Permanente Medical Care Plan of Northern California (KPNC) as part of a larger UCSF-KPNC Brain AVM registry. Of the 371 cases, 95 provided saliva and 276 cases provided blood specimens for DNA extraction. Controls included 216 healthy controls from a narcolepsy study [28] and 347 transplant donors from a kidney transplantation study [29]. All control participants provided blood specimens. Cases and controls were all of self-reported Caucasian race/ethnicity. The replication cohort comprised 184 Caucasian BAVM cases (37 cases from UCSF and 147 cases from the University Medical Center, Utrecht, The Netherlands) and 182 healthy Caucasian controls recruited for the Study Of Pharmacogenetics in Ethnically Diverse Populations (SOPHIE) [30]. AVM diagnosis, morphological, and clinical characteristics were recorded using standardized definitions [31,32].

Genotyping
The discovery cohort was genotyped using the Affymetrix Genome-Wide Human SNP array 6.0 (Affymetrix, Santa Clara, California), according to the manufacturer's protocols (http://www.affymetrix.com); both cases and controls were genotyped in the same laboratory at UCSF. The Affymetrix 6.0 array contains 906,600 SNP probes and 946,000 CNV probes. Of the CNV probes, 800,000 are evenly spaced along the genome and the remaining probes target 3,700 known CNVs [33].

CNV calling and quality control filtering
Pre-CNV calling QC. We discarded samples with more than 5% missing genotypes, or whose computed and reported gender disagreed. For known or cryptic duplicates, the sample with the lower genotype call rate was dropped. Overall average genotyping call rate was 99%. A total of 338 cases and 510 controls passed pre-CNV calling QC filtering.
CNV calling. To identify deletions and duplications for the 22 autosomes, we used the version of the PennCNV algorithm optimized for CNV calls from the Affymetrix 6.0 array (http://www.openbioinformatics.org/penncnv/penncnv_tutorial_affy_gw6.html), adjusting for genomic waves [34,35]. At each marker, the B allele frequency (BAF), a measure of the normalized allelic intensity ratio, and the log R ratio (LRR), a measure of the normalized total signal intensity, are used together to infer copy number state. This algorithm combines these values, the distance between SNPs, and the population frequency of the B allele into a hidden Markov model (HMM) to identify autosomal deletions and duplications. For more precise modeling of the CNV events, PennCNV adopts a six-state definition [34]. To ensure reliability of the CNV calls produced by PennCNV, we also called CNVs using the Birdsuite algorithm. Birdsuite is a four-stage integrated analysis of SNPs and CNVs designed specifically for the Affymetrix 6.0 array. Birdsuite sequentially assigns copy number across regions of common copy number polymorphisms (CNPs) using Canary software, then calls SNP genotypes and identifies rare CNVs via HMM using Birdseye. Finally, copy number and SNP allele information are combined to provide an integrated genotype at every locus [36]. The Canary software determines copy number at catalogued CNPs, which are present in more than 1% of 270 HapMap samples [33]. We used results from PennCNV as our primary findings and focused on top findings for which PennCNV and Birdsuite gave similar results, as they are more likely to be genuine findings. Previous studies suggest that PennCNV is one of the optimal algorithms, and the use of more than one algorithm is highly recommended for CNV calling to reduce false positive calls [37,38,39].
Post-CNV calling QC. We retained only those CNVs that were called based on ≥20 markers in both PennCNV and Birdsuite (analysis with a ≥10 marker cutoff yielded similar results, data not shown). For PennCNV, to reduce the number of false positives, we removed outliers with respect to the LRR standard deviation (upper quartile + 1.5×IQR), BAF-median greater than 0.55 or less than 0.45, BAF-drift >0.005 and waviness factor of greater than 0.04 or less than −0.04. We also removed samples that were outliers with respect to the number of CNVs per individual (>92 CNVs, based on upper quartile + 1.5×IQR).
For CNVs called by Birdseye, we excluded CNVs with LOD <10 and samples with high sample-specific measures of noise (variance >2). CNPs assigned a copy number state equal to 2 (normal) by Canary and those that mapped to the sex chromosomes were removed. Only CNPs with high confidence score (<0.1) were included for analysis. The list of common CNPs generated by Canary was then merged with the list of CNVs generated by Birdseye into one master file. Samples with an excess number of CNVs called (>633/sample) were also removed from downstream analysis. Since CNVs may be artificially split by the CNV calling algorithm, for both PennCNV and Birdsuite, adjacent calls of the same type were combined into a single CNV if the gap between the calls was <20% of the total length of adjacent calls including the gap region. This resulted in a total of 26,355 CNVs across 270 cases and 457 controls for PennCNV and 27,657 CNVs across 289 cases and 443 controls for Birdsuite. CNVs overlapping telomeres, centromeres or segmental duplications were not removed but flagged as these regions are known to harbor spurious CNV calls. We also explored the results after excluding CNVs with >50% of their length overlapping segmental duplications.
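The outlier rule used above (upper quartile plus 1.5 times the interquartile range) can be illustrated with a minimal Python sketch. The function names and the choice of quartile interpolation (`method="inclusive"`) are assumptions made for illustration; the text does not state which quartile convention was used, and different conventions can shift the cutoff slightly.

```python
import statistics

def upper_outlier_cutoff(values, k=1.5):
    """Return Q3 + k * IQR for a collection of per-sample values."""
    # Assumption: "inclusive" quartile interpolation; other conventions exist.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)

def drop_upper_outliers(per_sample, k=1.5):
    """Keep only samples whose value does not exceed the cutoff.

    per_sample maps a sample identifier to a numeric QC metric, for
    example the LRR standard deviation or the number of CNV calls.
    """
    cutoff = upper_outlier_cutoff(list(per_sample.values()), k)
    return {sample: v for sample, v in per_sample.items() if v <= cutoff}
```

Applied to the number of CNV calls per individual, a rule of this kind yields a single sample-level cutoff such as the >92 CNVs threshold reported for PennCNV.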
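The merging rule for artificially split calls can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the `Call` record and the function name are hypothetical, the input is assumed to hold the calls of a single sample, and deletions and duplications are processed independently of each other.

```python
from typing import NamedTuple

class Call(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # "del" or "dup"

def merge_adjacent(calls, max_gap_fraction=0.20):
    """Merge same-type neighbouring calls from one sample.

    Two calls are joined when the gap between them is smaller than
    max_gap_fraction of the total span (both calls plus the gap).
    """
    merged = []
    # Sorting by type as well as position means each type is handled separately.
    for call in sorted(calls, key=lambda c: (c.chrom, c.kind, c.start)):
        if merged:
            prev = merged[-1]
            if prev.chrom == call.chrom and prev.kind == call.kind:
                gap = max(0, call.start - prev.end)
                span = max(prev.end, call.end) - prev.start
                if gap < max_gap_fraction * span:
                    merged[-1] = prev._replace(end=max(prev.end, call.end))
                    continue
        merged.append(call)
    return merged
```

For example, duplications at 100-200 and 210-300 on the same chromosome are separated by a gap of 10 within a span of 200 (5%), so they would be combined into one call spanning 100-300.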

Statistical analysis
CNV size was compared between cases and controls using a two-tailed Mann-Whitney U test because of the non-normality of the data. We tested association of both rare and common CNVs with increased risk of BAVM. For tests of association of common CNVs, we used both a segment-based scoring approach and a gene-based approach [40,41]. All coordinates are according to the human NCBI Build 36, hg18 reference sequence. All statistical analyses, except where otherwise noted, were performed using R version 2.10.1 software (www.r-project.org).

Segment-based scoring approach
CNV regions (CNVRs) were defined using a segment-based scoring approach that scans the genome for consecutive markers to identify loci with significantly more CNVs in cases compared to controls. Each marker is tested for enrichment of CNVs in cases versus controls using a one-sided Fisher's exact test, followed by correction for multiple testing; this is done for duplications and deletions separately. We used principal component analysis (PCA) to model ancestry differences between cases and controls. PCA was performed by Eigenstrat v3.0 using SNP genotype calls from 72,456 unlinked markers distributed uniformly across the genome [42]. To confirm findings from PennCNV, CNVRs were defined in the same way using Birdsuite calls passing QC filtering. Only regions passing multiple testing correction using both PennCNV (adjusted for 91,083 tests for duplications and 80,663 tests for deletions) and Birdsuite (adjusted for 84,455 tests for duplications and 63,070 for deletions) were considered for downstream analysis. Because of the uncertainty in defining CNV boundaries when using intensity data from SNP arrays, CNVRs are defined by the union (total length encompassed by both algorithms). Association of CNVRs with BAVM was assessed by fitting a multivariate logistic regression model adjusting for age, sex and the top 3 principal components for population substructure. A CNVR was considered significantly associated with BAVM if P<10⁻⁵ in both PennCNV and Birdsuite. Finally, B allele frequency (BAF) and log R ratio (LRR) plots were manually examined for top BAVM-associated loci.
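The per-marker enrichment test can be illustrated with a small, dependency-free sketch of a one-sided Fisher's exact test. The function name and argument layout are hypothetical; the study itself used R, and the exact multiple-testing procedure applied in the segment-based scan is not spelled out in the text, so no correction step is shown here.

```python
from math import comb

def fisher_one_sided(case_cnv, case_total, ctrl_cnv, ctrl_total):
    """One-sided Fisher's exact test for enrichment in cases.

    Returns P(X >= case_cnv), where X is the number of CNV carriers
    among cases under the hypergeometric null (carrier status
    independent of case/control status, margins fixed).
    """
    n_cnv = case_cnv + ctrl_cnv          # carriers at this marker
    n_all = case_total + ctrl_total      # all samples
    denom = comb(n_all, case_total)
    upper = min(n_cnv, case_total)
    numer = sum(
        comb(n_cnv, k) * comb(n_all - n_cnv, case_total - k)
        for k in range(case_cnv, upper + 1)
    )
    return numer / denom
```

With 3 of 4 cases and 1 of 4 controls carrying a CNV at a marker, the function returns 17/70 (about 0.24). In a genome-wide scan, the same test would be repeated at every marker, separately for deletions and duplications.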

Gene-based approach
We performed a gene-based analysis to assess for significant enrichment of CNVs overlapping known genes in BAVM cases compared to controls. This approach identifies CNVs that could be individually rare, or may disrupt different parts of specific genes that could be involved in important pathways and contribute to the etiology of BAVM. In addition, it allows combined analysis of rare and common CNVs impacting the same gene, thus allowing evaluation of CNV calls that might be missed by the segment-based approach.
To test for genes associated with BAVM, we examined CNVs overlapping genes plus 20 kb upstream and downstream of the gene boundaries. Significance was assessed using a one-sided Fisher's exact test correcting for the 1126 genes overlapping CNVs from both PennCNV and Birdsuite using the Bonferroni correction. Deletions and duplications were tested separately.
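A minimal sketch of the gene-level bookkeeping is given below: a CNV is assigned to a gene if it overlaps the gene body extended by 20 kb on each side, and carriers are then counted per gene. All names are hypothetical, coordinates are assumed to be on the same chromosome, and the family-wise alpha of 0.05 is an assumption because the text names the Bonferroni correction but not the alpha level.

```python
FLANK_BP = 20_000  # 20 kb upstream and downstream of the gene boundaries

def overlaps_gene(cnv, gene, flank=FLANK_BP):
    """True if the CNV (start, end) overlaps the gene (start, end) +/- flank."""
    cnv_start, cnv_end = cnv
    gene_start, gene_end = gene
    return cnv_start <= gene_end + flank and cnv_end >= gene_start - flank

def count_carriers(gene, calls_by_sample, flank=FLANK_BP):
    """Number of samples with at least one CNV overlapping the extended gene."""
    return sum(
        any(overlaps_gene(cnv, gene, flank) for cnv in calls)
        for calls in calls_by_sample.values()
    )

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha = 0.05 is an assumption."""
    return alpha / n_tests
```

Under that assumed alpha, correcting for 1126 genes gives a per-gene threshold of roughly 4.4×10⁻⁵. The carrier counts in cases and controls would then be compared with a one-sided Fisher's exact test, separately for deletions and duplications.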

Rare CNV analysis
To test the hypothesis that cases have a greater burden of rare large CNVs compared to controls, we performed a burden analysis, defining burden as either: 1) the total number of CNVs carried by an individual, or 2) the total number of genes spanned by those CNVs.
For rare CNV analysis, we only considered CNVs called using PennCNV and >100 kb. We removed individuals who were outliers with respect to the total number of CNVs called per individual and to the total kb span of the CNVs. Finally, common CNVs (present in >1% of the total sample) were excluded, as well as CNVs that overlapped by at least 50% of their length with previously described common CNVs (PLINK software version 1.06) [43]. The final dataset consisted of 732 rare large CNVs from 437 individuals (158 cases and 279 controls).
The CNVs were further stratified by type (deletions or duplications) and by size (100-200 kb, 200-500 kb, 500-1000 kb and >1000 kb). Permutation was used to test if the total number of CNVs carried by an individual as well as the total number of genes spanned by those CNVs was significantly higher in cases compared to controls (PLINK) [43].
Similar analysis was performed for rare CNVs restricting to each of seven candidate biological pathways relevant to BAVM based on prior human studies and animal models (TGFβ signaling including the 3 known HHT genes, Notch signaling, Vascular Endothelial Growth Factor (VEGF) signaling, Angiogenesis, Vascular Development, Inflammatory Response and Mitogen-Activated Protein Kinase (MAPK) signaling) [44,45,46,47].
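The size and overlap criteria can be sketched as below. This is an illustration only and does not reproduce PLINK's implementation: the function names are hypothetical, intervals are (start, end) pairs on the same chromosome, and the sample-frequency filter (>1% of the total sample) is omitted for brevity.

```python
def overlap_fraction(cnv, region):
    """Fraction of the CNV's length that is covered by the given region."""
    start = max(cnv[0], region[0])
    end = min(cnv[1], region[1])
    return max(0, end - start) / (cnv[1] - cnv[0])

def is_rare_large(cnv, common_regions, min_size=100_000, max_overlap=0.5):
    """Keep CNVs >100 kb that overlap no known common CNV by >= 50% of their length."""
    if cnv[1] - cnv[0] <= min_size:
        return False
    return all(overlap_fraction(cnv, r) < max_overlap for r in common_regions)
```

For instance, a 200 kb CNV whose second half lies inside a known common CNV has an overlap fraction of exactly 0.5 and would be excluded under the "at least 50%" rule.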
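The logic of a one-sided permutation test for burden can be sketched as follows. This is a generic label-shuffling illustration, not PLINK's own procedure: the function name, the use of the difference in means as the test statistic, the number of permutations and the fixed seed are all assumptions.

```python
import random

def burden_permutation_p(case_burden, control_burden, n_perm=10_000, seed=0):
    """Empirical one-sided P value that mean burden is higher in cases.

    Burden is any per-individual count, such as the number of rare CNVs
    or the number of genes spanned by those CNVs.
    """
    rng = random.Random(seed)
    pooled = list(case_burden) + list(control_burden)
    n_case = len(case_burden)

    def mean_difference(values):
        cases, controls = values[:n_case], values[n_case:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if mean_difference(pooled) >= observed:
            hits += 1
    # Add-one correction so the empirical P value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Running the same function on each stratum (deletions or duplications, and each size bin) would mirror the stratified analysis described above.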

Experimental validation and replication
To validate our top findings, we used several quantitative PCR (qPCR) assays. First, we used a commercially available probe (Applied Biosystems, Foster City, CA, Taqman Hs04206910_cn in NBPF1). qPCR was performed with 10 ng DNA in 10 μL reactions in triplicate with RNAseP internal reference control on an ABI7900HT thermocycler. For validation, we assayed 61 cases from the original cohort and evaluated concordance between copy numbers estimated by qPCR and by CNV calling algorithms (we did not have access to DNA from the original controls so we could not evaluate concordance in controls). For replication, we assayed 184 new BAVM cases (from Utrecht and UCSF) and 182 new controls. A negative control and at least 2 HapMap Caucasian reference samples were included on each plate (NA06991, NA06985 and NA12875). Threshold cycle (C_T) values for the target and the reference generated by qPCR were imported into CopyCaller software (Applied Biosystems, Foster City, CA). Copy number of the target sequence was determined in CopyCaller by comparing C_T between the locus probe and the internal reference probe (ΔΔC_T). We used the HapMap sample NA06991 as a calibrator sample.
Second, we designed three custom qPCR probes targeting the NBPF1 gene (probe 1, 5′-CCGAAGCCCTAAATCTCAAC-3′ and 5′-ACGGCAAGGGACAATTGGCT-3′; probe 2, 5′-TTTGTGTCCGGAATGTGCCT-3′ and 5′-CCCTGCACTTACCCTTGTCC-3′; probe 3, 5′-TTTCTACCTGGCCCTGGTCT-3′ and 5′-CCCCAGCTACATTTCATGGCT-3′) and assayed them in 177 BAVM cases from Utrecht (123 cases overlapped the first replication cohort above). As above, a negative control and at least 2 Caucasian reference samples were included on each plate. Real time qPCR was performed with 20 ng DNA in 25 μL reactions in triplicate on an ABI7900HT thermocycler. CNVs were called using the ΔΔC_T method using the average ΔC_T of the total sample as the reference.
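The arithmetic behind relative copy number estimation from threshold cycles can be sketched as follows. This is the textbook relative quantification formula rather than the internals of CopyCaller; the function names are hypothetical, and the sketch assumes roughly 100% amplification efficiency and a calibrator carrying 2 copies of the target, neither of which is stated in the text.

```python
from statistics import mean

def delta_ct(target_cts, reference_cts):
    """Mean target C_T minus mean reference C_T across replicate wells."""
    return mean(target_cts) - mean(reference_cts)

def copy_number(sample_dct, calibrator_dct, calibrator_copies=2):
    """Estimate copy number from delta-delta C_T.

    Assumptions: amplification efficiency near 100% (one doubling per
    cycle) and a calibrator with calibrator_copies copies of the target.
    """
    ddct = sample_dct - calibrator_dct
    return calibrator_copies * 2 ** (-ddct)
```

Under these assumptions, a sample whose ΔC_T equals that of the calibrator is estimated at 2 copies, and a sample whose ΔC_T is one cycle lower is estimated at 4 copies; a single-copy duplication (3 copies) corresponds to a ΔΔC_T of about −0.58. When the average ΔC_T of the whole sample set is used as the reference, `calibrator_dct` would be that average.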

Gene ontology and pathway analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7b was used for analyzing functional classification, gene ontology (GO) and pathway analysis (http:// david.abcc.ncifcrf.gov/) for BAVM-specific genes, defined as genes overlapping at least one CNV in cases and none in controls.

Results
A total of 338 BAVM cases and 510 controls passed array QC. Cases were significantly younger than controls (mean age = 38.7 ± 17.6 y and 50 ± 14 y respectively, P<0.001). Gender distribution was similar between cases and controls (percentage of females: 54.4% cases, 49.8% controls, P = 0.19). Cases had an average BAVM size of 3 ± 1.6 cm (mean ± standard deviation); 16.85% had exclusively deep venous drainage and 38% presented with hemorrhage.

CNV calling
Using PennCNV, we observed a total of 46,251 raw CNV calls across 338 BAVM cases and 510 controls, and 26,355 QC-filtered CNV calls across 270 cases and 457 controls. The average number of CNVs called per individual was significantly lower in cases compared to controls (34 vs. 37, P = 3.3×10⁻⁹). The overall median CNV size was significantly larger in cases compared to controls (40 kb in cases vs. 35 kb in controls, P = 1.2×10⁻¹⁴). For duplications, the average number of CNVs per individual did not differ between cases and controls. For deletions, the average number of CNVs called per individual was significantly smaller in cases compared to controls (Table S1); this was also the case for deletions using Birdsuite. However, for duplications using Birdsuite, we observed a higher average number of CNVs called per individual in cases compared to controls (Table S1). Since 26% of the cases provided saliva specimens for DNA extraction, we also compared the average number of CNVs called between cases with saliva specimens and cases with blood specimens. The average number of duplications called per individual was significantly higher in cases with blood specimens compared to cases with saliva specimens (16 vs. 12, P = 4.31×10⁻⁶). For deletions, the average number of CNVs called per individual did not differ between blood and saliva DNA.

Segment-based analysis
Using PennCNV, we identified 11 CNVRs (9 duplications, Figure 1a, and 2 deletions, Figure 1b and Table 1) with significantly higher frequency in BAVM cases compared to controls (Fisher's Exact test after correcting for multiple testing, P ≤ 1.02×10⁻⁵, Table 1). All 11 CNVRs overlapped at least one copy number locus with a frequency of >1% in the HapMap population [33]. Among those 11 CNVRs, only one deletion on chromosome 6 (Figure 1d) passed the correction for multiple testing using Birdsuite (P = 1.49×10⁻⁹).
Among the 9 duplications passing the correction for multiple testing using PennCNV, parts of 3 CNVRs on chromosomes 16, 15 and 1 were also identified by Birdsuite (P<2×10⁻⁵). CNVRs were then defined as the union (total length encompassed) of the CNVRs found by the two algorithms. We further assessed the association of these 4 CNVRs with BAVM using a multivariate logistic model adjusting for age, sex and the top 3 principal components of population structure (Table 2). Only one CNVR on 1p36.13 was significantly associated with BAVM in the multivariate model using both PennCNV and Birdsuite.
After removing CNVs that overlapped segmental duplication regions, none of the CNVRs were significantly associated with BAVM in the multivariate model.
We also evaluated concordance between CNV calls made by PennCNV and Birdsuite for each CNVR. For the CNVR on 1p36.13

Gene-based analysis
To complement the segment-based analysis with a CNV analysis that is less sensitive to CNV definition, we performed a gene-based analysis, including the gene and 20 kb upstream and downstream. We first examined CNVs in the three HHT genes (ACVRL1, ENG and SMAD4). None of the CNVs called by PennCNV overlapped any of the HHT genes in BAVM cases.
We then performed a genome-wide analysis; a total of 1126 genes overlapped CNVs called by both PennCNV and Birdsuite. Thirty gene transcripts showed significant enrichment of CNVs in cases compared to controls using PennCNV after correction for multiple testing (Table 3). Only one gene, OR4K1 (olfactory receptor, family 4, subfamily K, member 1) was significantly enriched for deletions in cases compared to controls (P = 1.13×10⁻⁶); however, this was not supported by Birdsuite analysis. Among duplications, the NBPF1 gene on chromosome 1 was significantly associated with BAVM in both PennCNV and Birdsuite (Table 3 and Table S2); NBPF1 is located within the BAVM-associated CNVR on chromosome 1 from the segment-based analysis. Figure S1 shows the duplication CNVs called by PennCNV at 1p36.13 in cases and controls [48]. A total of 486 genes overlapped CNVs from both PennCNV and Birdsuite after excluding CNVs with >50% of their length overlapping segmental duplication regions; none of these genes were significantly associated with BAVM after correcting for multiple testing.

Rare CNVs
To test the hypothesis that cases have a higher burden of rare CNVs compared to controls, we examined rare CNVs called from both PennCNV and Birdsuite. We did not observe a significant excess of rare large CNVs or genes disrupted in BAVM cases compared to controls (data not shown).
PennCNV identified 542 genes with CNVs in BAVM cases but not in controls (CNV overlapping the gene ± 20 kb). Birdsuite analysis identified 247 of these 542 genes. However, none of these genes were significantly associated with BAVM after correction for multiple testing. Table S3 lists thirteen BAVM-specific genes for which at least two BAVM subjects and no controls carried CNVs in both PennCNV and Birdsuite analysis. Eleven genes showed BAVM-specific deletions, while ATG5 and PRDM1 carried BAVM-specific duplications.
We also investigated whether BAVM cases carry a greater burden of rare CNVs overlapping genes in each of 7 candidate biological pathways relevant to BAVM based on prior studies (TGFβ signaling, Notch signaling, VEGF signaling, Angiogenesis, Vascular Development, Inflammatory Response and MAPK signaling). These pathways included a total of 572 genes. We did not observe statistically significant enrichment of rare CNVs in BAVM cases for any of the candidate pathways (data not shown).

Gene ontology and functional annotation
The Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7b was used for analyzing functional classification, gene ontology of biological processes (GO), and pathway (http://david.abcc.ncifcrf.gov/) for the 542 genes bearing BAVM-specific CNVs using PennCNV. Several pathways, including the chemokine signaling pathway (15 genes, P = 1.2×10⁻³, Table S4) were nominally over-represented. In GO analysis, significantly enriched GO terms included positive regulation of smooth muscle cell proliferation (fold enrichment = 9.89 and corrected P = 0.02) with a total of 8 AVM-specific genes (PRKCA, TNF, NOTCH4, ITGA2, EGFR, AGER, FKBPL and AGPAT1). The molecular function identified as most significant in GO analysis was cadmium ion binding (fold enrichment = 29.93, Bonferroni corrected P = 5.76×10⁻⁷, Table S5).

Experimental validation
To validate the finding that BAVM cases have a higher burden of CNVs mapping to the 1p36.13 region encompassing the NBPF1 gene compared to controls, we used several qPCR assays. First, we used a commercial qPCR assay targeting NBPF1 (Applied Biosystems Taqman Hs04206910_cn). We observed a concordance rate of 78% between CNV states determined by qPCR and by PennCNV in 51 BAVM cases, including 23 called duplication and 28 called wildtype by PennCNV. For Birdsuite, we observed a concordance rate of 52% between Birdsuite calls and qPCR (14 called duplications and 37 called wildtype by Birdsuite). We then proceeded to replicate the 1p36.13 association with BAVM in an independent cohort of 184 BAVM cases and 182 healthy controls utilizing the qPCR assay. In the replication cohort, duplication at this locus was observed in 13% of BAVM cases and 15% of the controls (OR = 0.81, P = 0.8), which does not support the original association results. Furthermore, we did not find support for replication utilizing 3 additional qPCR probes in the NBPF1 gene region. Three of 177 BAVM cases were called duplications by all 3 probes, a much lower frequency than in the discovery cohort. The two replication cohorts contained 123 overlapping samples, and concordance between copy number calls was low (correlation r² = 0.4).

Discussion
This is the first genome-wide study to investigate whether CNVs might be associated with sporadic BAVM susceptibility. We identified several common CNV loci associated with sporadic BAVM in our Caucasian cohort of 371 sporadic BAVM cases and 563 controls. We focused on top findings that were in agreement between two CNV calling algorithms and between gene-based and segment-based analysis approaches. A BAVM-associated CNVR mapping to chr 1p36 was experimentally validated using quantitative real-time PCR, but did not replicate in an independent cohort of 184 BAVM cases and 182 controls.
The common BAVM-associated duplication observed at 1p36 encompasses NBPF1, the founding member of the NBPF gene family that consists of 22 genes and pseudogenes and likely arose by gene duplication. Very little is known about the function of NBPF proteins; some of them, including NBPF1, may be tumor suppressors [49]. Loss of heterozygosity (LOH) for the 1p36 locus encompassing the NBPF1 gene has been shown in neuroblastoma and some other tumors [50]. To date, this region has not been reported to be associated with any vascular diseases or phenotypes.
Our initial association screen also identified other BAVM-associated CNVRs mapping to chr15q11, 6q16 and 16p11. However, the association with BAVM did not persist in multivariate models adjusting for age, sex and the top 3 principal components, utilizing CNV calls from both PennCNV and Birdsuite. Further, none of these CNVRs overlapped genes associated with BAVM using the gene-based approach in both algorithms. The chromosome 15q11.2 CNVR identified in our study did not overlap the linkage region on 15q11-q13 reported in non-HHT familial BAVM patients [7]. The small deletion on 6q16.3 showed poor concordance between CNV-calling algorithms and did not overlap any genes. Due to the highly repetitive nature of the chr6 and 15 CNVR loci, we were not able to design qPCR probes for validation. Although we did not observe a statistically significant association of rare CNVs with sporadic BAVM in our cohort, given our limited sample size, we cannot rule out the possibility that rare CNVs may contribute to BAVM susceptibility. All robust CNV associations to disease phenotypes that have been reported to date are with rare CNVs, with the exception of several autoimmune phenotypes [51,52]. We identified 13 genes bearing BAVM-specific CNVs in at least 2 BAVM subjects, which are candidates for replication studies in larger BAVM cohorts. Notably, we did not identify any BAVM-specific CNVs in the 3 known HHT genes, in any other genes in the TGFβ signaling pathway, or in genes from 6 other biologically relevant pathways.
Functional classification, GO, and pathway analysis using genes exclusively deleted or duplicated in cases but not in controls identified several pathways and GO terms relevant to BAVM pathogenesis. The most significant pathway was the chemokine signaling pathway. Significantly enriched GO terms included positive regulation of smooth muscle cell proliferation and cadmium ion binding. Interestingly, studies of human vascular endothelial cells suggest cadmium may alter angiogenesis and induce apoptosis through VEGF signaling [53].
The size and the average number of CNVs called per individual differed significantly between cases and controls. In particular, when restricting to CNVs >10 kb in size, for deletions, the average number of CNVs per individual was significantly lower in cases than controls in both PennCNV and Birdsuite. This difference may be partially explained by differences in DNA extracted from blood and saliva. In our cohort, all control samples provided blood while some of the cases provided saliva specimens. It has been previously reported that CNV analyses differed between blood and saliva samples for the same individual, particularly for shorter CNV regions [54]. In contrast to a previous study [55], we identified a statistically significant increase in the number of CNVs detected in blood DNA specimens compared to saliva DNA specimens among cases. However, since case-control differences persisted even when restricting to blood DNA samples, the difference between saliva and blood DNA samples does not explain all of the observed effect. Furthermore, variation in experimental methods including time to genotype all study cases and controls (~3 years) and batch (i.e., plate) effects may contribute to the observed case-control differences in the number of CNVs. While this is a limitation of our study, it would act as a negative confounder, since the excess of CNV calls was observed in controls, while the study hypotheses tested for an excess of CNVs in BAVM cases. This study is limited by the small sample size, which was not powered to detect associations with small effect sizes. However, this is the largest cohort of sporadic BAVM patients for whom genome-wide genotype data have been analyzed. Further, we have only explored the role of CNVs in sporadic BAVM subjects of European ancestry; results may not generalize to other ethnicities. For the validation experiments, we evaluated the concordance of CNV states called between PennCNV, Birdsuite and qPCR among BAVM cases.
Unfortunately, we were not able to evaluate the concordance in the controls used as we do not have access to their DNA. Since CNV calls from SNP arrays are based on relative signal intensity of a test sample compared to a reference, copy numbers in regions overlapping segmental duplications are highly variable and may not be reliably measured. In fact, copy number is unlikely to be 2 for a normal sample due to the repetitive nature of these regions. Our top finding is located in a region of segmental duplication and it is a known limitation that CNV calling algorithms may not reliably call CNVs overlapping segmental duplications, which comprise a large portion of the copy number variable regions in the human genome (~29% [56]). This may explain why the association of the chr1p36 duplication with BAVM did not replicate in an independent cohort.
In conclusion, we provide the first evaluation of the role of CNVs in sporadic BAVM. We identified several candidate common CNV loci associated with BAVM, although the top finding on chromosome 1 did not replicate in an independent cohort. We also identified a number of genes bearing BAVM-specific CNVs; however, larger sample sizes are needed to test the hypothesis that rare CNVs contribute to BAVM pathogenesis.

Figure S1. CNVs called by PennCNV that mapped to 1p36.13. UCSC views of raw CNVs of type duplication called by PennCNV (BAVM cases in blue and controls in dark blue), mapping to the most significant BAVM-associated locus on 1p36.13 that encompasses the NBPF1 gene. Depicted on the plot with a red arrow is the location of the Taqman Hs04206910_cn probe interrogating the NBPF1 gene at Chr1:16802766 (NCBI build 36). (TIF)