A Common CYFIP1 Variant at the 15q11.2 Disease Locus Is Associated with Structural Variation at the Language-Related Left Supramarginal Gyrus

Copy number variants (CNVs) at the Breakpoint 1 to Breakpoint 2 region at 15q11.2 (BP1-2) are associated with language-related difficulties and increased risk for developmental disorders in which language is compromised. Towards underlying mechanisms, we investigated relationships between single nucleotide polymorphisms (SNPs) across the region and quantitative measures of human brain structure obtained by magnetic resonance imaging of healthy subjects. We report an association between rs4778298, a common variant at CYFIP1, and inter-individual variation in surface area across the left supramarginal gyrus (lh.SMG), a cortical structure implicated in speech and language in independent discovery (n = 100) and validation cohorts (n = 2621). In silico analyses determined that this same variant, and others nearby, is also associated with differences in levels of CYFIP1 mRNA in human brain. One of these nearby polymorphisms is predicted to disrupt a consensus binding site for FOXP2, a transcription factor implicated in speech and language. Consistent with a model where FOXP2 regulates CYFIP1 levels and in turn influences lh.SMG surface area, analysis of publically available expression data identified a relationship between expression of FOXP2 and CYFIP1 mRNA in human brain. We propose that altered CYFIP1 dosage, through aberrant patterning of the lh.SMG, may contribute to language-related difficulties associated with BP1-2 CNVs. More generally, this approach may be useful in clarifying the contribution of individual genes at CNV risk loci.


Introduction
Rare multi-gene copy number variants (CNVs) are well established to increase risk for neurodevelopmental disorders, but translational efforts have been hindered by a limited understanding of underlying mechanisms. The four gene region between breakpoints 1 and 2 (BP1-2) at 15q11.2 is interesting in this regard in that deletions are associated with increased risk for epilepsy and schizophrenia [1][2][3][4][5][6], and the reciprocal duplications may be relevant to autism [7][8][9]. Language appears to be compromised in a large number of CNV carriers, in that a metaanalysis of clinically ascertained BP1-2 CNV carriers determined that speech delay was present in 92% and 49% of deletion and duplication subjects, respectively (8). Similarly, separate work found evidence for a significant association between deletion status and self-reported reading difficulties (e.g. "Did you experience any difficulties in reading in elementary school?") [4]. Described below is work aimed at providing molecular insights into how genetic variation at the BP1-2 region may contribute to disease-related variation in human brain structure.
Recent work found a relationship between gene dosage at BP1-2 and structural variation at multiple disease associated brain regions [4]. Through analysis of magnetic resonance imaging (MRI) data from BP1-2 deletion, duplication, and CNV neutral subjects, investigators identified linear relationships between gene dosage and volume of the left insula (lh.insula), right anterior cingulate cortex (rh.ACC), corpus callosum (CC), left temporal white matter (lh. tempWM), and the left supramarginal gyrus (lh.SMG). The volumes of the lh.insula, rh.ACC, lh.SMG, and lh.tempWM were reduced in deletion carriers and increased in duplication carriers, whereas the opposite directionality was observed for the CC. Interestingly, people with schizophrenia are reported to show reduced volumes of the insula, CC, and lh.tempWM [10][11][12], as well as abnormal connectivity of the rh.ACC [13]. Structural and functional alterations of the lh.SMG have been implicated in speech and language [14,15]. Taken together, these data support a model whereby altered gene dosage at BP1-2 influences regional brain development and increases risk for disease. We hypothesized that common regulatory variants at the locus might be associated with similar effects in healthy individuals, and that their identification might clarify mechanisms of disease.
Although not yet studied with regard to disease, it is directly regulated by GSK3beta, implicated in schizophrenia [19][20][21]. More is known about the adjacent gene that encodes CYFIP1. Binding to RAC1 activates the Wave Regulatory Complex (WRC) and initiates cytoskeletal remodeling [22,23]. A separate interaction between CYFIP1 and the fragile X mental retardation protein (FMRP) results in the repression of eIF4E-mediated translation [24,25]. Knockdown or overexpression of CYFIP1 impacts neuronal morphology, brain development and function [26][27][28][29] and common CYFIP1 regulatory variants have been associated with schizophrenia and autism [2,28,30]. Separate studies looking at NIPA2, a third gene in the region, identified rare functional variants in epilepsy patients but not controls [31,32] and found a schizophrenia associated single nucleotide polymorphism (SNP) upstream of the gene [2]. Moreover, investigation of the relationship between regional gene expression and behavior in individuals with Prader-Willi Syndrome found strong correlations with NIPA2 [33]. NIPA1, immediately downstream of NIPA2, has been implicated in axonal growth [34]. Related, is that dominant negative mutations in NIPA1 cause hereditary spastic paraplegia [35].
In this study, examination of a discovery cohort identified SNPs associated with structural variation at brain regions known to be sensitive to gene dosage at the BP1-2 region. The top hit, upstream of CYFIP1, was likewise associated with variation in lh.SMG surface area in a large independent cohort. Consistent with a gene dosage effect, interrogation of publically available expression data revealed a relationship between genotype at this variant and CYFIP1 levels in human brain. Additional in silico analyses we performed suggest that this association with CYFIP1 expression may be mediated by allele-dependent regulation by FOXP2, a transcription factor (TF) implicated in speech and language [36][37][38]. Results provide independent support for previous work identifying a relationship between genetic variation at BP1-2 and human brain structure, and point to CYFIP1 as a likely contributor. The novel link to FOXP2 provides convergent evidence for involvement of the BP1-2 region in language. The approach used here could be useful in understanding risk mechanisms at other disease-related CNVs.

Materials and Methods
Discovery Cohort: Subjects, MRI, and Genotyping The Einstein Institutional Review Board approved this study and subject recruitment. All participants gave informed consent prior to participation. 100 unrelated adults without any selfreported physical disability, psychiatric diagnosis, or first degree relatives with a psychiatric diagnosis were recruited for MRI imaging (S1 Table). 81 of these individuals were Caucasian. Saliva was collected using Oragene OG-500 kits (DNA Genotek). T1-weighed anatomical images were acquired using a Philips 3T Achieva Quasar TX scanner with 32 channel SENSE head coils. An MPRAGE protocol with the following parameters was used: TR/TE = 8.187/ 3.724 ms, TFE prepulse inversion time = 900ms, flip angle = 8°, 220 axial slices (1.0 mm thick) with 256x256 acquisition matrix. 101 HapMap SNPs with a minor allele frequency (MAF) 15% mapped to the BP1-2 interval (chr15:22783636-23085559; Hg19). Tagger was used to define 57 tag-SNPs (r 2 0.8, CEU/TSI populations from Hapmap Version 3, Release 2; S1 Fig and S2 Table) [39]. DNA was extracted using QuickGene SP kits (Kurabo) and a QuickGene 610 instrument (Autogen). Integrity was verified via electrophoresis and concentration measured using a NanoDrop 2000 (Thermo). Primers for Sequenom genotyping were designed for 56/57 SNPs that could be multiplexed [40]. Genotype quality was evaluated by visual examination of clustering, testing for Hardy-Weinberg equilibrium, and retyping of random subjects. 52 SNPs remained after quality control.

Validation Cohort-Subjects, MRI, and Genotyping
Subjects were healthy adults of Dutch descent included in the Nijmegen Brain Imaging Genetics cohort [41,42]. None reported any neurological or psychiatric history. Research was approved by the CMO Region Arnhem-Nijmegen ethics committee. All participants gave written informed consent prior to participation. Imaging data was available or generated for 1276 subjects (all Caucasian) typed for each of rs12102024 and rs1051288, and 2621 individuals (2383 Caucasian) typed for rs4778298 (S3 Table). T1-weighted anatomical scans were acquired for previous work using a 1.5T or 3T Siemens scanner. rs12102024 and rs1051288 genotypes, obtained by Affymetrix GeneChip SNP 6.0 array, were likewise generated previously [43]. rs4778298 was genotyped for this study at the Max Planck Institute in Nijmegen using a custom KASP assay (LGC Genomics). The assay was validated by Sanger sequencing of randomly-selected samples.
tightly restricted to the lh.insula, rh.ACC, CC, lh.temp.WM, and lh.SMG, regions of interest (ROIs) shown previously to be sensitive to CNVs at BP1-2 [4]. Volume of the rh.caudalanteriorcingulate was taken to represent the rh.ACC, and the volumes of five individually segmented callosal subregions (anterior, mid anterior, central, mid posterior, and posterior) as the CC. lh. tempWM was defined as white matter within the: left bank of the superior temporal sulcus, left entorhinal cortex, left fusiform gyrus, left inferior-temporal cortex, left middle-temporal cortex, left parahippocampal cortex, left superior-temporal cortex, and left temporal pole. Because FreeSurfer does not output surface area or thickness of subcortical regions, the CC and lh. temp.WM were only included for analysis of volume.

Statistical Analyses
In the discovery cohort, associations of individual SNP-ROI pairs were examined using R [47] according to the following linear regression model: Volume of ROI = α + β1(variant) + β2(age) + β3(gender) + β4(ethnicity) + β5(handedness) + β6(total brain volume) + ε, in which α is the intercept and ε random error. For analysis of surface area and cortical thickness, total brain surface area and mean cortical thickness of the whole brain were used for β6, respectively. To assess the aggregate effect of individual SNPs across ROIs, the sum of chi-square statistics (X 2 0 ) was calculated and its significance (p permutation corrected < 0.05) was evaluated by permutation with 10,000 replicates (X 2 n ). By aggregating effects across ROIs for each SNP, we were able to perform only one test for each anatomical trait (volume, surface area, and thickness), improving power to detect any effects present. In the validation cohort, three SNP-ROI pairs were tested in PLINK [48] under a linear regression model as above, but scanner field strength was included and handedness omitted. Handedness was not included as a covariate in our analysis of data from the validation cohort. Left handed individuals made up less than 5% of subjects and previously published work in this cohort saw no relationship between handedness and any regional measure of surface area [49].

Gene Expression
Normalized expression data (Human Exon 1.0 ST, Affymetrix) for 1231 human brain samples from 134 individuals (16-102 at death; mean age 59) and corresponding genotypes (Omni 1M Immunochip, Illumina) were downloaded from BRAINEAC [50]. The relationship between expression (averaged across brain regions) and genotype was modeled using linear regression to identify expression quantitative trait loci (eQTL) variants. No correction for multiple comparisons was performed because only one probe (Affymetrix 3583676), specific for the 5' UTR of CYFIP1 short , was tested. For evaluating the relationship between levels of CYFIP1 short and candidate TFs, Pearson's correlation coefficients were calculated, and significance assessed using an F-test based on a lack-of-fit sum-of-squares method.

TF Binding
50 bps of genomic sequence on either side of each CYFIP1 short eQTL variant was extracted from the UCSC browser (Hg19). Sequences (1 per allele) were screened against human Position Weight Matrices (PWMs) in JASPAR, a database of TF binding profiles, using the Transcription Factor Binding Site Tools package [51,52]. Matches were defined as having a relative score 0.8. For differential binding, information content 1 at the variable base within the PWM was required.

Results
Common genetic variation at BP1-2 is associated with inter-individual differences in brain structure We performed targeted genetic association analyses in a discovery cohort (S1 Table). 52 tag-SNPs from the BP1-2 region were genotyped (S2 Table), and the relationships to volume, surface area, and thickness at predefined ROIs examined under an additive model. Hypothesizing that common regulatory variants within the interval would give rise to effects like those seen in rare CNV carriers, we restricted analyses to ROIs shown previously to be sensitive to BP1-2 dosage [4]. Significant associations were observed (Fig 1A and S4 Table). The strongest signal (rs4778298 at CYFIP1; p permutation corrected = 5.6x10 -3 ; Fig 1B) became more pronounced after limiting analysis to Caucasians (p permutation corrected = 1.0x10 -4 ; Fig 1B). Finally, even if an overly stringent Bonferonni correction is used to account for the 3 endpoints examined, p-values of 1.7x10 -2 and 3.3x10 -4 are observed. Counter to our initial hypothesis that common regulatory variants might uniformly impact all BP1-2 sensitive brain regions, effects were attributable to individual ROIs (Fig 2A and S5 Table). A post-hoc permutation analysis, accounting for all 52 tag-SNPs and ROIs, revealed a significant association between rs4778298 and lh.SMG surface area in Caucasians (p permutation corrected = 7.5x10 -3 ; see S6 Table). Linear effects at the CC and lh.SMG were observed for rs12102024 and rs4778298, respectively, consistent with an effect of gene dosage (Fig 2B).

Relationship between SNP genotype at CYFIP1 and left supramarginal gyrus surface area in an independent population
To assess the generalizability of our findings, we analyzed the top three hits in an independent cohort of healthy adults [43]. Imaging data was available for 1276 subjects typed for each of rs12102024 and rs1051288, and 2621 individuals typed for rs4778298 (S3 Table). We saw no signal at rs12102024 or rs1051288, but did see an effect at rs4778298, the top hit in our discovery cohort (p nominal = 3.2x10 -2 ; Fig 3). As above, the effect became stronger after restricting the analysis to Caucasians and survived correction for multiple comparisons (p nominal = 1.3x10 -2 ; p bonferroni corrected = 4.9x10 -2 ; Fig 3). A linear relationship between genotype and lh.SMG surface area was also observed as above, although in contrast to the discovery cohort the 'A' allele was seen to be associated with a relative reduction.

Common variant signal associated with CYFIP1 levels in human brain
Inspection of the genomic region surrounding rs4778298 determined that it was 2.2 kb upstream of the transcriptional start for a CYFIP1 isoform encoding a 95 kDa protein (CYFIP1 short ; Fig 4A). To test the hypothesis that rs4778298 genotype is associated with variation in mRNA levels for CYFIP1 short , we turned to BRAINEAC, a resource integrating genotype and expression data for human brain samples [50]. Analyses identified a significant linear relationship between rs4778298 genotype and CYFIP1 short levels (Affymetrix Probe 3583676; p = 7.2x10 -5 ; S7 Table). rs4778298-C was associated with increased mRNA levels. Additional eQTL variants in high linkage disequilibrium (LD) with rs4778298 (r 2 >0.9; S7 Table) were also identified. No relationship between genotype at any of these SNPs and levels of TUBGCP5, NIPA2, or NIPA1 was observed (p>0.05). Pairwise analyses of the top signal for each of volume, surface area, and thickness from the discovery cohort determined that aggregate effects were attributable to individual ROIs (and see S5 Table). Post-hoc permutation testing, accounting for all 52 tag-SNPs and ROIs examined, found a significant association between rs4778298 genotype and the surface area of the left supramarginal gyrus (lh.SMG) in Caucasian subjects (p permutation corrected = 7.5x10 -3 ). (B) Linear effects at the CC and lh.SMG were observed for rs12102024 and rs4778298, respectively, consistent with an effect of allele dosage. rh.ACC and CC correspond to right anterior cingulate cortex and corpus callosum, respectively. Evidence for genotype-dependent activation of CYFIP1 by FOXP2 We sought next to understand how genetic variation upstream of CYFIP1 short might impact expression and perhaps brain structure. Beginning with TFs in JASPAR for which binding in human was characterized (n = 127), we identified a subset of twenty-eight whose expression in brain was correlated with CYFIP1 short (p bonferroni corrected <0.05; S8 Table). Ten of these were  predicted to bind in a genotype dependent manner to one or more of the eight CYFIP1 short eQTL variants we identified (S9 Table). After removing candidate TFs on the basis of logical inconsistencies (e.g. negatively correlated with CYFIP1 short expression but having a binding site containing the allele associated with increased expression), or where binding was unlikely to be sensitive to genotype (see methods; S9 Table), only E2F4 and FOXP2 remained. FOXP2 is particularly interesting, as it is well recognized as an important contributor to human speech and language [36][37][38]. We see a positive relationship between FOXP2 and CYFIP1 short levels (r 2 = 0.36; p = 2.5x10 -5 ; Fig 4B). Consistent with these findings, the rs66903469-T allele predicted to bind FOXP2 with 11.8-fold greater affinity than the alternate C allele (Fig 4C) is associated with higher levels of CYFIP1 short (p = 2.9x10 -5 ; Fig 4D and S9 Table).

Discussion
Towards insight into how genetic variation at the 15q11.2 BP1-2 region may influence disease risk, we examined the relationship between tag-SNPs spanning the interval and aspects of regional brain structure. We report a common variant 2 kb upstream of CYFIP1 associated with the surface area of the lh.SMG, a brain region implicated in language. A relationship between this genotype and lh.SMG surface area was also observed in an independent population composed of 2621 individuals. Consistent with a model whereby the effect is mediated by altered gene expression, in silico analyses revealed that genotype at this SNP, and at seven other variants in high LD, is associated with variation in CYFIP1 short mRNA levels in human brain. Differential expression may be mediated by FOXP2, a TF implicated in speech and language [36][37][38], as it is predicted to bind one of these eQTL variants in an allele-dependent fashion. Supportive of this idea, we report a correlation between mRNA levels for FOXP2 and CYFIP1short in human brain.
Although we see a relationship between rs4778298 genotype in two independent cohorts, the direction of the effect differs between the populations. The 'A' allele is associated with increased lh.SMG surface area in the discovery cohort, but a decrease in the validation cohort. While we cannot rule out the possibility that this result may be a false positive, the linearity of the observed effects that recapitulate those for rare CNVs at BP1-2 argue against this [4]. Moreover, as described below, the link to FOXP2 provides biological plausibility for the observed association. The difference in directionality we observed in the two cohorts we looked at may be the result of genetic interactions with variants at the BP1-2 locus or elsewhere in the genome [53,54]. Multiple CYFIP1 regulatory variants independent from rs4778298 / rs66903469 have been associated with increased risk for schizophrenia and autism [2,28,30,55]. Further underscoring complexity at the locus, in one of these studies no association with schizophrenia was observed when the CYFIP1 regulatory variant rs4778334 was considered in isolation, but a clear signal was seen when the effect of a second eQTL variant at the ACTR2 gene encoding an interacting protein was taken into account [28]. Importantly, directionality was reversed as a function of this second genotype, with rs4778334-C seen to be protective in rs268864-AA individuals but found to be risk associated in rs268864-GG subjects. The difference in directionality we observed between the two populations we looked at could also be related to other variables. For example, parent of origin effects in abnormalities of synaptic transmission and behavior in Cyfip1 +/have been reported [56]. Similarly, socioeconomic factors are known to be associated with the regional surface area of numerous brain regions including those involved in language [57]. The relatively small number of subjects within the discovery cohort may also be relevant. Although relatively little is known about CYFIP1 function in mammalian brain development, early knockdown of Cyfip1 in mouse results in inappropriate outward migration of radial glia (RG) from the ventricular zone [28]. Our findings are consistent with this result, in that separate work in human suggests that RG are important regulators of cortical surface area [58,59]. Our results also provide potential insights into mechanisms that may mediate these effects. Full length CYFIP1 binds both RAC1 and FMRP, whereas CYFIP1 short is truncated and binds only FMRP [22,25]. All else being equal, less CYFIP1 short would result in fewer FMRP binding sites and increase competition between FMRP and RAC1 for CYFIP1 long . Reduced RAC1 binding would be expected to dampen WRC activity and in turn interfere with RG migration, a hypothesis that could be tested experimentally. Such experiments, together with studies in additional cohorts, might also shed light on regional specificity. As it stands, it's unclear whether the signal we observed at rs4778298 / rs66903469 is specific for the lh.SMG or alternatively a result of insufficient power. Results from these investigations will provide new insights into the relationship between genetic variation at the BP1-2 region and brain patterning.
The novel relationship we describe between FOXP2 and CYFIP1 is likewise intriguing given involvement of FOXP2 in the production and comprehension of speech and language. A point mutation disrupting the DNA-binding domain of FOXP2 was seen to segregate with a severe speech and language disorder in the multi-generational KE family [37,60], and screening of unrelated individuals with similar problems identified additional subjects harboring FOXP2 mutations [61]. Experimental work will be required to establish that FOXP2 binding at rs66903469 is genotype-dependent and that variation here is sufficient to alter CYFIP1 short levels, efforts that may be complicated by temporal and regional specificity. Nevertheless, the previously unknown relationship between FOXP2 and CYFIP1 short provide new insight into language-related aspects of human brain development. Moreover, in keeping with our model that the lh.SMG may be sensitive to FOXP2 activity, an fMRI study determined that KE mutation carriers show hypo-activity at this region during a word retrieval task [62].
Although we did not evaluate directly the relationship between common variation at CYFIP1 and language ability in this study, our results are consistent with the hypothesis that language impairment in BP1-2 deletion carriers [4,8] may be attributable to reduced CYFIP1 acting on the lh.SMG. With regard to idiopathic disease, a meta-analysis points to a reduced volume of lh.SMG in dyslexic children prior to the onset of reading impairment [63,64]. Similarly, adults with dylexia show under-activation of the lh.SMG in a number of fMRI studies [65][66][67]. In terms of function, the lh.SMG as a part of the dorsal lexical pathway is known to be important in transforming written language components (graphemes) into phonological objects (phonemes) [68,69]. Another function of the lh.SMG is the integration of auditory and sensory inputs before forwarding signals to other brain regions relevant to speech articulation [70,71]. Based on these findings, evaluation of phonological processing and speech articulation in BP1-2 CNV carriers is warranted. Separate studies characterizing the relationships between common variation at the locus and language related outcomes in healthy individuals might complement these efforts.
Lastly, our study points to overlap between the effect of rare and common genetic variation at BP1-2 locus. Given practical difficulties assembling cohorts where all individuals harbor the same rare variant, and progress establishing large cohorts with both MRI and genetic data, the approach described here may be useful in understanding the mechanisms underlying risk for disease at other multi-gene CNVs. Haploview was used to generate a schematic representation of the interval and define LD structure. Genes within the interval are illustrated (TUBGCP5, CYFIP1, NIPA2, and NIPA1). Below this, SNPs are represented by short vertical lines. LD blocks appear as black triangles. (PDF) S1