Copy Number Variants in German Patients with Schizophrenia

Large rare copy number variants (CNVs) have been recognized as significant genetic risk factors for the development of schizophrenia (SCZ). However, due to their low frequency (1∶150 to 1∶1000) among patients, large sample sizes are needed to detect an association between specific CNVs and SCZ. So far, the majority of genome-wide CNV analyses have reported only CNVs that reached a significant P-value within the study cohort and merely confirmed the frequency of already-established risk-carrying CNVs. As a result, CNVs with a very low frequency that might be relevant for SCZ susceptibility are lost for secondary analyses. In this study, we provide a concise collection of high-quality CNVs in a large German sample consisting of 1,637 patients with SCZ or schizoaffective disorder and 1,627 controls. All individuals were genotyped on Illumina BeadChips, and putative CNVs were identified using QuantiSNP and PennCNV. Only those CNVs that were detected by both programs and spanned ≥30 consecutive SNPs were included in the data collection and downstream analyses (2,366 CNVs, 0.73 CNVs per individual). The genome-wide analysis did not reveal an association between any previously unknown CNV and SCZ. However, the group of CNVs previously reported to be associated with SCZ was more frequent in our patients than in the controls. The publication of our dataset will serve as a unique, easily accessible, high-quality CNV data collection for other research groups. The dataset could be useful for identifying new disease-relevant CNVs that are currently overlooked because of their very low frequency and the lack of power to detect them in individual studies.


Introduction
Schizophrenia (SCZ) is a severe and debilitating neuropsychiatric disorder with a lifetime prevalence of 0.5–1%. On the basis of twin studies, the heritability of SCZ has been estimated at ~80% [1]. Several large copy number variants (CNVs) have been identified as risk factors for SCZ susceptibility [2][3][4][5][6][7][8][9][10][11][12][13][14]. Among patients with SCZ, the frequency of such CNVs is low, ranging between 1:150 and 1:1,000 [15]. Most published CNV studies have focused on microduplications and microdeletions that reach a nominally significant P-value and thus report only the frequency of already-established risk-bearing CNVs. As a consequence, very rare CNVs that might be disease-relevant go undetected and unreported because of the limited sample sizes of current studies, and are therefore lost for follow-up studies. To help address this, we set out to compile a concise collection of high-quality CNV calls in a clinically well-characterized sample of 1,637 patients with SCZ or schizoaffective disorder (SCZA). Additionally, using data from 1,627 controls, we (i) performed a genome-wide CNV analysis and (ii) checked the frequency of CNVs previously reported to be associated with schizophrenia.

Ethics statement
Each participant provided written informed consent prior to inclusion, and all aspects of the study complied with the Declaration of Helsinki. The study was approved by the ethics committees of all study centers: Ethics Committee of the Rheinische Friedrich-Wilhelms-University Medical School in Bonn, Ethics Committee "Medizinische Ethik-Kommission II" University of Heidelberg, Ethics Committee of the Friedrich-Schiller-University Medical School in Jena, and Ethics Committee of the Ludwig-Maximilians-University Munich.
In our study, we only included patients who had full capacity to consent. The capacity to consent was determined by the referring psychiatrist. Patients who were under care or had a legal guardian were not included. In exceptional cases, a patient with a legal guardian wished to participate, and we did not want to discriminate against him or her by denying participation in the study. In these rare cases, written informed consent was obtained from both the patient and the legal guardian. At the recruiting site in Jena, patients with a legal guardian who wanted to be included in the study had to have full capacity to consent and no legal obligation to obtain additional permission from their legal guardian. All potential participants who declined to participate (for any reason) were eligible for treatment and were not disadvantaged in any other way by not participating in the study.

Sample description
All individuals were of German descent according to self-reported ancestry. A total of 1,831 patients were recruited from consecutive admissions to psychiatric inpatient units. A lifetime "best estimate" diagnosis [16] of SCZ or SCZA according to DSM-IV criteria [17] was assigned on the basis of the Structured Clinical Interview [18] or the OPCRIT [19], medical records, and family history. A total of 1,643 controls were included; these are described in detail elsewhere [20].

Genotyping and quality control
Venous blood samples were obtained from all participants. These were genotyped separately using the following Illumina BeadChips: HumanHap550v3, Human610-Quadv1, and Human660W-Quad. Only those markers common to all three chips were analyzed; in total, 546,137 markers were analyzed. To avoid technical artifacts in CNV calling, stringent quality control criteria were applied prior to computational CNV prediction, as described in Degenhardt et al. [20].
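Restricting the analysis to markers shared by all three BeadChips amounts to a set intersection over the chips' marker lists. The following is a minimal Python sketch, assuming each chip's marker IDs have already been read into a list (for instance from the Illumina manifest files); the function name `common_markers` and the rs IDs in the example are hypothetical.

```python
def common_markers(*chip_manifests):
    """Return the marker IDs present on every chip, sorted for reproducibility.

    Each argument is an iterable of marker IDs for one BeadChip.
    """
    if not chip_manifests:
        return []
    # Start from the first chip and keep only IDs that every other chip also has.
    shared = set(chip_manifests[0]).intersection(*chip_manifests[1:])
    return sorted(shared)


# Hypothetical toy marker lists standing in for the three chip manifests.
chip_a = ["rs1", "rs2", "rs3"]
chip_b = ["rs3", "rs2", "rs9"]
chip_c = ["rs2", "rs3", "rs4"]
print(common_markers(chip_a, chip_b, chip_c))
```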

CNV detection and CNV quality control criteria
Detailed information on CNV detection and CNV quality control is provided elsewhere [20]. In brief, to identify potential CNVs, the BeadChip data of each participant were analyzed with QuantiSNP (version 2.1, http://www.well.ox.ac.uk/QuantiSNP; [21]) and PennCNV (version 2010May01, http://www.openbioinformatics.org/penncnv/; [22]). Individuals were excluded if the standard deviation of the log R ratio, calculated over all SNPs, exceeded 0.30. CNVs were required to span a minimum of 30 consecutive SNPs and to have a log Bayes factor (lBF; QuantiSNP) or confidence value (PennCNV) of at least 30.
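The sample-level and call-level thresholds above can be expressed as two simple predicates. This is a hypothetical illustration, not QuantiSNP or PennCNV code: it assumes each call has already been parsed into a record holding its SNP count and score (lBF for QuantiSNP, confidence for PennCNV), and it computes the log R ratio SD with Python's population standard deviation, whereas in practice a per-sample value reported by the calling software may be used instead. All names are illustrative.

```python
from dataclasses import dataclass
from statistics import pstdev

# Thresholds quoted in the Methods.
MAX_LRR_SD = 0.30   # per-individual SD of the log R ratio
MIN_SNPS = 30       # consecutive SNPs spanned by a call
MIN_SCORE = 30.0    # QuantiSNP lBF or PennCNV confidence value


@dataclass
class CnvCall:
    sample: str
    chrom: str
    start: int
    end: int
    n_snps: int
    score: float  # lBF (QuantiSNP) or confidence value (PennCNV)


def passes_sample_qc(lrr_values: list[float]) -> bool:
    """Keep an individual only if the log R ratio SD over all SNPs is <= 0.30."""
    return pstdev(lrr_values) <= MAX_LRR_SD


def passes_call_qc(call: CnvCall) -> bool:
    """Keep a call only if it spans >= 30 SNPs and scores >= 30."""
    return call.n_snps >= MIN_SNPS and call.score >= MIN_SCORE


def filter_calls(calls: list[CnvCall], lrr_by_sample: dict[str, list[float]]) -> list[CnvCall]:
    """Drop calls from excluded individuals, then apply the call-level filters."""
    kept_samples = {s for s, lrr in lrr_by_sample.items() if passes_sample_qc(lrr)}
    return [c for c in calls if c.sample in kept_samples and passes_call_qc(c)]
```

Applying the sample filter before the call filter mirrors the order described above: an individual with noisy intensity data is removed entirely, regardless of how confident its individual calls look.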

Merging PennCNV and QuantiSNP CNV data
Only those CNVs that passed our filter criteria and were called by both QuantiSNP and PennCNV were included in our analysis. All CNVs listed in the supplement were visually inspected and confirmed using Illumina's GenomeStudio Genotyping Module (… S1). Where the two programs identified different CNV breakpoints, only the overlapping segment was reported. Rarely, one program detected one large CNV whereas the other program identified two smaller CNVs; in these cases, the predicted CNV that was more likely to be genuine was selected on the basis of visual inspection and the SNP coverage of the affected region. All CNVs that passed the stringent filter criteria were presumed to be genuine. In two previous studies that employed our filter criteria, 100% of the putative CNVs were shown to be technically verifiable [20,23].
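Merging the two call sets can be sketched as an interval intersection per individual and chromosome. The following minimal Python sketch rests on several assumptions. Calls are matched on sample, chromosome, and deletion/duplication status (matching on status is our assumption). Partially overlapping calls are trimmed to their shared segment, as described above. The rare one-large-versus-two-small case, which was resolved by visual inspection, is not handled here and would simply yield two segments. The names `Call` and `consensus_calls` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    sample: str
    chrom: str
    start: int   # first base of the CNV
    end: int     # last base of the CNV
    kind: str    # "del" or "dup"


def consensus_calls(quantisnp: list[Call], penncnv: list[Call]) -> list[Call]:
    """Keep CNVs called by both programs; report only the overlapping segment."""
    # Index PennCNV calls by (sample, chromosome, type) to avoid an all-pairs scan.
    by_key = defaultdict(list)
    for p in penncnv:
        by_key[(p.sample, p.chrom, p.kind)].append(p)

    merged = []
    for q in quantisnp:
        for p in by_key[(q.sample, q.chrom, q.kind)]:
            start, end = max(q.start, p.start), min(q.end, p.end)
            if start <= end:  # the two calls overlap
                merged.append(Call(q.sample, q.chrom, start, end, q.kind))
    return merged
```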

Statistical analysis of CNVs
The datasets generated by QuantiSNP and PennCNV were analyzed using PLINK (version 1.07, http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml [24]). To test for associations between SCZ and CNVs in specific chromosomal regions, Fisher's exact test was applied to calculate P-values and odds ratios.
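The original analysis used PLINK. The sketch below is an independent, minimal plain-Python implementation of a two-sided Fisher's exact test on a 2×2 carrier table, not PLINK's code. It conditions on the table margins and sums the hypergeometric probabilities of all tables that are no more probable than the observed one, which is the usual two-sided convention. It also returns the sample odds ratio. The function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Two-sided Fisher's exact test for a 2x2 carrier table.

                 carrier   non-carrier
        patients    a           b
        controls    c           d

    Returns (sample odds ratio, two-sided P-value).
    """
    n_pat, n_ctl = a + b, c + d
    k = a + c                          # total number of carriers
    denom = comb(n_pat + n_ctl, k)

    # Hypergeometric probability of x carriers among patients, margins fixed.
    lo, hi = max(0, k - n_ctl), min(k, n_pat)
    probs = {x: comb(n_pat, x) * comb(n_ctl, k - x) / denom for x in range(lo, hi + 1)}

    # Two-sided P: sum every table no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_obs = probs[a]
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))

    # Sample odds ratio; infinite (or undefined) when the denominator is zero.
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(p_value, 1.0)


# Carriers among 1,637 patients versus none among 1,627 controls.
for carriers in (5, 6):
    _, p = fisher_exact_2x2(carriers, 1637 - carriers, 0, 1627)
    print(carriers, round(p, 3))
```

With these group sizes, five patient carriers and no control carriers give P ≈ 0.062, whereas six give P ≈ 0.031. This is consistent with the six-carrier threshold for nominal significance noted in the Results.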

CNVs in regions previously associated with SCZ
We examined whether our dataset contained CNVs in regions that had been associated with SCZ in at least two independent studies. To be included in this analysis, a CNV from our dataset had to overlap the reported CNV by ≥80%. CNVs in 16p13.11 had to overlap interval II, as described by Ingason et al. [10], by ≥80%. For CNVs in 2p16.3, we considered all CNVs that affected the gene neurexin-1 (NRXN1) and spanned ≥10 consecutive SNPs. We allowed the inclusion of these smaller deletions and duplications because smaller CNVs in NRXN1 were previously reported to be associated with schizophrenia [5]. All of these CNVs directly affected NRXN1, and only five of the 11 patients had a CNV spanning fewer than 30 SNPs. Each CNV was visually confirmed in GenomeStudio.
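The overlap rules can be written as a coverage fraction. The text does not specify how the 80% is computed. The sketch below assumes it is the fraction of the reported CNV region covered by a call; a reciprocal-overlap rule would be a reasonable alternative. Coordinates are treated as half-open base-pair intervals, the region and gene coordinates in the test are made up, and all names are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def covered_fraction(call: Region, reported: Region) -> float:
    """Fraction of the reported CNV region covered by a call (0 if disjoint)."""
    if call.chrom != reported.chrom:
        return 0.0
    overlap = min(call.end, reported.end) - max(call.start, reported.start)
    return max(0, overlap) / (reported.end - reported.start)


def in_risk_region(call: Region, reported: Region, min_fraction: float = 0.8) -> bool:
    """General rule: the call must cover >= 80% of the reported CNV."""
    return covered_fraction(call, reported) >= min_fraction


def counts_as_nrxn1_cnv(call: Region, n_snps: int, nrxn1: Region, min_snps: int = 10) -> bool:
    """2p16.3 rule: any CNV that overlaps NRXN1 and spans >= 10 consecutive SNPs."""
    return covered_fraction(call, nrxn1) > 0 and n_snps >= min_snps
```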

Data collection of high-quality CNVs
After application of all filter criteria, data from 1,637 patients and 1,627 controls remained. Using QuantiSNP, 2,487 CNVs passed our set of filter criteria (0.76 CNVs per individual); for PennCNV, 2,430 CNVs passed all filters (0.74 CNVs per individual). After merging the two datasets and removing those CNVs that were detected by only one program, 2,366 CNVs (0.73 CNVs per individual) remained in our dataset. All CNVs derived from our patients are provided in Table S1. Individual genotype and intensity data can be obtained on a collaborative basis.

Genome-wide association analysis
We performed a genome-wide analysis to uncover associations between specific CNVs and SCZ, but no specific CNV was nominally significantly associated with the disorder. It is noteworthy, however, that in our dataset a CNV had to be identified in at least six patients and in no control to reach a nominally significant P-value.

CNVs in regions previously associated with SCZ
In total, we analyzed 11 CNV regions (CNVRs) that are established risk factors for SCZ susceptibility and detected a higher proportion of CNVs overlapping these regions by ≥80% in patients than in controls. We identified 20 microdeletions and five microduplications in 1,637 patients, compared with 12 microdeletions in 1,627 controls (Table 1; P = 0.051). No duplication in any of the 11 CNVRs was detected among the controls. In our dataset, none of the specific CNVs was individually significantly associated with SCZ. None of our patients carried a CNV in 3q29, 15q13.3, 17p12, or 17q12. However, in 1q21.1, 2p16.3, and 22q11.2, we identified a higher proportion of microdeletions in patients than in controls (1q21.1: 3/1,637 in patients versus 1/1,627 in controls; 2p16.3: 10/1,637 versus 5/1,627; 22q11.2: 2/1,637 versus 0/1,627). We detected microduplications in 7q36.3 and 16p11.2 in patients, whereas no microduplication in either region was detected in controls. In 15q11.2, we identified a higher proportion of microdeletions in controls than in patients (2/1,637 in patients versus 3/1,627 in controls). Phenotypic information for the CNV carriers is provided in Table 2.

Discussion
In this study, we provide a unique, high-quality CNV data collection from a clinically well-characterized sample of 1,637 patients with SCZ or SCZA. Applying conservative, stringent filter criteria is expected to reduce type I errors (false-positive CNV calls), whereas type II errors (missed CNVs) are expected to increase. This strategy benefited the genome-wide CNV dataset in two main ways: (i) it selected for larger CNVs, which are more likely than smaller CNVs to have a strong impact on the phenotype, and (ii) it retained only CNVs that spanned a fairly large number of consecutive SNPs and were detected by two CNV programs, which increased the likelihood of obtaining genuine CNVs. In our previous experience, all microduplications and microdeletions fulfilling our quality criteria could be successfully verified by quantitative real-time PCR using TaqMan Copy Number Assays (Applied Biosystems, Foster City, CA, USA) or Fast SYBR® Green (Life Technologies, Carlsbad, CA, USA), or by pre-designed SALSA® MLPA® (Multiplex Ligation-dependent Probe Amplification) kits (MRC Holland, Amsterdam, the Netherlands) [20,23]. This high CNV reliability is important because we hope the dataset will serve as a resource for other investigators who wish to combine their own data with ours in order to increase the power to detect associations between individual CNVs and the disorder.
We checked our dataset for the presence of CNVs in regions previously associated with SCZ. Although none of the CNVs in the 11 analyzed CNVRs reached a significant P-value, CNVs in 1q21.1, 2p16.3, 7q36.3, 16p13.11, 16p11.2, and 22q11.2 were more frequent in patients than in controls. Our study therefore provides additional support that CNVs in these regions are genetic risk factors for SCZ. None of our patients carried a CNV in 3q29, 15q13.3, 17p12, or 17q12. The frequency of SCZ-associated CNVs is estimated to be low (e.g., <1% for CNVs in 3q29 and 17q12 [11,13]). One likely explanation for the absence of CNVs in these regions is therefore that our dataset, despite its reasonable size, may still be underpowered to detect very rare CNVs. However, our study confirms the low frequency of risk-carrying CNVs in controls.
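The power argument can be made concrete with a simple binomial calculation. Assume that each patient independently carries a given CNV with a fixed frequency f. The chance of observing no carrier at all among n patients is then (1 − f)^n. The frequencies used below are the 1:150 and 1:1,000 bounds quoted in the Introduction, chosen only for illustration; they are not estimates for 3q29 or 17q12. The function name is hypothetical.

```python
def prob_no_carriers(carrier_freq: float, n: int) -> float:
    """Probability that none of n independent individuals carries a CNV,
    assuming each is a carrier with probability carrier_freq (binomial model)."""
    return (1.0 - carrier_freq) ** n


# Illustrative frequencies: the 1:150 and 1:1,000 bounds from the Introduction.
for label, freq in (("1:150", 1 / 150), ("1:1,000", 1 / 1000)):
    print(f"{label}: P(no carrier among 1,637 patients) = {prob_no_carriers(freq, 1637):.2g}")
```

At a carrier frequency of 1:1,000, a sample of 1,637 patients would contain no carrier at all with a probability of about 0.19. The complete absence of a very rare CNV is therefore weak evidence against its role.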
The main limitation of the present study is, at the same time, its greatest strength. Applying stringent filter criteria could have led to the removal of smaller, and potentially true, CNVs from our dataset. Conversely, this approach ensured that the reported CNVs were of high quality.